Physicists at CERN conduct extensive experiments using particle accelerators. A variety of process visualization and control systems is used to monitor the equipment, including the TIM system and Diagnostics and Monitoring (DIAMON). TIM monitors around 120,000 sensors in such areas as building services engineering, while DIAMON monitors some of the equipment and IT components related to the particle accelerators.
Technicians and engineers at the control stations have dashboards that show the status of the systems; the dashboards are continuously updated based on the data from the monitoring systems. It is, therefore, of the utmost importance to CERN to operate the monitoring platforms in the most fail-safe way possible.
“BigMemory helps us put high availability into effect for some of the comprehensive systems that monitor the research facilities and building technology at CERN, giving engineers and scientists up-to-date data about the status of the various technical systems—in real time and without interruption.”
— Matthias Bräger | Project Manager, Technical Monitoring Software, CERN
To achieve the necessary high level of availability and robustness, the CERN IT team developed the CERN Control and Monitoring Platform (C2MON), a redundant cluster solution for operating any monitoring platform. TIM and DIAMON are the first two systems to be implemented on the new high-availability solution. One of the central components of C2MON is Terracotta BigMemory, Software AG’s in-memory data fabric solution. Terracotta BigMemory provides a high-performance in-memory store that is shared by the cluster’s servers and reliably handles even peak traffic loads.
Terracotta provides distributed in-memory storage for the server systems in the cluster solution, which has been in operation since the beginning of 2012. Running on the C2MON architecture, the TIM and DIAMON platforms operated throughout 2012 without any significant downtime. The high performance of Terracotta’s in-memory solution also shortened the overall system’s response times. In addition, CERN’s IT team can now deploy patches and new products to C2MON without interrupting central monitoring services.
The TIM and DIAMON systems are part of a comprehensive monitoring infrastructure at CERN. Development work on this infrastructure started in the 1990s, and it has since evolved into a monitoring system used by many departments. TIM ensures that technicians and engineers at their control stations are always informed about the current status of their systems.
In line with internal requirements, the TIM platform has been continuously expanded to provide more and more departments with data on the current operating status of complex systems related to the particle accelerators. For example, it monitors electrical systems, levels and temperatures, airlocks, Supervisory Control and Data Acquisition (SCADA) systems, ventilation and parts of the access control systems. Because of the high level of interdependence, the requirements for system availability and robustness kept increasing, to the point where the platforms have become system-critical and can no longer be switched off for maintenance.
Achieving top availability
CERN’s IT experts had to find a solution that allows fail-safe, high-performance operation of the monitoring platform. Because these systems act as a control center, unexpected incidents, power outages and major accidents generate large bursts of events at once. One requirement for the implementation was that incoming data be processed in less than one second. Even at the highest peaks, the infrastructure must remain stable enough to handle the flood of status messages and to support both the control-center staff and the engineers troubleshooting the systems.
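The case study does not describe how C2MON processes event bursts internally. The Python sketch below assumes one common approach: coalescing updates so that only the newest value per sensor tag is kept, which means the state grows with the number of tags rather than with the number of messages in a burst. The names `StatusEvent` and `process_burst` are hypothetical and do not come from C2MON.

```python
import time
from typing import NamedTuple


class StatusEvent(NamedTuple):
    """A single status message from a sensor tag (hypothetical shape)."""
    tag_id: int
    value: float
    timestamp: float


def process_burst(events, current_state):
    """Apply a burst of events, keeping only the newest value per tag.

    Mutates current_state in place and returns the elapsed time in seconds,
    which can be compared against a latency budget such as one second.
    """
    start = time.perf_counter()
    for event in events:
        previous = current_state.get(event.tag_id)
        # Ignore out-of-order messages older than the value already held.
        if previous is None or event.timestamp >= previous.timestamp:
            current_state[event.tag_id] = event
    return time.perf_counter() - start


if __name__ == "__main__":
    # Simulate an incident: 120,000 tags each report three times in quick succession.
    burst = [
        StatusEvent(tag, float(round_), float(round_))
        for round_ in range(3)
        for tag in range(120_000)
    ]
    state = {}
    elapsed = process_burst(burst, state)
    print(f"{len(burst)} events -> {len(state)} tags in {elapsed:.3f} s")
```

Coalescing trades away intermediate values for bounded work per tag; whether that is acceptable depends on whether consumers need every transition or only the current status.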
The IT team at CERN developed C2MON as a redundant system with two or more servers that access a common database. The shared data is made available via BigMemory. To provide the highest possible availability for the C2MON platform, Terracotta supports an automatic failover between mirrored servers to prevent data loss and disruption of applications.
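The redundancy principle described above can be illustrated with a deliberately simplified Python sketch: several server nodes write to one shared store, and updates are routed to the first live node, so losing a node (through a crash or a restart for patching) does not lose state. `SharedStore`, `MonitoringServer` and `Cluster` are hypothetical names; the plain dict only stands in for BigMemory, and the sketch does not model Terracotta’s own mirrored-server failover.

```python
class SharedStore:
    """Stand-in for the shared in-memory store (BigMemory in C2MON).

    Here it is just a dict; what matters is that every server uses the same one.
    """
    def __init__(self):
        self.data = {}


class MonitoringServer:
    """Hypothetical C2MON server node that writes to the shared store."""
    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.alive = True

    def update(self, tag_id, value):
        if not self.alive:
            raise RuntimeError(f"{self.name} is down")
        self.store.data[tag_id] = value


class Cluster:
    """Routes each update to the first live server (active/standby failover)."""
    def __init__(self, servers):
        self.servers = servers

    def active(self):
        for server in self.servers:
            if server.alive:
                return server
        raise RuntimeError("no live server available")

    def update(self, tag_id, value):
        self.active().update(tag_id, value)


store = SharedStore()
primary = MonitoringServer("server-a", store)
standby = MonitoringServer("server-b", store)
cluster = Cluster([primary, standby])

cluster.update("tag-42", 21.5)  # handled by server-a
primary.alive = False           # simulate a crash or a restart for patching
cluster.update("tag-43", 18.0)  # handled by server-b; earlier state is still there
print(cluster.active().name, store.data)
```

Because the state lives in the shared store rather than in any single server, the standby can take over without a resynchronization step.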
In addition, the Terracotta solution is designed for mission-critical applications that have high throughput, availability and scalability requirements. Terracotta BigMemory keeps large amounts of data in main memory, which speeds up data access considerably. This significantly reduces overall system response times and increases throughput, working around the performance and scalability limits of a purely database-backed design.
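Why keeping data in main memory shortens response times can be shown with a generic cache-aside sketch in Python. This is not BigMemory’s API and the case study does not say C2MON uses exactly this pattern; `SlowDatabase` and `InMemoryStore` are hypothetical, and the 10 ms latency is an arbitrary assumption. After the first access to each key, reads are served from memory and the database is no longer touched.

```python
import time


class SlowDatabase:
    """Hypothetical stand-in for a database with noticeable per-read latency."""
    def __init__(self, rows, latency_s=0.01):
        self.rows = rows
        self.latency_s = latency_s
        self.reads = 0

    def fetch(self, key):
        self.reads += 1
        time.sleep(self.latency_s)  # simulate a network round trip and disk I/O
        return self.rows[key]


class InMemoryStore:
    """Cache-aside: serve from memory, fall back to the database on a miss."""
    def __init__(self, database):
        self.database = database
        self.memory = {}

    def get(self, key):
        if key not in self.memory:
            # Miss: load once from the database and keep the value in memory.
            self.memory[key] = self.database.fetch(key)
        return self.memory[key]


db = SlowDatabase({"tag-1": "OK", "tag-2": "ALARM"})
store = InMemoryStore(db)
for _ in range(1000):
    store.get("tag-1")
    store.get("tag-2")
print(db.reads)  # 2: only the first access to each key reaches the database
```

Of the 2,000 lookups, only two pay the database latency; the rest are plain memory reads.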
In the first year, C2MON achieved an availability of 99.99984 percent for the TIM platform. To carry out necessary development and maintenance work, the platform was restarted several times without interrupting the monitoring process. The solution can be expanded further. For example, Terracotta BigMemory currently holds only the current status data of around 120,000 measuring points in memory, while historical analysis is carried out on the connected database. To speed up analysis, the system could be extended to store historical data in Terracotta BigMemory as well, enabling analysis in real time. Further information on C2MON can be found at: http://cern.ch/c2mon.
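To put the 99.99984 percent availability figure into perspective, the corresponding downtime is one minus the availability $A$, multiplied by the length $T$ of the period. Assuming a 365-day year (2012 was a leap year, which changes the result by less than a second):

```latex
\begin{aligned}
t_{\text{down}} &= (1 - A)\, T \\
&= (1 - 0.9999984) \times (365 \times 86\,400\ \text{s}) \\
&= 1.6 \times 10^{-6} \times 31\,536\,000\ \text{s} \\
&\approx 50.5\ \text{s}
\end{aligned}
```

In other words, the figure corresponds to less than one minute of unavailability over the year.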