Author: Schwander, T.
Paper TH2AO02: High Availability Alarm System Deployed with Kubernetes (page 1134)
 
  • J.J. Bellister, T. Schwander, T. Summers
    SLAC, Menlo Park, California, USA
 
  To support multiple scientific facilities at SLAC, a modern alarm system designed for availability, integrability, and extensibility is required. The new alarm system deployed at SLAC fulfills these requirements by combining the Phoebus alarm server with existing open-source technologies for deployment, management, and visualization. To deliver a high-availability deployment, Kubernetes was chosen to orchestrate the system. Because every part of the system is deployed as a container under Kubernetes, each component becomes robust to failures, self-healing, and readily recoverable. Well-supported Kubernetes Operators were selected to manage Kafka and Elasticsearch in accordance with current best practices. Their high-level declarative deployment files shift deployment details into the software itself and facilitate nearly seamless future upgrades. A process based on git-sync automatically restarts the alarm server when configuration files change, eliminating the need for sysadmin intervention. To encourage greater engagement from accelerator operators, multiple interfaces are provided for interacting with alarms. Grafana dashboards offer a user-friendly way to build displays with minimal code, while a custom Python client allows direct consumption from the Kafka message queue and access to any information logged by the system.
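
  The restart-on-change idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe how the change is detected or how the restart is triggered, so everything below is an assumption made for illustration: the function names `config_digest` and `restart_if_changed`, the `*.xml` file pattern, and the `restart` callback are hypothetical, and git-sync itself is not invoked. The sketch only shows one plausible way to decide that a synced configuration directory has changed.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional


def config_digest(config_dir: Path, pattern: str = "*.xml") -> str:
    """Return a SHA-256 digest over the names and contents of matching files.

    The "*.xml" pattern is an assumption; adjust it to the real config layout.
    """
    digest = hashlib.sha256()
    # Sort paths so the digest does not depend on directory listing order.
    for path in sorted(config_dir.rglob(pattern)):
        if path.is_file():
            digest.update(path.relative_to(config_dir).as_posix().encode())
            digest.update(b"\0")
            digest.update(path.read_bytes())
            digest.update(b"\0")
    return digest.hexdigest()


def restart_if_changed(
    config_dir: Path,
    last_digest: Optional[str],
    restart: Callable[[], None],
) -> str:
    """Call `restart` when the config digest differs from `last_digest`.

    On the first call (last_digest is None) nothing is restarted; the current
    digest is simply returned so the caller can store it for the next check.
    `restart` is a hypothetical hook, e.g. a function that asks the
    orchestrator to restart the alarm server.
    """
    current = config_digest(config_dir)
    if last_digest is not None and current != last_digest:
        restart()
    return current
```

  A caller would store the returned digest and pass it back in on the next check, for example each time the synced checkout is updated. Hashing file contents rather than comparing modification times avoids a spurious restart when files are rewritten with identical content.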
Slides TH2AO02 [0.798 MB]
DOI • reference for this paper ※ doi:10.18429/JACoW-ICALEPCS2023-TH2AO02  
About • Received ※ 06 October 2023 — Revised ※ 09 October 2023 — Accepted ※ 14 December 2023 — Issued ※ 18 December 2023