Joint Accelerator Conferences Website (JACoW)

JACoW is a publisher in Geneva, Switzerland, run by an international collaboration of editors, that publishes the proceedings of accelerator conferences held around the world.


RIS citation export for TH2AO02: High Availability Alarm System Deployed with Kubernetes

TY  - CONF
AU  - Bellister, J.J.
AU  - Schwander, T.
AU  - Summers, T.
ED  - Schaa, Volker RW
ED  - Götz, Andy
ED  - Venter, Johan
ED  - White, Karen
ED  - Robichon, Marie
ED  - Rowland, Vivienne
TI  - High Availability Alarm System Deployed with Kubernetes
J2  - Proc. of ICALEPCS2023, Cape Town, South Africa, 09-13 October 2023
CY  - Cape Town, South Africa
T2  - International Conference on Accelerator and Large Experimental Physics Control Systems
T3  - 19
LA  - english
AB  - To support multiple scientific facilities at SLAC, a modern alarm system designed for availability, integrability, and extensibility is required. The new alarm system deployed at SLAC fulfills these requirements by blending the Phoebus alarm server with existing open-source technologies for deployment, management, and visualization. To deliver a high-availability deployment, Kubernetes was chosen for orchestration of the system. By deploying all parts of the system as containers with Kubernetes, each component becomes robust to failures, self-healing, and readily recoverable. Well-supported Kubernetes Operators were selected to manage Kafka and Elasticsearch in accordance with current best practices, using high-level declarative deployment files to shift deployment details into the software itself and facilitate nearly seamless future upgrades. A process based on git-sync automatically restarts the alarm server when configuration files change, eliminating the need for sysadmin intervention. To encourage increased accelerator operator engagement, multiple interfaces are provided for interacting with alarms. Grafana dashboards offer a user-friendly way to build displays with minimal code, while a custom Python client allows for direct consumption from the Kafka message queue and access to any information logged by the system.
PB  - JACoW Publishing
CP  - Geneva, Switzerland
SP  - 1134
EP  - 1137
KW  - monitoring
KW  - status
KW  - interface
KW  - feedback
KW  - site
DA  - 2024/02
PY  - 2024
SN  - 2226-0358
SN  - 978-3-95450-238-7
DO  - doi:10.18429/JACoW-ICALEPCS2023-TH2AO02
UR  - https://jacow.org/icalepcs2023/papers/th2ao02.pdf
ER  -