JACoW, a publisher in Geneva, Switzerland, publishes the proceedings of accelerator conferences held around the world through an international collaboration of editors.
TY  - CONF
AU  - Bellister, J.J.
AU  - Schwander, T.
AU  - Summers, T.
ED  - Schaa, Volker RW
ED  - Götz, Andy
ED  - Venter, Johan
ED  - White, Karen
ED  - Robichon, Marie
ED  - Rowland, Vivienne
TI  - High Availability Alarm System Deployed with Kubernetes
J2  - Proc. of ICALEPCS2023, Cape Town, South Africa, 09-13 October 2023
CY  - Cape Town, South Africa
T2  - International Conference on Accelerator and Large Experimental Physics Control Systems
T3  - 19
LA  - english
AB  - To support multiple scientific facilities at SLAC, a modern alarm system designed for availability, integrability, and extensibility is required. The new alarm system deployed at SLAC fulfills these requirements by blending the Phoebus alarm server with existing open-source technologies for deployment, management, and visualization. To deliver a high-availability deployment, Kubernetes was chosen for orchestration of the system. By deploying all parts of the system as containers with Kubernetes, each component becomes robust to failures, self-healing, and readily recoverable. Well-supported Kubernetes Operators were selected to manage Kafka and Elasticsearch in accordance with current best practices, using high-level declarative deployment files to shift deployment details into the software itself and facilitate nearly seamless future upgrades. An automated process based on git-sync allows for automated restarts of the alarm server when configuration files change eliminating the need for sysadmin intervention. To encourage increased accelerator operator engagement, multiple interfaces are provided for interacting with alarms. Grafana dashboards offer a user-friendly way to build displays with minimal code, while a custom Python client allows for direct consumption from the Kafka message queue and access to any information logged by the system.
PB  - JACoW Publishing
CP  - Geneva, Switzerland
SP  - 1134
EP  - 1137
KW  - monitoring
KW  - status
KW  - interface
KW  - feedback
KW  - site
DA  - 2024/02
PY  - 2024
SN  - 2226-0358
SN  - 978-3-95450-238-7
DO  - doi:10.18429/JACoW-ICALEPCS2023-TH2AO02
UR  - https://jacow.org/icalepcs2023/papers/th2ao02.pdf
ER  -