Journals of Accelerator Conferences Website (JACoW)

JACoW is a publisher in Geneva, Switzerland, run by an international collaboration of editors, that publishes the proceedings of accelerator conferences held around the world.


BibTeX citation export for TH2AO02: High Availability Alarm System Deployed with Kubernetes

@inproceedings{bellister:icalepcs2023-th2ao02,
  author       = {J.J. Bellister and T. Schwander and T. Summers},
  title        = {{High Availability Alarm System Deployed with Kubernetes}},
% booktitle    = {Proc. ICALEPCS'23},
  booktitle    = {Proc. 19th Int. Conf. Accel. Large Exp. Phys. Control Syst. (ICALEPCS'23)},
  eventdate    = {2023-10-09/2023-10-13},
  pages        = {1134--1137},
  paper        = {TH2AO02},
  language     = {english},
  keywords     = {monitoring, status, interface, feedback, site},
  venue        = {Cape Town, South Africa},
  series       = {International Conference on Accelerator and Large Experimental Physics Control Systems},
  number       = {19},
  publisher    = {JACoW Publishing, Geneva, Switzerland},
  month        = {02},
  year         = {2024},
  issn         = {2226-0358},
  isbn         = {978-3-95450-238-7},
  doi          = {10.18429/JACoW-ICALEPCS2023-TH2AO02},
  url          = {https://jacow.org/icalepcs2023/papers/th2ao02.pdf},
  abstract     = {{To support multiple scientific facilities at SLAC, a modern alarm system designed for availability, integrability, and extensibility is required. The new alarm system deployed at SLAC fulfills these requirements by blending the Phoebus alarm server with existing open-source technologies for deployment, management, and visualization. To deliver a high-availability deployment, Kubernetes was chosen for orchestration of the system. By deploying all parts of the system as containers with Kubernetes, each component becomes robust to failures, self-healing, and readily recoverable. Well-supported Kubernetes Operators were selected to manage Kafka and Elasticsearch in accordance with current best practices, using high-level declarative deployment files to shift deployment details into the software itself and facilitate nearly seamless future upgrades. An automated process based on git-sync allows for automated restarts of the alarm server when configuration files change, eliminating the need for sysadmin intervention. To encourage increased accelerator operator engagement, multiple interfaces are provided for interacting with alarms. Grafana dashboards offer a user-friendly way to build displays with minimal code, while a custom Python client allows for direct consumption from the Kafka message queue and access to any information logged by the system.}},
}