Document type :
Report and critical review of a work
DOI :
10.1109/tii.2017.2772219
Title :
Axo: Detection and Recovery for Delay and Crash Faults in Real-Time Control Systems
Author(s) :
Mohiuddin, Maaz [Author]
Ecole Polytechnique Fédérale de Lausanne [EPFL]
Saab, Wajeb [Author]
Ecole Polytechnique Fédérale de Lausanne [EPFL]
Bliudze, Simon [Author]
Self-adaptation for distributed services and large software systems [SPIRALS]
Le Boudec, Jean-Yves [Author]
Ecole Polytechnique Fédérale de Lausanne [EPFL]
Journal title :
IEEE Transactions on Industrial Informatics
Pages :
3065 - 3075
Publisher :
Institute of Electrical and Electronics Engineers
Publication date :
2018-07
ISSN :
1551-3203
English keyword(s) :
Reliability
delay faults
fault detection
fault recovery
real-time
HAL domain(s) :
Computer Science [cs]/Embedded Systems
Computer Science [cs]/Multiagent Systems [cs.MA]
Computer Science [cs]/Modeling and Simulation
Computer Science [cs]/Programming Languages [cs.PL]
Computer Science [cs]/Software Engineering [cs.SE]
English abstract : [en]
Real-time control systems use controllers that compute and issue setpoints within stringent delay constraints. Failure to do so, due to a crash or delay as a result of software and/or hardware faults, can cause failure of the controlled resources. Recently, Axo, a protocol for masking crash and delay faults by replicating the controller, was proposed. Axo provides safety by discarding delayed setpoints, and it relies on the presence of valid setpoints for providing availability. To ensure that enough valid setpoints are issued, faulty controller replicas need to be detected and recovered. We present mechanisms for detection and recovery of delay- and crash-faulty replicas under the Axo framework. These mechanisms were designed to be soft state (i.e., their state can be reconstructed from received messages) to enable seamless additions of new replicas. Besides presenting the design, we analytically characterize the time to detect and recover a faulty replica, and we validate this characterization experimentally. We demonstrate the performance of Axo by using two case studies: the first provides a stability analysis of an inverted pendulum system with Axo, and the second shows the fault-tolerance performance of Axo through a deployment on a real-time control system that controls a CIGRÉ low-voltage benchmark microgrid.
Language :
English
Popular science :
No
Collections :
Source :
Files
- https://hal.archives-ouvertes.fr/hal-01846124/document
- Open access
- https://doi.org/10.1109/tii.2017.2772219
- Open access
- Axo_TII_preprint.pdf
- Open access