Axo: Detection and Recovery for Delay and ...
Document type:
Report and critical review of a work
DOI:
Title:
Axo: Detection and Recovery for Delay and Crash Faults in Real-Time Control Systems
Author(s):
Mohiuddin, Maaz [Author]
Ecole Polytechnique Fédérale de Lausanne [EPFL]
Saab, Wajeb [Author]
Ecole Polytechnique Fédérale de Lausanne [EPFL]
Bliudze, Simon [Author]
Self-adaptation for distributed services and large software systems [SPIRALS]
Le Boudec, Jean-Yves [Author]
Ecole Polytechnique Fédérale de Lausanne [EPFL]
Journal title:
IEEE Transactions on Industrial Informatics
Pages:
3065 - 3075
Publisher:
Institute of Electrical and Electronics Engineers
Publication date:
2018-07
ISSN:
1551-3203
English keyword(s):
Reliability
delay faults
fault detection
fault recovery
real-time
HAL discipline(s):
Computer Science [cs]/Embedded Systems
Computer Science [cs]/Multiagent Systems [cs.MA]
Computer Science [cs]/Modeling and Simulation
Computer Science [cs]/Programming Languages [cs.PL]
Computer Science [cs]/Software Engineering [cs.SE]
English abstract: [en]
Real-time control systems use controllers that compute and issue setpoints within stringent delay constraints. Failure to do so, due to a crash or delay as a result of software and/or hardware faults, can cause failure of the controlled resources. Recently, Axo, a protocol for masking crash and delay faults by replicating the controller, was proposed. Axo provides safety by discarding delayed setpoints, and it relies on the presence of valid setpoints for providing availability. To ensure that enough valid setpoints are issued, faulty controller replicas need to be detected and recovered. We present a mechanism for detection and recovery of delay- and crash-faulty replicas under the Axo framework. These mechanisms were designed to be soft state (i.e., their state can be reconstructed from received messages) to enable seamless additions of new replicas. Besides presenting the design, we analytically characterize the time to detect and recover a faulty replica, and we validate them experimentally. We demonstrate the performance of Axo by using two case studies: the first provides a stability analysis of an inverted pendulum system with Axo, and the second shows the fault-tolerance performance of Axo through a deployment on a real-time control system that controls a CIGRÉ low-voltage benchmark microgrid.
Language:
English
Popular science:
No
Collections:
Source:
Files
- https://hal.archives-ouvertes.fr/hal-01846124/document
- Open access
- Access the document
- https://doi.org/10.1109/tii.2017.2772219
- Open access
- Access the document
- Axo_TII_preprint.pdf
- Open access
- Access the document