Computing Reviews
Application Fault Tolerance with Armor Middleware
Kalbarczyk Z., Iyer R., Wang L. IEEE Internet Computing 9(2): 28-37, 2005. Type: Article
Date Reviewed: Sep 2 2005

Armor, which stands for adaptive reconfigurable mobile objects of reliability, is a middleware architecture for software-implemented fault tolerance. Unlike other approaches to fault tolerance (specifically process replication, which is often too expensive for practical use), the Armor middleware uses coordinated multithreaded processes across interconnected nodes to detect, and rapidly recover from, errors in user applications and infrastructure components.

The Armor middleware provides three levels of reliability support to a process. From the least intrusive to the most intrusive, these levels are: level 1 (transparent and external support: detecting and restarting failed application processes); level 2 (transparent extension of standard libraries: requiring the process to link with hardened versions of standard libraries); and level 3 (instrumentation with Armor application programming interfaces (APIs): integrating the process tightly with the Armor infrastructure).
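
As a rough illustration of what level 1 style support means in practice (an external supervisor that detects a failed process and restarts it, with no changes to the application), the following Python sketch may help. It is not Armor's implementation and uses no Armor API; the function name `supervise` and the parameter `max_restarts` are hypothetical, and a nonzero exit status is assumed to be the only failure signal.

```python
import subprocess
import sys


def supervise(cmd, max_restarts=3):
    """Run cmd and restart it whenever it exits with a nonzero status.

    Hypothetical sketch of external, transparent supervision: the child
    process is unmodified and unaware of the supervisor.
    Returns the exit codes observed, in order.
    """
    exit_codes = []
    # One initial run plus at most max_restarts restarts.
    for _ in range(max_restarts + 1):
        proc = subprocess.Popen(cmd)
        code = proc.wait()  # block until the child terminates
        exit_codes.append(code)
        if code == 0:
            break  # clean exit: nothing to recover from
    return exit_codes


if __name__ == "__main__":
    # A child that always fails, to exercise the restart path.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(failing, max_restarts=2))
```

A real supervisor would also need to detect hangs (for example, with heartbeats) and to survive its own failure, which is part of what the Armor infrastructure is described as addressing.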

The main contribution of the paper is the background information on the Armor infrastructure, which allows the reader to appreciate the different problem domains to which Armor can be applied. Details about the infrastructure itself are provided in other papers by the authors, which are adequately referenced for interested readers. The authors then present five case studies, drawn from diverse domains. These case studies review the role of Armor in a National Aeronautics and Space Administration (NASA) project using high-performance computing; in improving service availability in a wireless telephone network switch through database audits; in improving the reliability of an in-memory database; in embedded Java to support the monitoring, crash detection, restarting, and checkpointing of Java applications; and finally, in a high-energy physics experiment at Fermi National Accelerator Laboratory.

Overall, the paper presents a persuasive argument for using a nonreplication-oriented fault tolerance strategy. The authors’ results, while very impressive, appear better suited to some domains than to others. For instance, applying Armor to the telephony domain increased the average call setup time by 69 percent (from 160 milliseconds to 270 milliseconds). Consider a telephony switch that can handle an offered load of one million calls per hour (in reality, telephone switches can handle more traffic). At that rate, the switch has to handle about 278 call attempts per second; thus, it cannot afford to budget 270 milliseconds for a single call. In the other case studies listed in the paper, however, Armor provides reliability without the need to resort to process replication.
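
The figures in the telephony example can be checked with a few lines of Python. This is only a worked version of the arithmetic above; the function names are hypothetical, and the per-call budget assumes, purely for illustration, that the switch handles call attempts strictly one at a time.

```python
def percent_increase(before, after):
    """Relative increase from before to after, in percent."""
    return (after - before) / before * 100.0


def calls_per_second(calls_per_hour):
    """Average arrival rate for a given hourly offered load."""
    return calls_per_hour / 3600.0


def serial_budget_ms(calls_per_hour):
    """Time available per call, in milliseconds, under the stated
    assumption that calls are processed strictly one after another."""
    return 3600.0 * 1000.0 / calls_per_hour


if __name__ == "__main__":
    print(round(percent_increase(160, 270), 2))     # setup-time increase, %
    print(round(calls_per_second(1_000_000), 2))    # call attempts per second
    print(round(serial_budget_ms(1_000_000), 2))    # ms available per call
```

The setup time rises by 68.75 percent, which rounds to the 69 percent quoted. One million calls per hour is just under 278 call attempts per second, which under the serial assumption leaves 3.6 milliseconds per call, far below 270 milliseconds.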

Reviewer:  Vijay Gurbani Review #: CR131730 (0603-0296)
Multiprocessing/Multiprogramming/Multitasking (D.4.1 ...)
Fault-Tolerance (D.4.5 ...)
Threads (D.4.1 ...)
Process Management (D.4.1)

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®