Computing Reviews, the leading online review service for computing literature.

Abstracting the geniuses away from failure testing
Alvaro P., Tymon S. Communications of the ACM 61(1): 54-61, 2018. Type: Article

Date Reviewed: May 9 2018

Large-scale distributed systems are difficult to test using traditional failure-testing or fault-injection techniques. Even recent approaches such as chaos engineering rely on experienced experts who can observe the system, propose hypotheses about its behavior, and formulate experiments to validate the results of variations. The process assumes the availability of human expertise, a formal specification, and the source code. This article presents a lineage-driven fault injection (LDFI) approach that automates the process, starting with successful outcomes and reasoning backward through call-graph traces and data provenance. It was successfully applied at Netflix.

I strongly recommend this excellent introductory article if you are new to chaos engineering. It gives enlightening ideas to novices. The writing is smooth and interesting.

If you are a practicing software tester, however, you may want more than just bedtime reading. For example, in order to apply LDFI, we still need an executable specification and a correctness specification, including invariant definitions. The invariants are based on homeostatic states, which are often mistaken for steady states in the chaos engineering literature. The former refers to a relatively stable state of equilibrium, such as our body temperature of 37 degrees Celsius under normal circumstances, whereas the latter refers to an unvarying condition, such as our bodies at room temperature after death. Furthermore, we need to work around nonreplayability and nondeterminism in real-life distributed systems. I suggest that readers refer to Rosenthal et al. [1] for precise details of chaos engineering and Alvaro et al. [2] for technical assumptions and consequences of LDFI.
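The backward reasoning that LDFI performs can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the lineage of a successful outcome has already been extracted as a list of alternative support paths, each a set of components that together suffice for success. A fault hypothesis is worth injecting only if it breaks every support path, so the sketch enumerates the smallest such sets. The function name `minimal_fault_sets`, the `max_faults` bound, and the component names are hypothetical, and the brute-force search stands in for whatever solver a real tool would use.

```python
from itertools import combinations


def minimal_fault_sets(support_paths, max_faults=3):
    """Return minimal sets of components whose failure cuts every support path.

    support_paths: iterable of sets; each set is one way the outcome succeeded.
    max_faults: largest number of simultaneous faults to consider (assumption).
    """
    paths = [frozenset(p) for p in support_paths]
    components = sorted(set().union(*paths)) if paths else []
    found = []
    for size in range(1, max_faults + 1):
        for combo in combinations(components, size):
            candidate = frozenset(combo)
            # Skip supersets of a hypothesis that was already found.
            if any(prev <= candidate for prev in found):
                continue
            # Keep the candidate only if it intersects every support path.
            if all(candidate & path for path in paths):
                found.append(candidate)
    return found


# Hypothetical lineage: a write succeeds via replica A or via replica B.
lineage = [
    {"client_to_a", "replica_a"},
    {"client_to_b", "replica_b"},
]

hypotheses = minimal_fault_sets(lineage, max_faults=2)
for faults in hypotheses:
    print(sorted(faults))
```

Each returned set is a candidate experiment: inject those faults together and check whether the correctness invariant still holds. If the system survives, the new run contributes further support paths, and the search is repeated over the enlarged lineage.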

Reviewer: T.H. Tse	Review #: CR146024 (1807-0384)

1)	Rosenthal, C.; Hochstein, L.; Blohowiak, A.; Jones, N.; Basiri, A. Chaos engineering: building confidence in system behavior through experiments. O'Reilly, Sebastopol, CA, 2017.

2)	Alvaro, P.; Andrus, K.; Sanden, C.; Rosenthal, C.; Basiri, A.; Hochstein, L. Automating failure testing research at Internet scale. In Proc. of the 7th ACM Symposium on Cloud Computing (SoCC '16). ACM, New York, NY, 2016, 17–28.

Testing And Debugging (D.2.5)

Other reviews under "Testing And Debugging":	Date

Software defect removal Dunn R., McGraw-Hill, Inc., New York, NY, 1984. Type: Book (9789780070183131)	Mar 1 1985

On the optimum checkpoint selection problem Toueg S., Babaoglu O. SIAM Journal on Computing 13(3): 630-649, 1984. Type: Article	Mar 1 1985

Software testing management Royer T., Prentice-Hall, Inc., Upper Saddle River, NJ, 1993. Type: Book (9780135329870)	Mar 1 1994

Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud®