Computing Reviews

Why SRE documents matter
Nukala S., Rau V. Communications of the ACM 61(12): 45-51, 2018. Type: Article
Date Reviewed: 03/04/19

This article argues that good documentation helps site reliability engineering (SRE) teams function effectively, scale seamlessly to take on new services, and keep web products and services running reliably. Its focus is on moving away from performance that depends on the skills of individual team members and toward a process that relies on high-quality documentation, so that teams can scale up and execute efficiently and reliably.

SRE work spans several tasks, and for each one the article outlines a set of documents that must be properly maintained if the task is to be accomplished effectively. These include: i) "documents for new service onboarding," which include templates for architecture and dependencies, capacity planning, failure modes, process and automation, and external dependencies; ii) "documents for running a service," which include service overviews, playbooks, postmortems, policies, and service-level agreements (SLAs), along with a subset of documents for production products such as an about page, codelabs, frequently asked questions (FAQs), support pages, application programming interface (API) references, and concept, how-to, and developer guides; iii) "documents for reporting service states," such as periodic service reviews and best practices reviews; iv) "documents for running SRE teams," such as team sites and team charters; v) "documents for new SRE onboarding," such as those covering repository access and management; and vi) "documents for service decommissioning."

The article thus offers a holistic description of the documents that need to be created, maintained, and used over the life cycle of services or products (especially web products, which are its focus) to ensure efficiency and reliability. It is a good reference for those interested in joining the SRE profession, as well as for project management and system quality professionals.

Reviewer: Srini Ramaswamy    Review #: CR146457 (1905-0176)

Reproduction in whole or in part without permission is prohibited.   Copyright 2024 ComputingReviews.com™