Resilience and fault tolerance
Creating a resilient system places requirements on all of the services in it. As mentioned earlier, the dynamic nature of cloud environments demands that services be written to expect and respond gracefully to the unexpected. This could mean receiving bad data, being unable to reach a required backing service, or dealing with conflicts due to concurrent updates in a distributed system.
With respect to independently evolving but interdependent services, the robustness principle provides the best guidance: “Be liberal in what you accept, and conservative in what you send”9. Assume that APIs will evolve over time, and be tolerant of data that you do not understand. To quote RFC 1122:
As a simple example, consider a protocol specification that contains an enumeration of values ... this enumeration must be assumed to be incomplete. Thus, if a protocol specification defines four possible error codes, the software must not break when a fifth code shows up. An undefined code might be logged..., but it must not cause a failure.
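The enumeration example in the quote can be sketched in Java as follows. This is a minimal illustration rather than a prescribed implementation: the `ErrorCode` values, the `ErrorHandler` class, and the returned message strings are hypothetical names chosen for the example. The point is only that an unrecognized code is logged and mapped to a safe default instead of raising an exception.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.logging.Logger;

// Hypothetical set of error codes known to this version of the consumer.
enum ErrorCode {
    NOT_FOUND, TIMEOUT, CONFLICT, INVALID_INPUT;

    // Returns the matching code, or an empty Optional when the value
    // received on the wire is not one this version recognizes.
    static Optional<ErrorCode> parse(String wireValue) {
        if (wireValue == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(valueOf(wireValue.trim().toUpperCase(Locale.ROOT)));
        } catch (IllegalArgumentException e) {
            // A "fifth code" showed up: report it as unknown, do not fail.
            return Optional.empty();
        }
    }
}

// Hypothetical consumer that tolerates codes it does not understand.
final class ErrorHandler {
    private static final Logger LOG = Logger.getLogger(ErrorHandler.class.getName());

    static String describe(String wireValue) {
        return ErrorCode.parse(wireValue)
                .map(code -> "known error: " + code)
                .orElseGet(() -> {
                    // Log the undefined code, then continue with a default.
                    LOG.warning("Unrecognized error code: " + wireValue);
                    return "unknown error";
                });
    }
}
```

With this sketch, `ErrorHandler.describe("timeout")` yields a description of the known code, whereas a value such as `"RATE_LIMITED"` is logged and handled as an unknown error without interrupting processing.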
Strategies for dealing with API changes are discussed in Chapter 2, “Creating Microservices in Java” on page 9. Microservices must also prevent failures from cascading through the system. Isolation patterns such as circuit breakers and bulkheads are discussed in Chapter 3, “Locating services” on page 25.
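To give a sense of how a circuit breaker limits cascading failures, the following is a deliberately simplified sketch. It is not the implementation presented later in this book or in any particular library: the class name `SimpleCircuitBreaker`, its parameters, and the fallback mechanism are assumptions made for illustration. After a configured number of consecutive failures, the breaker "opens" and returns the fallback immediately, without calling the failing service, until a cool-down period has elapsed.

```java
import java.util.function.Supplier;

// Hypothetical, simplified circuit breaker for illustration only.
// For brevity, the lock is held while the protected action runs, which
// serializes callers; a production implementation would avoid that.
final class SimpleCircuitBreaker {
    private final int failureThreshold; // consecutive failures before opening
    private final long openNanos;       // cool-down before a trial call is allowed
    private int consecutiveFailures = 0;
    private boolean open = false;
    private long openedAt = 0L;

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openNanos = openMillis * 1_000_000L;
    }

    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (open && System.nanoTime() - openedAt < openNanos) {
            // Open: fail fast without touching the backing service.
            return fallback.get();
        }
        // Closed, or cool-down elapsed (half-open): attempt the call.
        try {
            T result = action.get();
            consecutiveFailures = 0;
            open = false; // a successful call closes the breaker
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (open || consecutiveFailures >= failureThreshold) {
                // Open (or reopen after a failed trial call).
                open = true;
                openedAt = System.nanoTime();
            }
            return fallback.get();
        }
    }

    synchronized boolean isOpen() {
        return open;
    }
}
```

A caller might wrap each remote invocation as `breaker.call(() -> client.fetch(), () -> cachedValue)`, where `client` and `cachedValue` are placeholders. While the breaker is open, the calling service stays responsive and the struggling backing service is spared further load.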