Key Concepts
Resilience is the ability to maintain essential services under adverse conditions — attacks, failures, unexpected events.
The Resilience Cycle
Recognize (detect threat) → Resist (defend) → Recover (restore) → Adapt (prevent recurrence).
Resilient System Design
Design patterns: redundancy, graceful degradation, bulkheads, circuit breakers, timeouts.
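As a rough illustration of two of these patterns working together, the sketch below combines a bulkhead with graceful degradation. The `Bulkhead` class and its parameters are hypothetical names chosen for this example, not part of any particular library. It assumes each dependency gets its own small pool of slots, and that a caller finding the pool full takes a degraded answer rather than waiting.

```python
import threading


class Bulkhead:
    """Caps concurrent calls to one dependency so a slow one cannot use up shared capacity."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, fallback):
        # Non-blocking acquire: when the compartment is full, shed load
        # immediately instead of queueing behind a slow dependency.
        if not self._slots.acquire(blocking=False):
            return fallback()  # graceful degradation
        try:
            return fn()
        finally:
            self._slots.release()
```

Rejecting work when the compartment is full, rather than blocking, is one possible design choice; a real system might queue briefly or combine the bulkhead with a timeout.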
Sociotechnical Resilience
Resilience is not just technical — people, processes, and organizations must also adapt to adverse conditions.
Concept Deep Dives
The Resilience Cycle
When to Use
Design phase and incident response. Every resilient system goes through this cycle.
Real-World Example
AWS us-east-1 outage (2021): teams with multi-region failover in place could detect the regional failure, route around it, and keep serving users (Recognize → Resist → Recover).
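The failover idea in this example can be sketched at the client level as follows. This is a simplified illustration, not how any specific team implemented it: real multi-region failover usually happens in DNS or a load balancer, and `fetch_with_failover` and its `fetch` callback are hypothetical names introduced here.

```python
def fetch_with_failover(regions, fetch):
    """Try each region in priority order; return (region, result) from the first that answers."""
    errors = {}
    for region in regions:
        try:
            return region, fetch(region)
        except Exception as exc:
            # Recognize: this region is failing.
            # Resist: record the error and move on instead of propagating it.
            errors[region] = exc
    # Every region failed, so there is nothing left to fail over to.
    raise RuntimeError(f"all regions failed: {sorted(errors)}")
```

A caller would pass an ordered list such as `["us-east-1", "us-west-2"]` and a function that performs the request against one region. The Adapt stage is not shown; it would happen afterwards, for example by reordering regions or adding health checks.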
✓ Advantages
- Systematic framework
- Covers before, during, and after a failure
- Drives architecture decisions
⚠ Watch Out
- Expensive to implement fully
- Requires runbooks and practice
Resilient System Design
When to Use
Any system with uptime requirements — especially microservices and distributed systems.
Real-World Example
Netflix circuit breaker: if the Recommendations service is slow, the page shows cached or empty recommendations instead of timing out as a whole.
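A minimal sketch of the circuit breaker pattern behind this example is shown below. It is not Netflix's implementation; the `CircuitBreaker` class, its thresholds, and the injectable `clock` parameter are assumptions made for illustration. After a run of consecutive failures the breaker opens and returns the fallback without calling the service, then allows a trial call once the reset interval has passed.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, fn, fallback):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                return fallback()  # open: fail fast, do not touch the service
            # Otherwise half-open: let this one trial call through.
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # Open on reaching the threshold, or re-open if the trial call failed.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            return fallback()
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

For the recommendations case, `fn` would be the remote call and `fallback` would return a cached or empty list, so the rest of the page renders either way.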
✓ Advantages
- Prevents cascade failures
- Graceful degradation beats total failure
- Self-healing systems
⚠ Watch Out
- Complex to implement and test
- Harder to debug
Sociotechnical Resilience
When to Use
Real incident response — technical resilience fails without trained teams and clear processes.
Real-World Example
Boeing 737 MAX: the aircraft had technical warning indicators, but organizational factors (training, communication, culture) were central to the tragedy.
✓ Advantages
- Addresses real root causes
- Improves incident response culture
- Builds institutional knowledge
⚠ Watch Out
- Harder to 'fix' than technical issues
- Requires cultural change
Quick Reference
1. Resilience: ability to maintain essential services despite adverse conditions.
2. Four-stage resilience cycle: recognize, resist, recover, adapt.
3. Resilient design patterns: redundancy, circuit breakers, bulkheads, graceful degradation.
4. Sociotechnical resilience: both technical AND organizational/human factors matter.
5. Chaos engineering validates resilience; passive hope is not enough.
6. Blameless postmortems drive adaptation: learn from failures to prevent recurrence.
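To make the chaos engineering point concrete, here is a small fault-injection sketch. It only hints at the idea: real chaos experiments run against live or staging infrastructure, and `with_chaos` and `get_recommendations` are hypothetical helpers invented for this example. The wrapper makes a fraction of calls fail on purpose so that the fallback path is exercised rather than assumed.

```python
import random


def with_chaos(fn, failure_rate, rng):
    """Wrap fn so that roughly failure_rate of calls raise, simulating a flaky dependency."""
    def wrapped(*args, **kwargs):
        if rng.random() < failure_rate:
            raise ConnectionError("injected fault")
        return fn(*args, **kwargs)
    return wrapped


def get_recommendations(fetch):
    """Code under test: degrade to an empty list when the dependency fails."""
    try:
        return fetch()
    except ConnectionError:
        return []


# Experiment: inject faults into half the calls and check that every
# response is either the live result or the degraded one, never a crash.
rng = random.Random(42)  # seeded so the run is repeatable
flaky = with_chaos(lambda: ["title-1"], failure_rate=0.5, rng=rng)
results = [get_recommendations(flaky) for _ in range(100)]
assert all(r in ([], ["title-1"]) for r in results)
```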