Prompts matching the #resilience-testing tag
Implement chaos engineering practices for system resilience testing and failure mode discovery through controlled experiments.

Chaos engineering principles:
1. Hypothesis formation: define steady-state behavior, predict the impact of injected failures.
2. Controlled experiments: gradual scope increase, production-like environments, safety measures.
3. Minimal blast radius: limit failure scope, immediate rollback capability, monitoring safeguards.
4. Continuous practice: regular chaos days, automated experiments, team learning culture.

Failure injection types:
1. Infrastructure chaos: server termination, network partitions, disk space exhaustion.
2. Application chaos: service unavailability, increased latency, memory pressure, CPU throttling.
3. Network chaos: packet loss, bandwidth limitations, DNS failures, certificate expiration.

Tools and platforms:
1. Chaos Monkey: random instance termination, AWS integration, configurable schedules.
2. Gremlin: comprehensive failure injection, team collaboration, hypothesis tracking.
3. Litmus: Kubernetes-native chaos engineering, workflow automation, GitOps integration.
4. Pumba: Docker container chaos, network emulation, stress testing.

Experiment design:
1. Baseline measurement: performance metrics, error rates, user experience indicators.
2. Hypothesis definition: expected system behavior, acceptable degradation levels.
3. Metrics collection: SLI monitoring, error budgets, customer impact assessment.

Safety measures:
1. Circuit breakers: automatic experiment termination, blast radius containment.
2. Monitoring: real-time alerting, anomaly detection, automated rollback triggers.

Learning integration: postmortem analysis, system improvement recommendations, resilience scoring, team knowledge sharing, incident response improvement.
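
The experiment design and safety measures in this prompt can be illustrated with a small harness. The following is a minimal sketch, not the API of Chaos Monkey, Gremlin, Litmus, or Pumba: every name in it (`run_experiment`, `ExperimentResult`, `FakeService`, and the three thresholds) is hypothetical and chosen only for illustration. It assumes the error rate is the single SLI being watched. It measures a baseline, injects a failure, stops early if a circuit-breaker threshold is crossed, and always rolls back.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ExperimentResult:
    baseline: float                 # error rate measured before injection
    hypothesis_held: bool           # True if degradation stayed acceptable
    aborted: bool                   # True if the experiment stopped early
    observations: List[float] = field(default_factory=list)


def run_experiment(
    measure_error_rate: Callable[[], float],
    inject_failure: Callable[[], None],
    rollback: Callable[[], None],
    steady_state_max: float = 0.01,   # assumed definition of "steady state"
    acceptable_max: float = 0.02,     # assumed acceptable degradation level
    abort_threshold: float = 0.05,    # assumed circuit-breaker limit
    samples: int = 5,
) -> ExperimentResult:
    # Baseline measurement: refuse to inject into an already unhealthy system.
    baseline = measure_error_rate()
    if baseline > steady_state_max:
        return ExperimentResult(baseline, hypothesis_held=False, aborted=True)

    observations: List[float] = []
    aborted = False
    inject_failure()
    try:
        for _ in range(samples):
            rate = measure_error_rate()
            observations.append(rate)
            # Circuit breaker: terminate the experiment automatically.
            if rate > abort_threshold:
                aborted = True
                break
    finally:
        # Rollback runs even if measurement raises, limiting blast radius.
        rollback()

    held = (not aborted) and all(r <= acceptable_max for r in observations)
    return ExperimentResult(baseline, held, aborted, observations)


class FakeService:
    """Simulated target used only to exercise the harness."""

    def __init__(self, degraded_rates: List[float]) -> None:
        self.faulty = False
        self._degraded = iter(degraded_rates)

    def error_rate(self) -> float:
        # Healthy rate is fixed; degraded rates are replayed from a list.
        return next(self._degraded) if self.faulty else 0.001

    def inject(self) -> None:
        self.faulty = True

    def recover(self) -> None:
        self.faulty = False


if __name__ == "__main__":
    svc = FakeService([0.002, 0.004, 0.003])
    result = run_experiment(svc.error_rate, svc.inject, svc.recover, samples=3)
    print(result)
```

In a real setup the three callables would wrap whatever monitoring and failure injection tooling is in use, and the thresholds would come from the team's own SLIs and error budgets rather than the placeholder defaults here.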