PromptsVault AI is thinking...
Searching the best prompts from our community
Searching the best prompts from our community
Prompts matching the #alerting tag
Build comprehensive monitoring and observability infrastructure for production systems. Monitoring stack architecture: 1. Metrics: Prometheus for collection, Grafana for visualization, 15-second scrape intervals. 2. Logging: ELK Stack (Elasticsearch, Logstash, Kibana) or EFK (Fluentd instead of Logstash). 3. Tracing: Jaeger for distributed tracing, OpenTelemetry for instrumentation. 4. Alerting: AlertManager for routing, PagerDuty for escalation. Key metrics to monitor: 1. Infrastructure: CPU (>80% alert), memory (>85%), disk space (>90%), network I/O. 2. Application: response time (<200ms target), error rate (<0.1%), throughput (requests/second). 3. Business: user signups, conversion rates, revenue metrics, feature usage. Alerting best practices: 1. Alert fatigue prevention: meaningful alerts only, proper severity levels (critical/warning/info). 2. Runbook automation: automated remediation for common issues, escalation procedures. 3. On-call rotation: 7-day rotations, primary/secondary coverage, fair distribution. Dashboard design: 1. Golden signals: latency, traffic, errors, saturation for each service. 2. SLA monitoring: 99.9% uptime target, error budget tracking, service level indicators. Log management: structured logging (JSON), log retention policies (90 days), centralized aggregation with filtering.