Monitoring
Observability and alerting for platform health.
Key Metrics
Ingestion Health
- Messages per second
- Error rate
- Latency percentiles
- Queue depth
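An average can hide a small number of very slow messages, which is why latency is usually tracked as percentiles. The following is a minimal sketch of summarizing one window of per-message ingestion latencies with the nearest-rank method. The function names and the choice of p50/p95/p99 are illustrative assumptions, not the platform's API.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("percentile of an empty window is undefined")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest sample such that at least p% of samples are <= it.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def latency_summary(latencies_ms: Sequence[float]) -> dict:
    """Summarize one window of ingestion latencies (milliseconds) as p50/p95/p99."""
    return {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
```

With 100 samples valued 1 to 100 ms, `latency_summary` returns p50 = 50, p95 = 95, and p99 = 99. Nearest rank always returns an observed value; interpolating methods are a reasonable alternative if smoother curves are preferred.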
Processing Health
- Throughput
- Processing latency
- Error rate
- Backlog size
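Throughput, messages per second, and state changes per second are all rates. Assuming each component exposes a cumulative, monotonically increasing counter (an assumption; the platform's instrumentation is not described here), a rate can be derived from two reads of that counter. The `CounterSample` type and `per_second_rate` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSample:
    timestamp_s: float  # time the counter was read, in seconds
    value: float        # cumulative count so far (e.g. messages processed)


def per_second_rate(prev: CounterSample, curr: CounterSample) -> float:
    """Per-second rate between two reads of a monotonically increasing counter."""
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    delta = curr.value - prev.value
    if delta < 0:
        # The counter went down, which usually means the process restarted and
        # the counter reset to zero; count only what accumulated since the reset.
        delta = curr.value
    return delta / elapsed
```

For example, a counter that moves from 1000 to 1500 over 10 seconds gives 50 per second. The reset branch undercounts slightly (events between the last read and the restart are lost), which is usually acceptable for dashboards.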
Symptoms Engine Health
- Active symptoms count
- Detection latency
- State changes per second
- Memory utilization
API Health
- Request rate
- Response latency
- Error rate
- Active connections
Dashboards
Pre-built monitoring dashboards:
- System overview
- Ingestion monitoring
- Processing pipeline
- Symptoms engine
- API performance
Alerting
Platform alerts cover:
- Error rate thresholds
- Latency degradation
- Resource exhaustion
- Component failures
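Threshold alerts commonly require a sustained breach so that a single noisy sample does not trigger a page. The sketch below, using the hypothetical names `ThresholdAlert` and `error_rate`, fires only after an error rate stays above a threshold for a configurable number of consecutive evaluations. The threshold and window values are placeholders, not platform defaults.

```python
from dataclasses import dataclass, field


@dataclass
class ThresholdAlert:
    """Fires when a metric stays above `threshold` for `for_samples` consecutive evaluations."""
    threshold: float
    for_samples: int
    _breaches: int = field(default=0, init=False)

    def evaluate(self, value: float) -> bool:
        if value > self.threshold:
            self._breaches += 1
        else:
            self._breaches = 0  # any healthy sample resets the streak
        return self._breaches >= self.for_samples


def error_rate(errors: int, requests: int) -> float:
    """Fraction of failed requests; 0.0 when there was no traffic."""
    return errors / requests if requests else 0.0
```

With a 5% threshold and a window of three, error rates of 1%, 6%, 7%, 8% fire only on the fourth evaluation, and a following healthy sample clears the alert.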
Integration
Export metrics to:
- Prometheus
- Datadog
- CloudWatch
- Custom endpoints
Health Endpoints
Health check endpoints for:
- Load balancer probes
- Orchestration health
- Dependency checks
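A common pattern is to keep a cheap liveness check (is the process running?) separate from a readiness check that also verifies dependencies. Keeping dependency failures out of liveness avoids orchestrators restarting healthy instances during an upstream outage. The readiness check returns HTTP 503 when any dependency is down so load balancers and orchestrators stop routing traffic to the instance. The sketch assumes dependency checks are plain callables returning a boolean; the `readiness` function and the check names are hypothetical.

```python
from typing import Callable, Dict, Mapping, Tuple


def readiness(checks: Mapping[str, Callable[[], bool]]) -> Tuple[int, dict]:
    """Run dependency checks and map the result to an HTTP status and JSON-ready body."""
    results: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:  # a check that raises counts as a failed dependency
            ok = False
        results[name] = "ok" if ok else "fail"
    healthy = all(status == "ok" for status in results.values())
    # 503 signals probes to take this instance out of rotation until it recovers.
    status_code = 200 if healthy else 503
    return status_code, {"status": "ok" if healthy else "fail", "checks": results}
```

A web framework handler would call `readiness` with checks such as a database ping and serialize the returned body as JSON; per-check results make it clear which dependency caused a failed probe.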