Monitoring

Observability and alerting for platform health.

Key Metrics

Ingestion Health

  • Messages per second
  • Error rate
  • Latency percentiles
  • Queue depth

Processing Health

  • Throughput
  • Processing latency
  • Error rate
  • Backlog size

Symptoms Engine Health

  • Active symptoms count
  • Detection latency
  • State changes per second
  • Memory utilization

API Health

  • Request rate
  • Response latency
  • Error rate
  • Active connections
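
As an illustration of how rate, error rate, and latency percentiles can be derived from raw samples, here is a minimal sketch. It is not the platform's implementation: the `Sample` type, the `summarize` function, and the field names are hypothetical, and the percentile uses the simple nearest-rank method, which may differ from what a given metrics backend computes.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """One observed request or message (hypothetical shape, for illustration)."""
    latency_ms: float  # time taken to handle the request or message
    ok: bool           # False if handling failed


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of values at or below it."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples, window_seconds):
    """Summarize one observation window into rate, error rate, and latency percentiles."""
    if not samples:
        raise ValueError("no samples in window")
    latencies = [s.latency_ms for s in samples]
    errors = sum(1 for s in samples if not s.ok)
    return {
        "rate_per_second": len(samples) / window_seconds,
        "error_rate": errors / len(samples),
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
    }
```

The same summary shape could apply to ingestion, processing, or API samples; only the source of the samples changes.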

Dashboards

Pre-built monitoring dashboards:

  • System overview
  • Ingestion monitoring
  • Processing pipeline
  • Symptoms engine
  • API performance

Alerting

Platform alerts cover:

  • Error rate thresholds
  • Latency degradation
  • Resource exhaustion
  • Component failures
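
A threshold alert of the kind listed above can be sketched as a rule that fires only after a metric stays over its limit for several consecutive intervals, which avoids alerting on a single spike. This is an assumed design, not the platform's documented alerting logic; `AlertRule`, `should_fire`, and the example threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical alert rule: fire when `metric` exceeds `threshold` for N intervals in a row."""
    name: str
    metric: str
    threshold: float
    for_intervals: int = 3  # consecutive breaching intervals required; assumed to be >= 1


def should_fire(rule, history):
    """Return True if the most recent `for_intervals` values all exceed the threshold.

    `history` is a list of metric values, oldest first, one per evaluation interval.
    """
    recent = history[-rule.for_intervals:]
    if len(recent) < rule.for_intervals:
        return False  # not enough data yet to decide
    return all(value > rule.threshold for value in recent)
```

For example, a rule such as `AlertRule("high-error-rate", "error_rate", 0.05)` would fire on a history ending in `0.06, 0.07, 0.08` but not on one ending in `0.06, 0.07, 0.01`.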

Integration

Export metrics to:

  • Prometheus
  • Datadog
  • CloudWatch
  • Custom endpoints

Health Endpoints

Health check endpoints are available for:

  • Load balancer probes
  • Orchestration health
  • Dependency checks
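
The three uses above usually map onto two kinds of response: a cheap liveness answer for load balancer probes, and a readiness answer that aggregates dependency checks for the orchestrator. The following sketch shows that split under stated assumptions: the function names, the JSON body shape, and the check names are hypothetical, and the actual endpoint paths and payloads are not specified in this document.

```python
import json


def liveness():
    """Cheap check: the process is running and able to respond."""
    return 200, json.dumps({"status": "ok"})


def readiness(checks):
    """Run each dependency check and report 200 only if all of them pass.

    `checks` maps a dependency name to a zero-argument callable that returns
    True when the dependency is healthy. A check that raises counts as failed.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    healthy = all(results.values())
    status_code = 200 if healthy else 503
    body = json.dumps({
        "status": "ok" if healthy else "unavailable",
        "checks": results,
    })
    return status_code, body
```

Returning 503 when any dependency fails lets a load balancer or orchestrator stop routing traffic to the instance without restarting it.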