🧠 Claude Rule: Production-Grade Error Handling & Observability

You are a stability-focused backend engineer using Claude to make failures understandable, traceable, and recoverable in Python systems.

🚧 Design Errors with Intent

  • Treat exceptions as meaningful signals, not accidents
  • Use clear domain-specific error classes for precise handling
  • Fail early and loud at trust boundaries; don't hide internal corruption

Reference: https://docs.python.org/3/tutorial/errors.html
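
A minimal sketch of intent-driven errors, assuming a hypothetical payments service. A shared base class lets callers catch every domain error at once, and the subclasses carry the detail needed for precise handling. `AppError`, `ValidationError`, `PaymentDeclinedError`, `TransferRequest` and `parse_transfer` are illustrative names, not part of any library.

```python
from dataclasses import dataclass

# Hypothetical domain errors for an illustrative payments service.


class AppError(Exception):
    """Base class for every domain error in this hypothetical service."""


class ValidationError(AppError):
    """The caller sent input we cannot accept; safe to report back to them."""

    def __init__(self, field: str, reason: str) -> None:
        super().__init__(f"{field}: {reason}")
        self.field = field
        self.reason = reason


class PaymentDeclinedError(AppError):
    """A business rule rejected the operation: expected, not an outage."""


@dataclass(frozen=True)
class TransferRequest:
    account_id: str
    amount_cents: int


def parse_transfer(payload: dict) -> TransferRequest:
    """Validate untrusted input once, at the trust boundary, and fail loudly."""
    account_id = payload.get("account_id")
    if not isinstance(account_id, str) or not account_id:
        raise ValidationError("account_id", "must be a non-empty string")
    amount = payload.get("amount_cents")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, int) or amount <= 0:
        raise ValidationError("amount_cents", "must be a positive integer")
    return TransferRequest(account_id=account_id, amount_cents=amount)
```

Because validation happens once at the boundary, code past `parse_transfer` can trust its `TransferRequest`. It does not need to re-check fields deep in the call stack.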

🔍 Capture Context, Not Just Messages

  • Include request identifiers, user intent, and operation metadata
  • Avoid leaking private information into logs or traces
  • Standardize error payload shapes across all routes

Reference: https://12factor.net/logs
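
One way to carry request context using only the standard library is a `contextvars.ContextVar` combined with a `logging.Filter`; each asyncio task runs in its own copy of the context. The sketch assumes a hypothetical `SENSITIVE_KEYS` deny-list and a hypothetical `error_payload` shape, so adapt both to your own schema.

```python
from __future__ import annotations

import contextvars
import logging
import uuid
from typing import Optional

# The current request ID; asyncio tasks each run in a copy of the context.
request_id_var: contextvars.ContextVar[str] = contextvars.ContextVar("request_id", default="-")

# Hypothetical deny-list: values under these keys never reach logs or payloads.
SENSITIVE_KEYS = {"password", "token", "ssn", "card_number"}


class RequestIdFilter(logging.Filter):
    """Attach the current request ID to every record passing through."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


def start_request(incoming_id: Optional[str] = None) -> str:
    """Reuse the caller's request ID when one is provided, otherwise mint one."""
    rid = incoming_id or uuid.uuid4().hex
    request_id_var.set(rid)
    return rid


def redact(metadata: dict) -> dict:
    """Mask sensitive values (case-insensitive key match) before they leave the process."""
    return {k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else v for k, v in metadata.items()}


def error_payload(code: str, message: str, operation: str,
                  metadata: Optional[dict] = None) -> dict:
    """Hypothetical shared error shape that every route returns."""
    return {
        "error": {
            "code": code,
            "message": message,
            "request_id": request_id_var.get(),
            "operation": operation,
            "context": redact(metadata or {}),
        }
    }
```

Routes that build their responses through `error_payload` all share one envelope. Once the filter is attached, a handler whose format string includes `%(request_id)s` prints the ID on every line.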

🎯 Classify Failures by Impact

  • Distinguish infrastructure failures (DB down) from user mistakes (bad input)
  • Don't punish users for server faults; respond with a helpful fallback
  • Identify "business-critical" failure paths in Claude reviews
  • Knowing which errors matter is more important than catching every error
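
Here is a hedged sketch of impact-based classification. The exception types (`InvalidInput`, `DatabaseUnavailable`, `LedgerMismatch`) are hypothetical stand-ins. The status codes follow common HTTP conventions: 400 for bad input and 503 for a temporarily unavailable dependency.

```python
from dataclasses import dataclass
from enum import Enum


class Impact(Enum):
    USER_ERROR = "user_error"                # bad input: the caller can fix it
    INFRASTRUCTURE = "infrastructure"        # our fault: DB down, timeouts
    BUSINESS_CRITICAL = "business_critical"  # money or data at risk: page someone


@dataclass(frozen=True)
class Classification:
    impact: Impact
    http_status: int
    user_message: str


# Hypothetical exception types standing in for your real ones.
class InvalidInput(Exception):
    pass


class DatabaseUnavailable(Exception):
    pass


class LedgerMismatch(Exception):
    pass


def classify(exc: Exception) -> Classification:
    """Decide who is at fault and what the user should see."""
    if isinstance(exc, InvalidInput):
        return Classification(Impact.USER_ERROR, 400, f"Please check your input: {exc}")
    if isinstance(exc, LedgerMismatch):
        return Classification(Impact.BUSINESS_CRITICAL, 500,
                              "We could not complete this payment. Please contact support.")
    if isinstance(exc, (DatabaseUnavailable, TimeoutError, ConnectionError)):
        return Classification(Impact.INFRASTRUCTURE, 503,
                              "We're having trouble on our side. Please try again shortly.")
    # Anything unrecognized is treated as our fault, never the user's.
    return Classification(Impact.INFRASTRUCTURE, 500, "Something went wrong on our side.")
```

Unknown exceptions default to "our fault", so a user never receives a 4xx response for a server-side bug.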

🧪 Test How Systems Break

  • Simulate dependency outages, timeouts, and partial failures
  • Confirm logs and telemetry reflect the failure clearly
  • Include these cases in automated regression suites

Reference: https://docs.pytest.org/
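
Below is a minimal outage test for a hypothetical `load_profile` function that falls back to a degraded profile on timeout. `unittest.mock.Mock` with `side_effect` simulates the dependency failure, and a small in-memory handler checks that the log records what happened. pytest collects any `test_*` function like this one, and its built-in `caplog` fixture can replace the custom handler.

```python
import logging
from unittest import mock

logger = logging.getLogger("profile_service")

# Hypothetical degraded response served when the profile API is unreachable.
FALLBACK_PROFILE = {"display_name": "Guest", "degraded": True}


def load_profile(client, user_id: str) -> dict:
    """Fetch a profile; on timeout, log clearly and return a degraded fallback."""
    try:
        return client.get_profile(user_id, timeout=0.5)
    except TimeoutError:
        logger.warning("profile fetch timed out",
                       extra={"user_id": user_id, "dependency": "profile-api"})
        return dict(FALLBACK_PROFILE)


class ListHandler(logging.Handler):
    """Collect log records in memory so a test can assert on them."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


def test_profile_timeout_falls_back_and_logs() -> None:
    client = mock.Mock()
    client.get_profile.side_effect = TimeoutError("simulated outage")
    handler = ListHandler()
    logger.addHandler(handler)
    try:
        result = load_profile(client, "u-42")
    finally:
        logger.removeHandler(handler)

    assert result["degraded"] is True
    client.get_profile.assert_called_once_with("u-42", timeout=0.5)
    # The failure must be visible in telemetry, not swallowed.
    assert [r.levelno for r in handler.records] == [logging.WARNING]
    assert handler.records[0].dependency == "profile-api"
```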

🛰 Distributed Tracing Signals

  • Track cross-service calls with trace & span IDs
  • Measure latency inflation from downstream slowness
  • Let Claude analyze multi-hop failures by reading traces holistically

Reference: https://opentelemetry.io/
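
In production you would normally let the OpenTelemetry SDK create and propagate spans. To show what actually travels between services, here is a standard-library sketch of the W3C Trace Context `traceparent` header (`version-trace_id-parent_id-flags`). Each hop keeps the trace ID, mints a new span ID, and records its own latency. `child_of`, `timed_call` and the in-memory `RECORDED_SPANS` list are hypothetical helpers, not OpenTelemetry APIs.

```python
import os
import re
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")

# W3C Trace Context header (version 00 only): version-trace_id-parent_id-flags.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")
INVALID_TRACE_ID = "0" * 32  # all-zero IDs are invalid per the spec
INVALID_SPAN_ID = "0" * 16

# Hypothetical in-memory sink; a real system would export spans to a collector.
RECORDED_SPANS: List[dict] = []


@dataclass(frozen=True)
class SpanContext:
    trace_id: str             # shared by every hop of one request
    span_id: str              # unique to this hop
    parent_id: Optional[str]  # the caller's span, or None at the root

    def to_traceparent(self) -> str:
        """Header to send downstream; flags 01 marks the trace as sampled."""
        return f"00-{self.trace_id}-{self.span_id}-01"


def child_of(traceparent: Optional[str]) -> SpanContext:
    """Continue the caller's trace if the header is valid, else start a new one."""
    match = TRACEPARENT_RE.match(traceparent or "")
    if match and match.group(1) != INVALID_TRACE_ID and match.group(2) != INVALID_SPAN_ID:
        return SpanContext(match.group(1), os.urandom(8).hex(), match.group(2))
    return SpanContext(os.urandom(16).hex(), os.urandom(8).hex(), None)


def timed_call(ctx: SpanContext, operation: str, fn: Callable[[], T]) -> T:
    """Run one downstream call and record its duration under this span."""
    start = time.perf_counter()
    try:
        return fn()
    finally:
        RECORDED_SPANS.append({
            "trace_id": ctx.trace_id,
            "span_id": ctx.span_id,
            "parent_id": ctx.parent_id,
            "operation": operation,
            "duration_ms": (time.perf_counter() - start) * 1000,
        })
```

Every recorded span shares one `trace_id` and points at its `parent_id`, so the hops of a single request can be reassembled and read as one journey. That reassembly is what makes multi-hop latency visible.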

📝 Logging as a Debugging Contract

  • Structure logs as JSON: future you will thank present you
  • Write messages for humans, not regex engines
  • Rate-limit noisy logs to preserve context in outages
  • Good logging tells you "what", great logging tells you "why"
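
The following standard-library sketch covers both ideas. `JsonFormatter` emits one JSON object per line, and `RateLimitFilter` drops repeats of the same message template within a time window. The `EXTRA_FIELDS` whitelist is a hypothetical convention, and it includes a `reason` field for the "why".

```python
import json
import logging
import time
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as a single-line JSON object."""

    # Hypothetical whitelist of structured fields passed via `extra=`.
    EXTRA_FIELDS = ("request_id", "user_id", "dependency", "reason")

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),  # the human-readable sentence
        }
        for field in self.EXTRA_FIELDS:
            if hasattr(record, field):
                entry[field] = getattr(record, field)
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry, default=str)


class RateLimitFilter(logging.Filter):
    """Pass each message template at most once per `interval_s` seconds."""

    def __init__(self, interval_s: float = 10.0) -> None:
        super().__init__()
        self.interval_s = interval_s
        self._last_emitted: dict = {}

    def filter(self, record: logging.LogRecord) -> bool:
        # Key on the unformatted template so "failed for %s" counts once.
        key = (record.name, record.levelno, record.msg)
        now = time.monotonic()
        last = self._last_emitted.get(key)
        if last is not None and now - last < self.interval_s:
            return False
        self._last_emitted[key] = now
        return True
```

To wire it up, call `handler.setFormatter(JsonFormatter())` and `handler.addFilter(RateLimitFilter(interval_s=10))`. Attach the filter to the handler rather than to a logger, because logger-level filters do not see records propagated from child loggers.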

💡 Health, Alerts & Real-Time Insight

  • Alert on symptoms users feel, not internal noise
  • Pair error alerts with suggested first-actions
  • Feed escalations into Claude for diagnosis and blast-radius review
  • Alerts should guide, not annoy
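
Here is a hedged sketch of a symptom-based alert rule. It fires on the share of user requests that fail rather than on internal signals, and every firing carries a suggested first action. `SymptomAlert`, `evaluate` and the example thresholds are hypothetical. In practice, rules like this usually live in your monitoring system's configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SymptomAlert:
    """Hypothetical alert rule keyed on a user-visible symptom."""
    name: str
    max_error_ratio: float  # e.g. 0.02 means 2% of user requests failing
    min_requests: int       # skip tiny samples so the alert doesn't flap
    first_action: str       # what on-call should try first


def evaluate(alert: SymptomAlert, total: int, failed: int) -> Optional[str]:
    """Return an alert message with its first action, or None if healthy."""
    if total < alert.min_requests:
        return None
    ratio = failed / total
    if ratio <= alert.max_error_ratio:
        return None
    return (f"[{alert.name}] {ratio:.1%} of {total} requests failed "
            f"(threshold {alert.max_error_ratio:.1%}). First action: {alert.first_action}")
```

Take a rule with `max_error_ratio=0.02` and `min_requests=100`. It fires when 50 of 1,000 requests fail (5.0%) and stays silent when 10 of 1,000 fail (1.0%).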

🔁 Recovery & Self-Healing

  • Restart failed tasks automatically when safe
  • Provide graceful degradation where possible
  • Maintain circuit breakers to avoid cascading failures

Reference: https://martinfowler.com/bliki/CircuitBreaker.html
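
The next block is a minimal, single-threaded sketch of the circuit-breaker pattern, paired with a fallback for graceful degradation. The class, its default thresholds and `with_fallback` are assumptions for illustration. A production breaker would also need locking and metrics, and it would limit how many trial calls pass while half-open.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised instead of calling a dependency that is known to be failing."""


class CircuitBreaker:
    """Closed -> open after N consecutive failures -> half-open after a cooldown."""

    def __init__(self, failure_threshold: int = 5, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock  # injectable so tests can move time
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout_s:
            return "half-open"
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("dependency unavailable; failing fast")
        try:
            result = fn()
        except Exception:
            self._record_failure()
            raise
        # Any success, including a half-open trial call, closes the circuit.
        self._failures = 0
        self._opened_at = None
        return result

    def _record_failure(self) -> None:
        self._failures += 1
        # A failure while half-open re-opens immediately.
        if self.state == "half-open" or self._failures >= self.failure_threshold:
            self._opened_at = self._clock()


def with_fallback(breaker: CircuitBreaker, fn: Callable[[], T], fallback: T) -> T:
    """Degrade gracefully: any failure, including an open circuit, yields the fallback."""
    try:
        return breaker.call(fn)
    except Exception:
        return fallback
```

The injectable `clock` keeps the state machine testable. A test can advance time past `reset_timeout_s` to move the breaker from open to half-open without sleeping.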

🧠 Guiding Principles for Reliable Systems

  • Fail clearly, not silently
  • See the whole journey (end-to-end tracing)
  • Understand errors the way customers experience them
  • Claude helps convert telemetry into explanations
  • Reliability is a product, not a feature