Context
Problem Statement
HDIM had 51 microservices communicating via HTTP, Kafka, and direct calls. Tracing a single user request across all services was impossible without manual log aggregation. We needed automatic trace propagation and visualization.
Options Considered
Option 1: OpenTelemetry (Industry Standard)
Description: Implement OpenTelemetry with Jaeger visualization for distributed tracing
Pros:
Cons:
Risk Level: Low (proven, open standard)
Option 2: Splunk/DataDog (Managed)
Description: Use managed observability platform
Pros:
Cons:
Risk Level: Medium (cost, vendor lock-in)
Decision
We chose Option 1 (OpenTelemetry + Jaeger) because:
Implementation
Configuration
management:
  tracing:
    sampling:
      probability: 1.0 # 100% dev, 0.5 staging, 0.1 prod
Sampling Rates
| Environment | Rate | Reason |
|-------------|------|--------|
| Development | 100% | Full visibility for debugging |
| Staging | 50% | Balance visibility + cost |
| Production | 10% | Cost-efficient monitoring |
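To make the rates in the table concrete, here is a minimal sketch of how a ratio-based sampler can turn a probability into a keep-or-drop decision. It derives the decision from the trace ID itself, so every service that sees the same trace reaches the same answer without coordination. This is an illustration in the spirit of ratio-based samplers, not the code HDIM or OpenTelemetry actually runs. The class name `SamplingSketch`, the environment keys, and the methods `rateFor` and `shouldSample` are assumptions made for this example.

```java
import java.util.Map;

// Illustrative sketch only; class, method, and environment names are hypothetical.
final class SamplingSketch {

    // Rates from the table above, keyed by an assumed environment name.
    static final Map<String, Double> RATES = Map.of(
            "development", 1.0,
            "staging", 0.5,
            "production", 0.1);

    static double rateFor(String environment) {
        Double rate = RATES.get(environment);
        if (rate == null) {
            throw new IllegalArgumentException("Unknown environment: " + environment);
        }
        return rate;
    }

    // Deterministic decision based on the low 64 bits of a 32-hex-character trace ID.
    static boolean shouldSample(String traceIdHex, double rate) {
        if (traceIdHex.length() != 32) {
            throw new IllegalArgumentException("Trace ID must be 32 hex characters");
        }
        if (rate >= 1.0) {
            return true;
        }
        if (rate <= 0.0) {
            return false;
        }
        long low = Long.parseUnsignedLong(traceIdHex.substring(16), 16);
        long bucket = low >>> 1; // drop one bit to get a non-negative 63-bit value
        return bucket < (long) (rate * Long.MAX_VALUE);
    }

    public static void main(String[] args) {
        String traceId = "4bf92f3577b34da6a3ce929d0e0e4736"; // example ID, not real data
        for (String env : new String[] {"development", "staging", "production"}) {
            System.out.println(env + " -> " + shouldSample(traceId, rateFor(env)));
        }
    }
}
```

Because the decision depends only on the trace ID and the rate, lowering the rate in production drops whole traces rather than individual spans, which keeps the traces that remain complete.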
Kafka Integration
spring:
  kafka:
    producer:
      properties:
        interceptor.classes: com.healthdata.tracing.KafkaProducerTraceInterceptor
    consumer:
      properties:
        interceptor.classes: com.healthdata.tracing.KafkaConsumerTraceInterceptor
Jaeger UI
Access at: http://localhost:16686
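For a trace to appear in Jaeger as one continuous request across a Kafka hop, the producer and consumer interceptors configured under Kafka Integration have to carry trace context inside the message itself. A common way to do this is to write a W3C `traceparent` value into the record headers on send and read it back on receive. The sketch below shows that round trip under stated assumptions: it is not the HDIM interceptor code, it uses a plain `Map<String, byte[]>` as a stand-in for Kafka record headers so that it runs without the Kafka client library, and the names `TraceHeaderSketch`, `TraceContext`, `inject`, and `extract` are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; not the HDIM interceptors. All names are hypothetical.
final class TraceHeaderSketch {

    static final String HEADER = "traceparent";

    // Version 00, 32-hex trace ID, 16-hex parent span ID, 2-hex flags.
    private static final Pattern FORMAT =
            Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");

    // Minimal immutable holder for the propagated fields.
    static final class TraceContext {
        final String traceId;
        final String spanId;
        final boolean sampled;

        TraceContext(String traceId, String spanId, boolean sampled) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.sampled = sampled;
        }
    }

    // Producer side: write the current context into the outgoing headers.
    static void inject(TraceContext ctx, Map<String, byte[]> headers) {
        String value = "00-" + ctx.traceId + "-" + ctx.spanId + "-" + (ctx.sampled ? "01" : "00");
        headers.put(HEADER, value.getBytes(StandardCharsets.UTF_8));
    }

    // Consumer side: read the context back; empty if the header is missing or malformed.
    static Optional<TraceContext> extract(Map<String, byte[]> headers) {
        byte[] raw = headers.get(HEADER);
        if (raw == null) {
            return Optional.empty();
        }
        Matcher m = FORMAT.matcher(new String(raw, StandardCharsets.UTF_8));
        if (!m.matches()) {
            return Optional.empty();
        }
        boolean sampled = (Integer.parseInt(m.group(3), 16) & 1) == 1;
        return Optional.of(new TraceContext(m.group(1), m.group(2), sampled));
    }

    public static void main(String[] args) {
        Map<String, byte[]> headers = new HashMap<>();
        inject(new TraceContext("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", true), headers);
        extract(headers).ifPresent(c -> System.out.println(c.traceId + " sampled=" + c.sampled));
    }
}
```

When the consumer starts its span with the extracted trace ID, Jaeger can group the producer and consumer spans under the same trace. A real interceptor would also need to handle details this sketch skips, such as rejecting all-zero IDs and creating a new span ID for the consumer side.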
Success Criteria
References
Footer
ADR #: 008
Version: 1.0
Status: Active and Deployed
Last Updated: 2026-01-19
_Decision Date: Phase 5 (January 2026)_
_Visualization: Jaeger UI (http://localhost:16686)_