Context
Problem Statement
HDIM clinical services needed to maintain immutable audit trails, enable temporal queries (state at any point in time), support event replay for data corrections, and provide complete forensic capability for healthcare quality measures. Traditional CRUD patterns with UPDATE/DELETE statements make it difficult to answer "what happened?" questions and reconstruct historical state.
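To make "state at any point in time" concrete, here is a minimal sketch of a temporal query over an event log. The `StatusChanged` record, the `TemporalQuery` class, and the care-gap status values are hypothetical names chosen for illustration, not HDIM types, and the sketch assumes the log is already ordered by `occurredAt`.

```java
import java.time.Instant;
import java.util.List;
import java.util.Optional;

// Hypothetical event: a care-gap status change recorded at a point in time.
record StatusChanged(String gapId, String newStatus, Instant occurredAt) {}

class TemporalQuery {
    // Returns the status of gapId as of the given instant by folding over the
    // log in order and ignoring every event recorded after that instant.
    // Assumes the log is sorted by occurredAt (oldest first).
    static Optional<String> statusAsOf(List<StatusChanged> log, String gapId, Instant asOf) {
        String status = null;
        for (StatusChanged event : log) {
            if (event.gapId().equals(gapId) && !event.occurredAt().isAfter(asOf)) {
                status = event.newStatus();
            }
        }
        return Optional.ofNullable(status);
    }
}
```

With a CRUD table that overwrites the status column, the earlier value would be gone after the UPDATE; here it remains recoverable because no event is ever modified.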
Specific challenges identified:
Background
January 2026 context:
Previous attempts:
Assumptions
Options Considered
Option 1: Event Sourcing with CQRS (Eventual Consistency Read Models)
Description: Implement complete event sourcing pattern where all state changes are persisted as immutable events. Use separate read models (projections) built from events via event handlers, enabling decoupling of write and read paths.
Architecture:
Write Path: Command → Service → Event → Event Store
Read Path: Query → Read Model (Projection)
Synchronization: Kafka topic → Event Handler → Update Projection
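The write path, read path, and synchronization step can be sketched in a few lines. This is an in-memory illustration only: `PatientEvent`, `InMemoryEventStore`, and `PatientNameProjection` are hypothetical names, and a synchronous subscriber list stands in for the Kafka topic, so the sketch does not show the delivery delay that makes the real read model eventually consistent.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical event type: one immutable fact about a patient.
record PatientEvent(String patientId, String type, String payload, Instant occurredAt) {}

// Write side: an append-only store. There is deliberately no update or delete method.
class InMemoryEventStore {
    private final List<PatientEvent> log = new ArrayList<>();
    private final List<Consumer<PatientEvent>> subscribers = new ArrayList<>();

    void subscribe(Consumer<PatientEvent> handler) {
        subscribers.add(handler);
    }

    void append(PatientEvent event) {
        log.add(event);
        // Stand-in for publishing to a Kafka topic; delivery here is synchronous.
        subscribers.forEach(handler -> handler.accept(event));
    }

    List<PatientEvent> readAll() {
        return List.copyOf(log);
    }
}

// Read side: a projection that keeps the latest known name per patient.
class PatientNameProjection implements Consumer<PatientEvent> {
    private final Map<String, String> nameByPatient = new HashMap<>();

    @Override
    public void accept(PatientEvent event) {
        // Event handler: only name-bearing events change this projection.
        if (event.type().equals("PatientRegistered") || event.type().equals("PatientRenamed")) {
            nameByPatient.put(event.patientId(), event.payload());
        }
    }

    String nameOf(String patientId) {
        return nameByPatient.get(patientId);
    }
}
```

Queries call `nameOf` and never touch the event log, which is the decoupling of write and read paths described above.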
Pros:
Cons:
Estimated Effort: 8 weeks sequential, 2 weeks parallel (TDD Swarm)
Risk Level: Medium (new pattern, but proven in industry)
Option 2: CRUD + Manual Audit Tables
Description: Use traditional Spring Data JPA with manual audit table creation and triggers for historical tracking.
Architecture:
Patient → JPA Entity
patient_audit → PostgreSQL trigger on UPDATE/DELETE
Pros:
Cons:
Estimated Effort: 4 weeks sequential
Risk Level: High (fragile, inconsistent, compliance risk)
Option 3: Hybrid Approach (Event Sourcing for New Services Only)
Description: Adopt event sourcing for new clinical event services while keeping existing CRUD services unchanged.
Architecture:
Existing services: Traditional CRUD (patient-service, quality-measure-service)
New services: Event Sourcing (patient-event-service, quality-measure-event-service)
Pros:
Cons:
Estimated Effort: 4 weeks (but partial solution)
Risk Level: Medium (inconsistency risk)
Decision
Selected Option
We chose Option 1 (Event Sourcing with CQRS and eventual consistency) because:
Rationale
Event Sourcing aligns with HDIM's healthcare mission where immutable records and complete audit trails are non-negotiable requirements. While more complex than CRUD, the pattern:
The TDD Swarm approach (Phase 2 adoption) enables parallel team execution, reducing the 8-week sequential estimate to 2 weeks, with 4 teams working in parallel on different layers.
Consequences
Positive
Short-term (1-2 months):
Long-term (3-12 months):
Metrics (Phase 5 Results):
Negative
Short-term:
Long-term:
Neutral
Process changes:
Implementation
Affected Components
New Services Created:
Associated Components:
Database Changes:
Kafka Topics:
Timeline
| Phase | Milestone | Duration | Owner | Status |
|-------|-----------|----------|-------|--------|
| Phase 5.1 | Patient Event Service (events + handler + projections) | 1 week | Team 5.1 | ✅ Completed |
| Phase 5.2 | Quality Measure Event Service (events + handler + projections) | 1 week | Team 5.2 | ✅ Completed |
| Phase 5.3 | Care Gap Event Service (events + handler + projections) | 1 week | Team 5.3 | ✅ Completed |
| Phase 5.4 | Clinical Workflow Event Service (events + handler + projections) | 1 week | Team 5.4 | ✅ Completed |
| Integration | End-to-end testing, Kafka validation, projection consistency | 1 week | QA | ✅ Completed |
| Deployment | Staging validation, production deployment | 1 week | DevOps | ✅ Completed |
Total: 6 weeks with TDD Swarm parallel execution (vs 8 weeks sequential)
Success Criteria
Rollback Plan
Condition for rollback: The event sourcing pattern causes a production incident affecting >5% of users, or audit requirements cannot be met.
Steps to rollback:
Effort estimate: 2-3 days (with prepared rollback scripts)
Monitoring & Validation
Metrics to Track
| Metric | Baseline | Target | Cadence | Current |
|--------|----------|--------|---------|---------|
| Event ingestion rate | 0 | 1000+/min per service | Real-time | 800-1200/min |
| Event store size | 0 | <10GB per service | Daily | 2-4GB per service |
| Projection update lag | 0 | <5 seconds (p99) | Real-time | 1-3 seconds |
| Event replay time | N/A | <2 minutes per 1M events | Per run | 1.5 min/1M |
| Measure recalculation accuracy | 0 | 100% match to original | Per incident | 100% |
| Event handler error rate | 0 | <0.1% | Hourly | 0.02% |
| Projection consistency check | N/A | 100% consistent | Daily | 100% |
| Test coverage (event services) | 0 | 90%+ | Per build | 92% |
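The table does not say how the projection consistency check is performed. One plausible approach, sketched here under that assumption, is to rebuild a projection from the event log and compare it with the live read model. `GapEvent` and `ProjectionConsistencyCheck` are hypothetical names, and the projection is reduced to a map of latest status per care gap.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event: a care gap moved to a new status.
record GapEvent(String gapId, String status) {}

class ProjectionConsistencyCheck {
    // Rebuilds the "latest status per gap" projection by replaying the full log.
    static Map<String, String> rebuild(List<GapEvent> log) {
        Map<String, String> rebuilt = new HashMap<>();
        for (GapEvent event : log) {
            rebuilt.put(event.gapId(), event.status());
        }
        return rebuilt;
    }

    // The live projection is consistent when it matches a fresh rebuild exactly.
    static boolean isConsistent(List<GapEvent> log, Map<String, String> liveProjection) {
        return rebuild(log).equals(liveProjection);
    }
}
```

A check like this only gives a meaningful answer when the live projection has caught up with the log, so it would need to run against a quiesced or lag-adjusted view.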
Review Schedule
Related Decisions
Prior Decisions
Future Decisions Enabled
Examples & Precedents
Industry Examples
Similar HDIM Decisions
Questions & Open Items
Resolved Questions
Q: Will event sourcing slow down writes?
A: No. Appending an event is a single insert with no joins, which is often faster than an ORM update.
Q: What if event replay takes too long?
A: Pre-compute aggregates, cache results, and use event snapshots so that replay starts from a recent saved state instead of the oldest event.
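A minimal sketch of the snapshot idea follows. It assumes a simplified aggregate whose state is a running numerator and whose events are signed integer deltas; `MeasureSnapshot` and `SnapshotReplay` are hypothetical names, not part of the HDIM services.

```java
import java.util.List;

// Hypothetical snapshot: aggregate state plus how many events it already covers.
record MeasureSnapshot(int numerator, int eventsApplied) {}

class SnapshotReplay {
    // Applies only the events the snapshot has not seen yet. Each event is a
    // signed delta to the numerator (an assumption that keeps the sketch small).
    static MeasureSnapshot replay(MeasureSnapshot start, List<Integer> allDeltas) {
        int numerator = start.numerator();
        for (int i = start.eventsApplied(); i < allDeltas.size(); i++) {
            numerator += allDeltas.get(i);
        }
        return new MeasureSnapshot(numerator, allDeltas.size());
    }
}
```

Replaying from an empty snapshot and replaying from a saved one must produce the same state; the saved one simply skips the events it already covers, which is where the time saving comes from.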
Q: How do we handle event schema changes?
A: The event versioning strategy is documented in EVENT_SOURCING_ARCHITECTURE.md.
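The contents of that document are not reproduced here. As a general illustration, one common technique for schema changes is upcasting: old events stay untouched in the store and are upgraded to the current shape when read. The sketch below assumes a hypothetical change in which version 1 stored a single `name` field and version 2 splits it into `givenName` and `familyName`; `EventUpcaster` and all field names are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

class EventUpcaster {
    // Upgrades a stored payload to the current schema version (assumed to be 2).
    // The stored event is never rewritten; a new map is returned for v1 input.
    static Map<String, String> upcast(int storedVersion, Map<String, String> payload) {
        if (storedVersion >= 2) {
            return payload; // already in the current shape
        }
        Map<String, String> upgraded = new HashMap<>(payload);
        String name = upgraded.remove("name");
        // Split on the first run of whitespace; a missing name yields empty fields.
        String[] parts = name == null ? new String[] {""} : name.trim().split("\\s+", 2);
        upgraded.put("givenName", parts[0]);
        upgraded.put("familyName", parts.length > 1 ? parts[1] : "");
        return upgraded;
    }
}
```

Event handlers and projections then only need to understand the latest version, at the cost of maintaining one upcasting step per schema change.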
Q: Can we query events efficiently?
A: Yes, with proper indexing on timestamp and event type. Projections provide optimized queries.
Open Questions
Approvals
Decision Makers
| Role | Name | Date | Status |
|------|------|------|--------|
| Architecture Lead | HDIM Platform Team | 2026-01-19 | ✅ Accepted |
| Platform Lead | Phase 5 Leadership | 2026-01-19 | ✅ Accepted |
| Tech Lead (Backend) | Platform Engineering | 2026-01-19 | ✅ Accepted |
Stakeholder Feedback
Changelog
| Date | Author | Change |
|------|--------|--------|
| 2026-01-19 | Platform Team | Created ADR-001 formalizing event sourcing decision |
| 2026-01-12 | Architecture Lead | Reviewed against healthcare compliance requirements |
| 2026-01-10 | Platform Team | Initial draft based on Phase 5 implementation |
References
Documentation Links
Related ADRs
External References
Footer
ADR #: 001
Version: 1.0
Last Updated: 2026-01-19
Supersedes: None (initial decision)
Superseded By: None (current)
_Created: January 19, 2026_
_Based on: Phase 5 Event Services Implementation (Oct 2025 - Jan 2026)_
_Status: Active and Validated in Production_