Context
Problem Statement
HDIM ran a single monolithic Kong/Spring gateway service that handled all API requests (general-purpose, admin, clinical, FHIR). Authentication logic was duplicated, domain concerns were mixed, and domain-specific optimizations could not be applied. The result was code duplication, poor scalability, and difficulty applying specialized security policies or performance tuning per domain.
Specific challenges identified:
Background
January 2026 context:
Previous approaches:
Assumptions
Options Considered
Option 1: Modularized 4-Gateway Architecture with Shared Core
Description: Split monolithic gateway into 4 specialized services (admin, clinical, FHIR, general) using a shared gateway-core module containing common functionality (authentication, rate limiting, logging).
Architecture:
gateway-core (shared module)
├── TrustedHeaderAuthFilter
├── RateLimitingFilter
├── AuditLoggingFilter
└── CorsFilter

gateway-admin-service    → gateway-core
gateway-clinical-service → gateway-core
gateway-fhir-service     → gateway-core
gateway-service          → gateway-core (legacy)
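
The shared-core arrangement can be pictured with a small framework-free sketch: every gateway runs the same core filters first, then appends its own domain filters. The `Filter` interface, the `X-Auth-User` and `X-MFA-Verified` header names, and the response strings are hypothetical and chosen only for illustration; a Spring-based gateway would use the framework's own filter types rather than these.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class SharedCoreSketch {
    /** Hypothetical filter contract: empty means "continue", a value is the rejection reason. */
    interface Filter {
        Optional<String> check(Map<String, String> headers);
    }

    // Stand-in for gateway-core's TrustedHeaderAuthFilter (header name is an assumption).
    static final Filter TRUSTED_HEADER_AUTH = headers -> {
        if (headers.containsKey("X-Auth-User")) {
            return Optional.empty();
        }
        return Optional.of("401 missing trusted header");
    };

    // Hypothetical admin-only filter layered on top of the core chain.
    static final Filter ADMIN_MFA = headers -> {
        if ("true".equals(headers.get("X-MFA-Verified"))) {
            return Optional.empty();
        }
        return Optional.of("403 MFA required");
    };

    // The shared part: every specialized gateway starts from this list.
    static List<Filter> coreFilters() {
        return List.of(TRUSTED_HEADER_AUTH);
    }

    // Runs core filters first, then the domain-specific ones; the first rejection wins.
    static String handle(List<Filter> domainFilters, Map<String, String> headers) {
        List<Filter> chain = new ArrayList<>(coreFilters());
        chain.addAll(domainFilters);
        for (Filter filter : chain) {
            Optional<String> rejection = filter.check(headers);
            if (rejection.isPresent()) {
                return rejection.get();
            }
        }
        return "200 OK";
    }
}
```

In this sketch a clinical gateway would call `handle(List.of(), headers)` and an admin gateway `handle(List.of(ADMIN_MFA), headers)`, so a fix to the core filter reaches both without copying code.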
Pros:
Cons:
Estimated Effort: 2 weeks (with TDD Swarm parallel execution)
Risk Level: Low-Medium (modularization is well-established pattern)
Option 2: Monolithic Gateway with Domain-Based Route Configuration
Description: Keep single gateway but implement route-based domain separation with configuration-driven policies.
Architecture:
gateway-service (monolithic)
├── /admin/*    → Apply admin policies (config-driven)
├── /clinical/* → Apply clinical policies (config-driven)
├── /fhir/*     → Apply FHIR policies (config-driven)
└── /*          → Apply general policies (config-driven)
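
As a rough illustration of what config-driven separation inside one gateway could look like, the sketch below resolves a request path to a policy name by prefix. The policy names and the `policyFor` helper are assumptions made for the example; only the route prefixes come from the diagram above.

```java
import java.util.Map;

final class RoutePolicySketch {
    // Prefixes taken from the Option 2 diagram; policy names are illustrative.
    // The prefixes do not overlap, so iteration order does not matter here.
    private static final Map<String, String> POLICY_BY_PREFIX = Map.of(
        "/admin/", "admin",
        "/clinical/", "clinical",
        "/fhir/", "fhir"
    );

    // Returns the policy for the first matching prefix, falling back to "general".
    static String policyFor(String path) {
        for (Map.Entry<String, String> entry : POLICY_BY_PREFIX.entrySet()) {
            if (path.startsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return "general";
    }
}
```

Every request still passes through the same process in this design, which is why policies can differ per route while scaling cannot.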
Pros:
Cons:
Estimated Effort: 1 week
Risk Level: Medium (scaling/performance risk long-term)
Option 3: Separate Gateways Without Shared Code
Description: Create 4 independent gateway services with duplicated code (copy-paste approach).
Architecture:
gateway-admin-service    (independent)
gateway-clinical-service (independent)
gateway-fhir-service     (independent)
gateway-service          (independent, legacy)
Pros:
Cons:
Estimated Effort: 3 weeks (due to duplication and sync issues)
Risk Level: High (maintenance and consistency risk)
Decision
Selected Option
We chose Option 1 (Modularized 4-Gateway Architecture with Shared gateway-core Module) because:
Rationale
The core insight is that different API domains have different requirements:
| Domain | Traffic Pattern | Auth Strictness | Optimization | Use Case |
|--------|-----------------|-----------------|--------------|----------|
| Admin | Low, bursty | High (MFA preferred) | Latency-sensitive | Tenant config, approvals |
| Clinical | High, sustained | Medium (fast auth) | Throughput-optimized | Patient data, measures |
| FHIR | Medium, variable | Medium (HL7 compliance) | Standards-focused | EHR integration |
| General | Legacy traffic | Standard | Backwards-compatible | Fallback routing |
Serving all four domains from one gateway forces tradeoffs that hurt every domain. Specialization lets each gateway be tuned for its own needs, while the shared gateway-core module keeps the foundations (authentication, rate limiting, logging) consistent and secure.
Consequences
Positive
Short-term (1-2 months):
Long-term (3-12 months):
Metrics:
Negative
Short-term:
Long-term:
Neutral
Process changes:
Implementation
Affected Components
New Services Created:
Shared Module Created:
Files Affected:
Timeline
| Phase | Milestone | Duration | Owner | Status |
|-------|-----------|----------|-------|--------|
| Phase 1 | Extract gateway-core from monolithic gateway | 3 days | Platform Team | ✅ Completed |
| Phase 2 | Create gateway-admin-service | 2 days | Admin Team | ✅ Completed |
| Phase 3 | Create gateway-clinical-service | 2 days | Clinical Team | ✅ Completed |
| Phase 4 | Create gateway-fhir-service | 2 days | FHIR Team | ✅ Completed |
| Phase 5 | Integration testing (routing, auth) | 3 days | QA | ✅ Completed |
| Phase 6 | Deployment and traffic migration | 2 days | DevOps | ✅ Completed |
Total: 2 weeks with parallel team execution
Success Criteria
Rollback Plan
Condition for rollback: The modularized gateways cause a production incident affecting more than 5% of requests, or a performance regression of more than 20%.
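
The rollback condition can be expressed as a simple check. This is a minimal sketch under two assumptions that the ADR does not spell out: "affected" is measured as failed requests over total requests, and "performance regression" is measured as the relative increase in p99 latency over a baseline. The class and parameter names are hypothetical.

```java
final class RollbackCheckSketch {
    // Thresholds from the rollback condition: >5% of requests, >20% regression.
    static final double MAX_AFFECTED_FRACTION = 0.05;
    static final double MAX_REGRESSION = 0.20;

    // True when either threshold is exceeded.
    static boolean shouldRollback(long failedRequests, long totalRequests,
                                  double baselineP99Ms, double currentP99Ms) {
        if (totalRequests <= 0 || baselineP99Ms <= 0) {
            throw new IllegalArgumentException("totalRequests and baselineP99Ms must be positive");
        }
        double affected = (double) failedRequests / totalRequests;
        // Assumption: regression is the relative p99 increase over the baseline.
        double regression = (currentP99Ms - baselineP99Ms) / baselineP99Ms;
        return affected > MAX_AFFECTED_FRACTION || regression > MAX_REGRESSION;
    }
}
```

For example, with a 150 ms baseline, a current p99 of 200 ms is a regression of about 33% and would trigger rollback even if no requests failed.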
Steps to rollback:
Effort estimate: 2-4 hours
Monitoring & Validation
Metrics to Track
| Metric | Baseline | Target | Cadence | Current |
|--------|----------|--------|---------|---------|
| Code duplication (lines) | 2000+ | 0 | Per commit | 0 lines |
| Security patch time | 3 days | <1 day | Per CVE | <1 day |
| Gateway latency (p99) | 150ms | <150ms | Continuous | 120-140ms |
| Clinical throughput (req/sec) | 800 | 2000+ | Continuous | 1800-2100 |
| Admin gateway latency | 150ms | <100ms | Continuous | 85-95ms |
| FHIR compliance checks | N/A | 100% | Per request | 100% |
| Core module test coverage | 0% | 90%+ | Per build | 92% |
| Per-domain latency tracking | None | All 4 tracked | Continuous | ✅ Enabled |
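
For the latency rows above, a p99 value can be derived from raw samples with the nearest-rank method. The sketch below is one possible way to compute it per gateway, not a description of how HDIM's monitoring actually does so; the class and method names are hypothetical.

```java
import java.util.Arrays;

final class LatencyPercentileSketch {
    // Nearest-rank percentile: the smallest sample such that at least
    // `percent` percent of all samples are less than or equal to it.
    static long percentile(long[] samplesMs, int percent) {
        if (samplesMs.length == 0 || percent < 1 || percent > 100) {
            throw new IllegalArgumentException("need at least one sample and a percent in 1..100");
        }
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        // Integer form of ceil(percent / 100 * n), which avoids floating-point rounding.
        int rank = (percent * sorted.length + 99) / 100;
        return sorted[rank - 1];
    }
}
```

Calling `percentile(samples, 99)` once per gateway's sample window would give the per-domain p99 figures that the table tracks.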
Review Schedule
Related Decisions
Prior Decisions
Future Decisions Enabled
Examples & Precedents
Industry Examples
Similar HDIM Decisions
Questions & Open Items
Resolved Questions
Q: How do clients know which gateway to use?
A: Routes are mapped in the load balancer or service mesh (Consul, Kubernetes) and are also documented in the API catalog.
Q: What if gateway-core changes break one specialized gateway?
A: Each gateway has its own test suite, so CI/CD catches issues before deployment.
Q: Won't this create more operational complexity?
A: Yes, but the added complexity is offset by easier scaling and reduced maintenance.
Q: Can we roll out core module changes to gateways independently?
A: Yes, if the change is backwards compatible. If it is breaking, all gateways must upgrade together.
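
One way to make the compatible-versus-breaking rule mechanical is to version gateway-core and compare major versions. The sketch below assumes gateway-core follows semantic versioning (`MAJOR.MINOR.PATCH`), which this ADR does not state; the class and method names are hypothetical.

```java
final class CoreCompatibilitySketch {
    // Extracts the MAJOR component from a "MAJOR.MINOR.PATCH" string.
    static int major(String version) {
        return Integer.parseInt(version.split("\\.")[0]);
    }

    // Assumption: the same major version means a backwards-compatible change,
    // so one gateway may adopt it without waiting for the others.
    static boolean canUpgradeIndependently(String currentCore, String candidateCore) {
        return major(currentCore) == major(candidateCore);
    }
}
```

Under this assumption, moving one gateway from core `1.4.0` to `1.5.0` is allowed on its own, while a move to `2.0.0` signals a coordinated upgrade across all gateways.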
Open Questions
Approvals
Decision Makers
| Role | Name | Date | Status |
|------|------|------|--------|
| Architecture Lead | HDIM Platform Team | 2026-01-19 | ✅ Accepted |
| Gateway Lead | Gateway Service Team | 2026-01-19 | ✅ Accepted |
| Tech Lead (Backend) | Platform Engineering | 2026-01-19 | ✅ Accepted |
Stakeholder Feedback
Changelog
| Date | Author | Change |
|------|--------|--------|
| 2026-01-19 | Platform Team | Created ADR-002 formalizing gateway modularization |
| 2026-01-12 | Architecture Lead | Reviewed code duplication analysis |
| 2026-01-10 | Platform Team | Initial draft based on Phase 5 implementation |
References
Documentation Links
Related ADRs
External References
Footer
ADR #: 002
Version: 1.0
Last Updated: 2026-01-19
Supersedes: None (initial decision)
Superseded By: None (current)
_Created: January 19, 2026_
_Based on: Phase 5 Implementation (Oct 2025 - Jan 2026)_
_Status: Active and Deployed in Production_