System Diagram

Architecture diagram

Design principles

These principles are load-bearing. They explain implementation choices that might otherwise seem arbitrary.

Connectors are the only integration point. Every source format has exactly one connector. Each connector's sole job is to transform its format into a common representation. Everything downstream is format-agnostic: adding a new source type (HL7 v2, DICOM, patient-reported data) requires only a new connector.

Provenance is non-negotiable. Every merged entity carries a complete audit trail: which sources contributed, what conflicts existed, how each conflict was resolved, and which layer made the final decision. This is a first-class design constraint, not logging.

Entity resolution escalates, never guesses. The deduplication pipeline runs through four layers of increasing cost. A candidate pair escalates to the next layer only when the current layer cannot decide confidently. The most expensive layer, LLM adjudication, is reserved for genuinely ambiguous cases.
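The escalation rule can be sketched as a chain of deciders, each of which either answers or declines. This is a minimal illustration, not the actual pipeline: the layer names, the three example layers (the real pipeline has four), and the "keep separate when nobody is confident" fallback are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Verdict:
    same_entity: bool
    layer: str  # name of the layer that made the final call


# A layer returns True/False when it is confident, or None to escalate.
Layer = Callable[[dict, dict], Optional[bool]]


def resolve_pair(a: dict, b: dict, layers: Sequence[Tuple[str, Layer]]) -> Verdict:
    """Try layers from cheapest to most expensive; stop at the first confident answer."""
    for name, decide in layers:
        answer = decide(a, b)
        if answer is not None:
            return Verdict(same_entity=answer, layer=name)
    # Assumed fallback: if no layer is confident, keep the entities separate.
    return Verdict(same_entity=False, layer="unresolved")


# Hypothetical layers, for illustration only.
def exact_code(a: dict, b: dict) -> Optional[bool]:
    if a.get("code") and a.get("code") == b.get("code"):
        return True
    return None  # not confident, escalate


def type_check(a: dict, b: dict) -> Optional[bool]:
    return False if a["type"] != b["type"] else None


def llm_adjudication(a: dict, b: dict) -> Optional[bool]:
    # Stand-in for a model call; compares normalized names so the sketch runs offline.
    return a["name"].strip().lower() == b["name"].strip().lower()


LAYERS = [
    ("exact_code", exact_code),
    ("type_check", type_check),
    ("llm_adjudication", llm_adjudication),
]
```

Because each verdict records the layer that produced it, the same structure can feed the provenance trail.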

The VFS gives agents a filesystem metaphor. Rather than exposing a graph query interface, the system wraps the resolved graph in a virtual file tree. Agents navigate what's there through progressive directory browsing, without needing to know what queries to issue upfront.

Ingestion is additive, storage is persistent. New sources accumulate on top of existing data. Sending a new encounter or a follow-up lab result does not require re-submitting the full patient history. Each ingest call resolves the new data against what is already in the graph and merges the result in place, keeping the patient record current without discarding anything already consolidated.
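A toy illustration of additive ingestion, assuming an in-memory graph keyed by `(type, code)`. The matching key, the `ingest` function, and the source labels are hypothetical; the real system resolves entities through the consolidation pipeline described below.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (entity type, code): an assumed matching key


def ingest(graph: Dict[Key, dict], new_entities: List[dict], source: str) -> Dict[Key, dict]:
    """Merge new entities into the existing graph in place; existing nodes are never dropped."""
    for entity in new_entities:
        key = (entity["type"], entity["code"])
        node = graph.setdefault(
            key, {"type": entity["type"], "code": entity["code"], "sources": []}
        )
        if source not in node["sources"]:
            node["sources"].append(source)
    return graph


graph: Dict[Key, dict] = {}
# First call: an initial FHIR bundle with one condition.
ingest(graph, [{"type": "condition", "code": "E11.9"}], source="fhir-bundle-1")
# Second call: a follow-up document repeats the condition and adds a lab.
ingest(
    graph,
    [{"type": "condition", "code": "E11.9"}, {"type": "lab", "code": "4548-4"}],
    source="cda-doc-1",
)
```

After the second call the graph holds two nodes, and the condition node lists both sources rather than being duplicated or overwritten.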

Layer 1 - Connect

The Connect layer is the only place where source format matters. Each connector reads its native format and produces a standardized clinical representation: a set of entities, events, relationships, documents, and narrative sections. Everything downstream operates exclusively on this contract.

Connector	Input
FHIR	FHIR R4 Bundle or single resource (JSON)
CDA	HL7 CDA / C-CDA XML
Document	PDFs, images, any unstructured file

Connectors also emit extraction warnings for missing codes, unparseable fields, or ambiguous data. These are surfaced via GET /patients/{patientId}/ingest/status and do not halt ingestion.
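The connector contract might look roughly like the following sketch. The class names, field names, and the `parse` method are assumptions made for illustration; only the five output categories and the non-fatal warnings come from the description above. The toy connector handles nothing but `Condition` resources in a FHIR R4 Bundle.

```python
import json
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class ClinicalRepresentation:
    """Assumed shape of the common representation every connector produces."""
    entities: List[dict] = field(default_factory=list)
    events: List[dict] = field(default_factory=list)
    relationships: List[dict] = field(default_factory=list)
    documents: List[dict] = field(default_factory=list)
    narrative_sections: List[dict] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)  # extraction warnings, non-fatal


class Connector(Protocol):
    def parse(self, raw: bytes) -> ClinicalRepresentation: ...


class MinimalFhirConnector:
    """Toy connector: extracts Condition resources from a FHIR R4 Bundle."""

    def parse(self, raw: bytes) -> ClinicalRepresentation:
        bundle = json.loads(raw)
        out = ClinicalRepresentation()
        for entry in bundle.get("entry", []):
            resource = entry.get("resource", {})
            if resource.get("resourceType") != "Condition":
                continue
            concept = resource.get("code", {})
            codings = concept.get("coding", [])
            if not codings:
                # Record the problem and keep going; warnings do not halt ingestion.
                out.warnings.append(f"Condition {resource.get('id', '?')}: missing code")
            out.entities.append({
                "type": "condition",
                "code": codings[0].get("code") if codings else None,
                "name": concept.get("text"),
            })
        return out
```

A CDA or document connector would implement the same `parse` signature, so downstream stages never need to inspect the source format.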

Layer 2 - Consolidate

Consolidation takes all connector outputs and produces a single deduplicated, relationship-enriched graph. It runs in three sequential stages.

Stage 1 - Entity resolution

Entity resolution deduplicates clinical entities across all sources. The same condition appearing in a FHIR bundle and two CDA documents produces one node in the graph, not three. See Entity Resolution.

Every merge decision produces a provenance trace recording contributing sources, any field-level conflicts, the conflict resolution strategy applied, and which layer made the final call. See Provenance and Auditability.
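As a rough picture of what such a trace could contain, here is a sketch using dataclasses. The class names, field names, and the `most_recent` strategy label are hypothetical; only the four recorded facts (sources, conflicts, strategy, deciding layer) come from the text above.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, List


@dataclass
class FieldConflict:
    field_name: str
    values: List[Any]    # competing values, one per contributing source
    strategy: str        # conflict resolution strategy applied (hypothetical names)
    resolved_value: Any


@dataclass
class ProvenanceTrace:
    entity_id: str
    sources: List[str]      # every source that contributed to the merged entity
    decided_by_layer: str   # which resolution layer made the final call
    conflicts: List[FieldConflict] = field(default_factory=list)


trace = ProvenanceTrace(
    entity_id="condition/type-2-diabetes",
    sources=["fhir-bundle-1", "cda-doc-1", "cda-doc-2"],
    decided_by_layer="exact_code",
    conflicts=[
        FieldConflict(
            field_name="onset_date",
            values=["2019-03-01", "2019-03-15"],
            strategy="most_recent",
            resolved_value="2019-03-15",
        )
    ],
)

# asdict() turns the nested dataclasses into plain dicts, ready for JSON output.
record = asdict(trace)
```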

Stage 2 - Relationship inference

After entities are deduplicated, the relationship resolver infers typed edges using the clinical knowledge base. Explicit references from source data (FHIR reasonReference, CDA entryRelationship) are preserved as-is. Inferred edges are only added when no explicit edge of the same type already exists between a pair. See Relationship Inference for the full node and edge type reference.
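The precedence rule for explicit over inferred edges can be sketched as below. The `Edge` type and the `treats` edge type are hypothetical, and treating a pair as unordered is an assumption; the source states only that an inferred edge is skipped when an explicit edge of the same type already connects the pair.

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Set, Tuple


class Edge(NamedTuple):
    source: str
    target: str
    type: str
    origin: str  # "explicit" (from source data) or "inferred" (from the knowledge base)


def merge_edges(explicit: Iterable[Edge], inferred: Iterable[Edge]) -> List[Edge]:
    """Keep all explicit edges; add an inferred edge only if its (pair, type) is new."""
    result: List[Edge] = list(explicit)
    # Assumption: a pair is unordered, so A->B and B->A count as the same pair.
    seen: Set[Tuple[FrozenSet[str], str]] = {
        (frozenset((e.source, e.target)), e.type) for e in result
    }
    for edge in inferred:
        key = (frozenset((edge.source, edge.target)), edge.type)
        if key not in seen:
            seen.add(key)  # also avoids adding the same inferred edge twice
            result.append(edge)
    return result


explicit = [Edge("metformin", "type-2-diabetes", "treats", "explicit")]
inferred = [
    Edge("metformin", "type-2-diabetes", "treats", "inferred"),     # skipped: explicit exists
    Edge("polyuria", "type-2-diabetes", "symptom_of", "inferred"),  # kept: new pair and type
]
edges = merge_edges(explicit, inferred)
```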

Stage 3 - Post-processing

A heuristic pass fixes common EHR data quality issues that entity resolution alone cannot address:

Conditions with Z80–Z84 ICD-10 codes are reclassified from condition to family_history
Conditions with "hx of", "history of", or Z87.x codes have their status set to resolved
Symptoms are linked to their likely parent conditions via symptom_of edges
Care coordination entries and referral records are reclassified to care_plan

Every post-processing action is logged as a PostProcessAction and visible in the reconciliation report (GET /patients/{patientId}/reconciliation). See Review Reconciliation.
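Two of the heuristics above can be sketched as follows. `PostProcessAction` is named in the source, but its fields here are assumed, as are the entity dict layout and the regular expressions. The family-history check runs first so that a Z80–Z84 entity named "Family history of ..." is reclassified rather than marked resolved.

```python
import re
from dataclasses import dataclass
from typing import List

FAMILY_HISTORY_CODE = re.compile(r"^Z8[0-4](\.|$)")    # ICD-10 Z80 through Z84
PERSONAL_HISTORY_CODE = re.compile(r"^Z87(\.|$)")      # ICD-10 Z87.x
HISTORY_PHRASE = re.compile(r"\b(hx of|history of)\b", re.IGNORECASE)


@dataclass
class PostProcessAction:
    entity_id: str
    action: str   # assumed field: what was changed
    reason: str   # assumed field: which heuristic fired


def post_process(entities: List[dict]) -> List[PostProcessAction]:
    """Mutate entities in place and return a log of every change made."""
    actions: List[PostProcessAction] = []
    for entity in entities:
        if entity["type"] != "condition":
            continue
        code = entity.get("code") or ""
        name = entity.get("name") or ""
        if FAMILY_HISTORY_CODE.match(code):
            entity["type"] = "family_history"
            actions.append(PostProcessAction(entity["id"], "reclassify:family_history", f"code {code}"))
        elif PERSONAL_HISTORY_CODE.match(code) or HISTORY_PHRASE.search(name):
            entity["status"] = "resolved"
            actions.append(PostProcessAction(entity["id"], "status:resolved", f"code {code!r}, name {name!r}"))
    return actions


entities = [
    {"id": "c1", "type": "condition", "code": "Z80.3", "name": "Family history of breast cancer"},
    {"id": "c2", "type": "condition", "code": "Z87.891", "name": "Personal history of nicotine dependence"},
    {"id": "c3", "type": "condition", "code": "I10", "name": "Hx of hypertension", "status": "active"},
    {"id": "c4", "type": "condition", "code": "E11.9", "name": "Type 2 diabetes", "status": "active"},
]
log = post_process(entities)
```

Returning the action log alongside the mutation keeps every heuristic change auditable, in the same spirit as the reconciliation report.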

Layer 3 - Serve

The Serve layer exposes the consolidated graph through a Virtual File System (VFS): a navigable directory tree that agents browse the same way they browse folders. No files exist on disk. Each path has a resolver function that materializes content from the live graph on demand.

/patient/{id}/
├── conditions/
│   └── active/
│       └── {slug}/
│           ├── _story.md    ← longitudinal condition narrative
│           └── _raw.json    ← structured entity + relationships
├── medications/
│   ├── current/
│   └── discontinued/
├── labs/
│   ├── latest
│   └── trends/
│       └── {loinc-slug}
├── encounters/
│   └── {year}/
├── timeline/
│   └── {year}/
├── allergies/
├── directives/
├── insurance/
├── family_history/
├── sources/
└── memory/

The _story.md file at any condition path is the centerpiece: it assembles medications, monitoring labs, complications, comorbidities, and a progression timeline into a single Markdown document, built lazily from the current graph state on each read.
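The resolver idea can be sketched as a small route table that maps path patterns to functions over the live graph. Everything here is an assumption made for illustration: the graph layout, the resolver signatures, and the story format, which is far simpler than the real `_story.md`.

```python
import re
from typing import Callable, Dict, List, Union

Graph = Dict[str, dict]  # assumed layout: patient id -> {"conditions": {slug: {...}}}
Resolver = Callable[[Graph, Dict[str, str]], Union[List[str], str]]


def list_active_conditions(graph: Graph, params: Dict[str, str]) -> List[str]:
    conditions = graph[params["id"]]["conditions"]
    return sorted(f"{slug}/" for slug, c in conditions.items() if c["status"] == "active")


def render_story(graph: Graph, params: Dict[str, str]) -> str:
    condition = graph[params["id"]]["conditions"][params["slug"]]
    lines = [f"# {condition['name']}", "", "## Medications"]
    lines += [f"- {med}" for med in condition.get("medications", [])]
    return "\n".join(lines)


# Each path pattern is bound to a resolver; nothing is stored on disk.
ROUTES = [
    (re.compile(r"^/patient/(?P<id>[^/]+)/conditions/active/$"), list_active_conditions),
    (re.compile(r"^/patient/(?P<id>[^/]+)/conditions/active/(?P<slug>[^/]+)/_story\.md$"), render_story),
]


def read(graph: Graph, path: str) -> Union[List[str], str]:
    """Materialize the content at a VFS path from the current graph state."""
    for pattern, resolver in ROUTES:
        match = pattern.match(path)
        if match:
            return resolver(graph, match.groupdict())
    raise FileNotFoundError(path)


graph: Graph = {
    "p1": {"conditions": {
        "type-2-diabetes": {"name": "Type 2 diabetes", "status": "active", "medications": ["metformin"]},
        "appendicitis": {"name": "Appendicitis", "status": "resolved"},
    }}
}
listing = read(graph, "/patient/p1/conditions/active/")
story = read(graph, "/patient/p1/conditions/active/type-2-diabetes/_story.md")
```

Because content is built on each read, a change to the graph (for example a newly ingested medication) shows up in the next read of the same path without any cache invalidation step.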

API surface

Two surfaces expose the VFS and ingest capabilities:

MCP tools (agent-facing)

See MCP Tools.

Tool	Description
`browse_patient(path)`	Directory listing or file preview at a VFS path
`read_patient(path, format?, token_budget?)`	File content in `narrative`, `structured`, or `compact` format
`search_patient(patientId, query)`	BM25 full-text search across all indexed content
`get_patient_info(patientId)`	Patient demographics and pipeline statistics

REST endpoints

See REST API.

Method	Path	Description
`GET`	`/patients`	List all registered patients
`GET`	`/patients/:id`	Patient metadata and pipeline stats
`GET`	`/patients/:id/vfs`	Browse the VFS
`GET`	`/patients/:id/read`	Read a VFS path
`GET`	`/patients/:id/resolution`	Full entity resolution report with provenance
`GET`	`/patients/:id/reconciliation`	Reconciliation summary: inferred relationships + post-process actions
`POST`	`/patients/:id/ingest/fhir`	Ingest a FHIR R4 Bundle
`POST`	`/patients/:id/ingest/cda`	Ingest a CDA / C-CDA XML document
`POST`	`/patients/:id/ingest/document`	Register an unstructured document by metadata
`POST`	`/patients/:id/ingest`	Batch ingest (all three types in one call)
`GET`	`/patients/:id/ingest/status`	Current ingest state and extraction warnings
`POST`	`/patients/:id/ingest/reset`	Clear all patient data
`DELETE`	`/patients/:id`	Remove patient from registry
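A minimal client-side sketch for two of these endpoints, using only the Python standard library. The base URL, the `application/json` content type, and the helper names are assumptions; the request and response schemas are not specified on this page, so the functions only build the requests.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:8080"  # assumption: replace with the actual deployment URL


def fhir_ingest_request(patient_id: str, bundle: dict) -> Request:
    """Build a POST /patients/:id/ingest/fhir request carrying a FHIR R4 Bundle."""
    url = f"{BASE_URL}/patients/{quote(patient_id, safe='')}/ingest/fhir"
    return Request(
        url,
        data=json.dumps(bundle).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


def ingest_status_request(patient_id: str) -> Request:
    """Build a GET /patients/:id/ingest/status request."""
    url = f"{BASE_URL}/patients/{quote(patient_id, safe='')}/ingest/status"
    return Request(url, method="GET")


post_req = fhir_ingest_request("patient-123", {"resourceType": "Bundle", "type": "collection", "entry": []})
status_req = ingest_status_request("patient-123")
```

Either request can be sent with `urllib.request.urlopen(post_req)`. Since extraction warnings do not halt ingestion, polling the status endpoint after each ingest call is one way to see them.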
