Provenance and Auditability
How Patient Memory records where every clinical fact came from, what conflicts existed between sources, and how each conflict was resolved.
What it is
Clinical AI systems that aggregate patient data across sources face a fundamental question: why does the chart show this value? A merged entity might draw on three documents that each record a different dose for the same medication. The system has to pick one, and that decision must be traceable.
Provenance is a first-class design constraint in Patient Memory. Every resolved entity carries a complete audit trail from the moment it enters the graph.
The ProvenanceTrace
Every ClinicalEntity in the resolved graph carries a ProvenanceTrace:
```json
{
  "sources": [
    { "type": "fhir", "origin": "FHIR Bundle", "reliability": 0.85, "ref": "Condition/abc" },
    {
      "type": "cda",
      "origin": "Summary_20230907.xml",
      "reliability": 0.8,
      "ref": "observation/456"
    }
  ],
  "conflicts": [
    {
      "field": "status",
      "values": ["active", "resolved"],
      "resolution": "Selected 'active' from FHIR Bundle: more recent source (2023-09-07 vs 2021-04-12)."
    }
  ],
  "resolvedBy": "deterministic-code",
  "reasoning": "Both entities share SNOMED code 44054006.",
  "confidence": 0.9
}
```
sources
Each SourceRef in sources represents one contributing record. A single-element array means the entity appeared in only one source and was never merged with another. Multi-element arrays represent deduplicated entities.
| Field | What it records |
|---|---|
| type | Which connector produced this source (fhir, cda, document) |
| origin | Human-readable label for the source document |
| reliability | Weight (0–1) given to this source when resolving conflicts |
| ref | The original resource reference within the source, enabling exact lookup in the source file |
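For readers integrating with the trace, the JSON example maps onto a few TypeScript types. This is a sketch under assumptions: the field types, the SourceType union, and the helpers isDeduplicated and sourceKeys are inferred or hypothetical, not a published SDK. It shows how the length of sources separates single-source entities from deduplicated ones.

```typescript
// Field types are inferred from the JSON example (an assumption, not a published schema).
type SourceType = "fhir" | "cda" | "document";

type ResolvedBy =
  | "deterministic-code"
  | "nlp-normalization"
  | "embedding-similarity"
  | "llm-adjudication"
  | "no-merge";

interface SourceRef {
  type: SourceType;    // connector that produced the record
  origin: string;      // human-readable label for the source document
  reliability: number; // 0–1 weight used when resolving conflicts
  ref: string;         // resource reference inside the original source
}

interface ConflictRecord {
  field: string;
  values: unknown[];   // raw values reported by the sources
  resolution: string;  // which value was chosen and why
}

interface ProvenanceTrace {
  sources: SourceRef[];
  conflicts: ConflictRecord[];
  resolvedBy: ResolvedBy;
  reasoning: string;
  confidence: number;
}

// More than one SourceRef means the entity was deduplicated across records.
function isDeduplicated(trace: ProvenanceTrace): boolean {
  return trace.sources.length > 1;
}

// Hypothetical helper: one key per contributing record, for lookup in its source file.
function sourceKeys(trace: ProvenanceTrace): string[] {
  return trace.sources.map((s) => `${s.type}:${s.origin}:${s.ref}`);
}

const example: ProvenanceTrace = {
  sources: [
    { type: "fhir", origin: "FHIR Bundle", reliability: 0.85, ref: "Condition/abc" },
    { type: "cda", origin: "Summary_20230907.xml", reliability: 0.8, ref: "observation/456" },
  ],
  conflicts: [
    {
      field: "status",
      values: ["active", "resolved"],
      resolution: "Selected 'active' from FHIR Bundle: more recent source (2023-09-07 vs 2021-04-12).",
    },
  ],
  resolvedBy: "deterministic-code",
  reasoning: "Both entities share SNOMED code 44054006.",
  confidence: 0.9,
};

console.log(isDeduplicated(example), sourceKeys(example));
```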
conflicts
When contributing sources disagree on a field value, a ConflictRecord is written. Every conflict record includes the raw values from each source and an explanation of which value was chosen and why.
An empty conflicts array means all sources agreed on all field values.
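Where a ConflictRecord comes from can be sketched as a field-by-field comparison of the contributing records. This minimal TypeScript sketch is illustrative: the Contribution input shape and findConflicts are hypothetical, and it assumes a source that omits a field does not count as disagreeing. An empty result corresponds to an empty conflicts array.

```typescript
// Hypothetical input: each contributing record's origin plus the field values it reported.
type FieldValue = string | number | boolean | null;

interface Contribution {
  origin: string;
  fields: Record<string, FieldValue>;
}

interface FieldConflict {
  field: string;
  values: FieldValue[]; // distinct raw values, in source order
}

// Returns one entry per field on which the contributing sources disagree.
function findConflicts(contributions: Contribution[]): FieldConflict[] {
  // Collect every field name reported by any source.
  const fieldNames: string[] = [];
  for (const c of contributions) {
    for (const name of Object.keys(c.fields)) {
      if (fieldNames.indexOf(name) === -1) fieldNames.push(name);
    }
  }

  const conflicts: FieldConflict[] = [];
  for (const field of fieldNames) {
    // Assumption: a source that omits the field takes no part in the comparison.
    const distinct: FieldValue[] = [];
    for (const c of contributions) {
      if (!Object.prototype.hasOwnProperty.call(c.fields, field)) continue;
      const v = c.fields[field];
      if (distinct.indexOf(v) === -1) distinct.push(v);
    }
    if (distinct.length > 1) conflicts.push({ field, values: distinct });
  }
  return conflicts;
}
```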
resolvedBy
Records which layer of the 4-layer cascade made the final merge or no-merge decision.
| Value | Layer | What happened |
|---|---|---|
| deterministic-code | 1 | Entities share a standard code. Cheapest and most certain |
| nlp-normalization | 2 | Same concept, different text or minor variation |
| embedding-similarity | 3 | Semantically similar, no shared codes |
| llm-adjudication | 4 | Ambiguous case decided by an LLM with clinical context |
| no-merge | None | Entity was not merged with any other and is treated as unique |
When resolvedBy is llm-adjudication, the reasoning field contains the LLM's natural-language rationale for its decision.
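The cascade can be pictured as a sequence of checks that stops at the first layer able to decide. The sketch below is illustrative only: the lowercase-and-strip normalization, the cosine thresholds (0.92 and 0.8), and the Adjudicator callback standing in for the LLM are all hypothetical, not Patient Memory's actual matchers.

```typescript
type ResolvedBy =
  | "deterministic-code"
  | "nlp-normalization"
  | "embedding-similarity"
  | "llm-adjudication"
  | "no-merge";

interface EntityCandidate {
  display: string;     // source text, e.g. "Type 2 diabetes"
  codes: string[];     // standard codes; a real system would compare system + code
  embedding: number[]; // vector from some text-embedding model (hypothetical)
}

interface MergeDecision {
  merge: boolean;
  resolvedBy: ResolvedBy;
  reasoning: string;
}

// Hypothetical hook standing in for the LLM adjudicator.
type Adjudicator = (a: EntityCandidate, b: EntityCandidate) => { merge: boolean; rationale: string };

const EMBEDDING_MATCH = 0.92; // hypothetical threshold for a confident Layer 3 merge
const AMBIGUOUS_FLOOR = 0.8;  // hypothetical: scores in [0.8, 0.92) go to Layer 4

const normalize = (s: string): string => s.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();

// Cosine similarity; assumes both vectors have the same length.
function cosine(x: number[], y: number[]): number {
  let dot = 0, nx = 0, ny = 0;
  for (let i = 0; i < x.length; i++) {
    dot += x[i] * y[i];
    nx += x[i] * x[i];
    ny += y[i] * y[i];
  }
  return nx === 0 || ny === 0 ? 0 : dot / Math.sqrt(nx * ny);
}

// Tries the cheapest, most certain layer first and stops at the first that decides.
function decide(a: EntityCandidate, b: EntityCandidate, adjudicate: Adjudicator): MergeDecision {
  for (const code of a.codes) { // Layer 1: shared standard code
    if (b.codes.indexOf(code) !== -1) {
      return { merge: true, resolvedBy: "deterministic-code", reasoning: `Both entities share code ${code}.` };
    }
  }
  if (normalize(a.display) === normalize(b.display)) { // Layer 2: same text after normalization
    return { merge: true, resolvedBy: "nlp-normalization", reasoning: `Texts normalize to "${normalize(a.display)}".` };
  }
  const sim = cosine(a.embedding, b.embedding); // Layer 3: semantic similarity
  if (sim >= EMBEDDING_MATCH) {
    return { merge: true, resolvedBy: "embedding-similarity", reasoning: `Cosine similarity ${sim.toFixed(2)}.` };
  }
  if (sim >= AMBIGUOUS_FLOOR) { // Layer 4: ambiguous pair goes to the LLM
    const verdict = adjudicate(a, b);
    if (verdict.merge) return { merge: true, resolvedBy: "llm-adjudication", reasoning: verdict.rationale };
    // Assumption: an LLM "do not merge" verdict is recorded as no-merge.
    return { merge: false, resolvedBy: "no-merge", reasoning: verdict.rationale };
  }
  return { merge: false, resolvedBy: "no-merge", reasoning: "No layer found a match." };
}
```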
Conflict resolution strategies
When sources disagree, the pipeline applies one of these strategies:
| Strategy | When used |
|---|---|
| corroboration | The majority of sources agree on one value. Pick the majority value |
| recency | No majority. Prefer the value from the most recently dated source |
| reliability | Same date or no date. Weight by source reliability score |
| llm-judgment | Complex conflict where all simpler strategies are inconclusive |
The strategy used and the outcome are recorded in every ConflictRecord. This means the audit trail answers not just what was chosen but by what rule and from which source.
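The strategy order in the table can be expressed as a single resolver. This TypeScript sketch rests on assumptions the page does not spell out: "majority" means a strict majority, recency applies only when every source is dated (ISO YYYY-MM-DD) and the sources on the latest date agree, and reliability is summed per value. The names resolveConflict, ConflictCandidate, and ConflictOutcome are hypothetical; the recency message mirrors the resolution string in the example trace.

```typescript
type Strategy = "corroboration" | "recency" | "reliability" | "llm-judgment";

interface ConflictCandidate {
  value: string;
  origin: string;
  reliability: number; // 0–1 source reliability
  date?: string;       // source date as YYYY-MM-DD, if known (assumed format)
}

interface ConflictOutcome {
  strategy: Strategy;
  value: string | null; // null means the conflict was deferred to LLM judgment
  resolution: string;
}

function resolveConflict(cands: ConflictCandidate[]): ConflictOutcome {
  if (cands.length === 0) throw new Error("No candidate values to resolve");

  // 1. Corroboration: a strict majority of sources report the same value.
  const counts: Record<string, number> = {};
  for (const c of cands) counts[c.value] = (counts[c.value] || 0) + 1;
  for (const value of Object.keys(counts)) {
    if (counts[value] * 2 > cands.length) {
      return { strategy: "corroboration", value, resolution: `Selected '${value}': reported by ${counts[value]} of ${cands.length} sources.` };
    }
  }

  // 2. Recency: every source is dated, some source is older, and the newest sources agree.
  // ISO dates compare correctly as strings.
  const dates = cands.map((c) => c.date).filter((d): d is string => d !== undefined);
  if (dates.length === cands.length) {
    const latest = dates.reduce((a, b) => (b > a ? b : a));
    const newest = cands.filter((c) => c.date === latest);
    const older = dates.filter((d) => d !== latest);
    const agree = newest.every((c) => c.value === newest[0].value);
    if (older.length > 0 && agree) {
      const runnerUp = older.reduce((a, b) => (b > a ? b : a));
      return {
        strategy: "recency",
        value: newest[0].value,
        resolution: `Selected '${newest[0].value}' from ${newest[0].origin}: more recent source (${latest} vs ${runnerUp}).`,
      };
    }
  }

  // 3. Reliability: sum reliability per value; a unique highest total wins.
  const weight: Record<string, number> = {};
  for (const c of cands) weight[c.value] = (weight[c.value] || 0) + c.reliability;
  const ranked = Object.keys(weight).sort((a, b) => weight[b] - weight[a]);
  if (ranked.length === 1 || weight[ranked[0]] > weight[ranked[1]]) {
    return { strategy: "reliability", value: ranked[0], resolution: `Selected '${ranked[0]}': highest total source reliability (${weight[ranked[0]]}).` };
  }

  // 4. Nothing simpler was conclusive: hand off to LLM judgment (not modelled here).
  return { strategy: "llm-judgment", value: null, resolution: "Simpler strategies inconclusive; deferred to LLM judgment." };
}
```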
Confidence scores
Every entity has a confidence score (0–1) derived from two inputs:
- The reliability scores of its contributing sources
- The confidence of the resolution layer that merged it
Higher layer numbers produce lower confidence: a pair resolved by deterministic code matching (Layer 1) scores higher than one resolved by LLM adjudication (Layer 4). A single-source entity inherits only the reliability of its one source.
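The page does not give the formula that combines these two inputs. As one hypothetical way to satisfy the stated properties (a single-source entity inherits its source's reliability; a Layer 1 merge scores above a Layer 4 merge), the sketch below combines source reliabilities with a noisy-OR and scales the result by a per-layer factor. The LAYER_FACTOR values and entityConfidence are invented for illustration, and this formula yields 0.97 rather than the 0.9 in the example trace, so treat it as a shape rather than the real computation.

```typescript
// Resolution layers in cascade order (Layer 1 to Layer 4).
type Layer =
  | "deterministic-code"
  | "nlp-normalization"
  | "embedding-similarity"
  | "llm-adjudication";

// Hypothetical per-layer factors. Only the ordering (Layer 1 highest,
// Layer 4 lowest) is taken from the documentation.
const LAYER_FACTOR: Record<Layer, number> = {
  "deterministic-code": 1.0,
  "nlp-normalization": 0.95,
  "embedding-similarity": 0.85,
  "llm-adjudication": 0.75,
};

function entityConfidence(reliabilities: number[], layer?: Layer): number {
  if (reliabilities.length === 0) throw new Error("An entity needs at least one source");
  // A single-source entity inherits only the reliability of its one source.
  if (reliabilities.length === 1) return reliabilities[0];
  if (layer === undefined) throw new Error("A merged entity must name the layer that merged it");
  // Noisy-OR: probability that at least one source is right, assuming independent sources.
  let allWrong = 1;
  for (const r of reliabilities) allWrong *= 1 - r;
  // Scale by how certain the merging layer is.
  return (1 - allWrong) * LAYER_FACTOR[layer];
}
```

Noisy-OR rewards corroboration: two agreeing sources score above either one alone. Whether Patient Memory weighs corroboration this way is not stated on this page.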
Confidence scores also extend to inferred relationships. For example, a prescribed_for edge inferred from the clinical knowledge base is emitted with one of three fixed confidence values: 0.90 (strong), 0.75 (moderate), or 0.60 (weak), depending on the knowledge base entry.
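The mapping from knowledge base strength to edge confidence is a fixed lookup. In this sketch the three values come from the text above, while the KbEntry and Edge shapes and the inferPrescribedFor helper are hypothetical, since the knowledge base schema is not described on this page.

```typescript
type Strength = "strong" | "moderate" | "weak";

// Fixed confidences for knowledge-base edges, as stated in the text.
const EDGE_CONFIDENCE: Record<Strength, number> = { strong: 0.9, moderate: 0.75, weak: 0.6 };

// Hypothetical knowledge base entry; the real schema is not described on this page.
interface KbEntry {
  medication: string;
  condition: string;
  strength: Strength;
}

interface Edge {
  type: "prescribed_for";
  from: string;       // medication entity id
  to: string;         // condition entity id
  confidence: number;
}

// Emits a prescribed_for edge whose confidence depends only on the entry's strength.
function inferPrescribedFor(medicationId: string, conditionId: string, entry: KbEntry): Edge {
  return { type: "prescribed_for", from: medicationId, to: conditionId, confidence: EDGE_CONFIDENCE[entry.strength] };
}
```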
Accessing provenance
Full resolution report
GET /patients/{patientId}/resolution returns every entity in the graph with its complete ProvenanceTrace. Use this to:
- Verify that two records were (or were not) merged and understand why
- See all field-level conflicts and how each was resolved
- Identify which source "won" for a given field value
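A client for the resolution report might fetch it for one patient and flatten the field-level conflicts into rows. Only the endpoint path comes from this page; the response envelope (an entities array whose items carry id and provenance) and the helper names are assumptions. The fetch function is injected so the sketch stays testable, and the global fetch in Node 18+ or a browser fits the FetchLike type.

```typescript
// Hypothetical response shape: the JSON envelope of GET /patients/{patientId}/resolution
// is not documented here, only that it holds every entity with its ProvenanceTrace.
interface ConflictRecord { field: string; values: unknown[]; resolution: string }
interface ProvenanceTrace {
  sources: { origin: string; ref: string }[];
  conflicts: ConflictRecord[];
  resolvedBy: string;
}
interface ResolvedEntity { id: string; provenance: ProvenanceTrace } // assumed field names
interface ResolutionReport { entities: ResolvedEntity[] }           // assumed envelope

// Any fetch-compatible function.
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

function resolutionUrl(baseUrl: string, patientId: string): string {
  return `${baseUrl.replace(/\/+$/, "")}/patients/${encodeURIComponent(patientId)}/resolution`;
}

async function getResolution(baseUrl: string, patientId: string, fetchImpl: FetchLike): Promise<ResolutionReport> {
  const res = await fetchImpl(resolutionUrl(baseUrl, patientId));
  if (!res.ok) throw new Error(`GET resolution failed: HTTP ${res.status}`);
  return (await res.json()) as ResolutionReport;
}

interface ConflictRow { entityId: string; field: string; values: unknown[]; resolution: string }

// Flattens every field-level conflict, showing which value won and why.
function conflictRows(report: ResolutionReport): ConflictRow[] {
  const rows: ConflictRow[] = [];
  for (const e of report.entities) {
    for (const c of e.provenance.conflicts) {
      rows.push({ entityId: e.id, field: c.field, values: c.values, resolution: c.resolution });
    }
  }
  return rows;
}

// Entities assembled from more than one source, i.e. records that were merged.
function mergedEntityIds(report: ResolutionReport): string[] {
  return report.entities.filter((e) => e.provenance.sources.length > 1).map((e) => e.id);
}
```

For example, getResolution("https://api.example.com", "p-123", fetch) followed by conflictRows on the result lists each conflicted field with its recorded resolution. The base URL is a placeholder, and any authentication the API requires is omitted.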
Reconciliation summary
GET /patients/{patientId}/reconciliation returns a higher-level summary: merge counts, the complete list of inferred relationships, and all post-processing actions. Use this to understand what the pipeline did overall rather than for a specific entity.
See Review the Reconciliation Summary.
Per-entity raw file
Every entity in the VFS (virtual file system) has a _raw.json file at its slug path. The file includes the entity's full field set, all of its clinical codes, and its typed relationships, but not the full ProvenanceTrace. For provenance detail, use the resolution report.
Design rationale
Provenance is recorded at write time, not reconstructed later. Every merge decision writes its audit trail into the entity as the pipeline runs, so the resolution report is a direct read of stored data (O(1) per entity) with no post-hoc reconstruction from logs.
The design also means that if the same patient is re-ingested with a corrected data source, the provenance trace reflects the new state after the next pipeline run, not the prior one. Provenance represents the current resolved state, not a history of pipeline runs.
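Recording provenance at write time can be sketched as a merge step that writes the combined trace into the surviving entity, so that reading provenance later is a keyed lookup. The Graph layout (a record keyed by entity id), mergeInto, and provenanceOf are hypothetical, and the sketch simplifies field merging and omits ConflictRecords.

```typescript
interface SourceRef { type: string; origin: string; reliability: number; ref: string }
interface Trace { sources: SourceRef[]; resolvedBy: string; reasoning: string }
interface Entity { id: string; fields: Record<string, string>; provenance: Trace }

// The stored graph, keyed by entity id (hypothetical storage layout).
type Graph = Record<string, Entity>;

// The merge step writes the combined trace into the surviving entity as it runs,
// so no separate log has to be replayed later.
function mergeInto(graph: Graph, keepId: string, dropId: string, resolvedBy: string, reasoning: string): void {
  const keep = graph[keepId];
  const drop = graph[dropId];
  if (!keep || !drop) throw new Error("Both entities must exist before merging");
  graph[keepId] = {
    id: keepId,
    // Simplification: the kept entity's values win; real conflicts would also get ConflictRecords.
    fields: { ...drop.fields, ...keep.fields },
    provenance: { sources: keep.provenance.sources.concat(drop.provenance.sources), resolvedBy, reasoning },
  };
  delete graph[dropId];
}

// Reading provenance is a keyed lookup on stored data.
function provenanceOf(graph: Graph, id: string): Trace | undefined {
  return graph[id] ? graph[id].provenance : undefined;
}
```

Under this model, re-running the pipeline on corrected inputs would build a fresh graph that replaces the old one, so each trace describes only the current resolved state.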
Relationship Inference
How Patient Memory constructs a typed graph of clinical relationships between deduplicated entities, and what nodes and edges that graph contains.
Clinical Knowledge Base
How Patient Memory uses a curated vocabulary to infer relationships between conditions, medications, and labs that are not present in the source data.