Clinia

Provenance and Auditability

How Patient Memory records where every clinical fact came from, what conflicts existed between sources, and how each conflict was resolved.

What it is

Clinical AI systems that aggregate patient data across sources face a fundamental question: why does the chart show this value? A merged entity might draw from three documents that each record a different dose for the same medication. The system has to pick one, and that decision must be traceable.

Provenance is a first-class design constraint in Patient Memory. Every resolved entity carries a complete audit trail from the moment it enters the graph.

The ProvenanceTrace

Every ClinicalEntity in the resolved graph carries a ProvenanceTrace:

{
  "sources": [
    { "type": "fhir", "origin": "FHIR Bundle", "reliability": 0.85, "ref": "Condition/abc" },
    {
      "type": "cda",
      "origin": "Summary_20230907.xml",
      "reliability": 0.8,
      "ref": "observation/456"
    }
  ],
  "conflicts": [
    {
      "field": "status",
      "values": ["active", "resolved"],
      "resolution": "Selected 'active' from FHIR Bundle: more recent source (2023-09-07 vs 2021-04-12)."
    }
  ],
  "resolvedBy": "deterministic-code",
  "reasoning": "Both entities share SNOMED code 44054006.",
  "confidence": 0.9
}
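Client code that consumes these traces benefits from a typed model. The following is a minimal Python sketch (3.9+) that mirrors the field names in the example above. The class names SourceRef, ConflictRecord and ProvenanceTrace follow the terms used on this page, but the from_dict helper and the dataclass layout are hypothetical, not a published SDK.

```python
import json
from dataclasses import dataclass
from typing import Any, List


@dataclass
class SourceRef:
    type: str           # connector that produced the record: "fhir", "cda", or "document"
    origin: str         # human-readable label for the source document
    reliability: float  # weight in [0, 1] used when resolving conflicts
    ref: str            # resource reference inside the original source


@dataclass
class ConflictRecord:
    field: str          # name of the disputed field
    values: List[Any]   # raw values reported by the contributing sources
    resolution: str     # explanation of which value won and why


@dataclass
class ProvenanceTrace:
    sources: List[SourceRef]
    conflicts: List[ConflictRecord]
    resolvedBy: str
    reasoning: str
    confidence: float

    @classmethod
    def from_dict(cls, data: dict) -> "ProvenanceTrace":
        """Build typed records from a decoded ProvenanceTrace JSON object (hypothetical helper)."""
        return cls(
            sources=[SourceRef(**s) for s in data["sources"]],
            conflicts=[ConflictRecord(**c) for c in data.get("conflicts", [])],
            resolvedBy=data["resolvedBy"],
            reasoning=data.get("reasoning", ""),
            confidence=float(data["confidence"]),
        )


# Abbreviated copy of the example trace above.
raw = """{
  "sources": [
    {"type": "fhir", "origin": "FHIR Bundle", "reliability": 0.85, "ref": "Condition/abc"},
    {"type": "cda", "origin": "Summary_20230907.xml", "reliability": 0.8, "ref": "observation/456"}
  ],
  "conflicts": [
    {"field": "status", "values": ["active", "resolved"], "resolution": "Selected 'active' (recency)."}
  ],
  "resolvedBy": "deterministic-code",
  "reasoning": "Both entities share SNOMED code 44054006.",
  "confidence": 0.9
}"""

trace = ProvenanceTrace.from_dict(json.loads(raw))
print(trace.resolvedBy, [s.origin for s in trace.sources])
# prints: deterministic-code ['FHIR Bundle', 'Summary_20230907.xml']
```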

sources

Each SourceRef in sources represents one contributing record. A single-element array means the entity appeared in only one source and was never merged with another. Multi-element arrays represent deduplicated entities.

| Field | What it records |
|---|---|
| type | Which connector produced this source (fhir, cda, or document) |
| origin | Human-readable label for the source document |
| reliability | Weight (0–1) given to this source when resolving conflicts |
| ref | The original resource reference within the source, enabling exact lookup in the source file |

conflicts

When contributing sources disagree on a field value, a ConflictRecord is written. Every conflict record includes the raw values from each source and an explanation of which value was chosen and why.

An empty conflicts array means all sources agreed on all field values.
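The two rules above (one source means never merged; no conflicts means full agreement) are enough to summarize any trace. A minimal Python sketch, operating on a decoded trace dictionary; describe_trace is a hypothetical helper name, not part of the Patient Memory API:

```python
def describe_trace(trace: dict) -> str:
    """Summarize a decoded ProvenanceTrace using the sources/conflicts rules."""
    sources = trace["sources"]
    conflicts = trace.get("conflicts", [])
    if len(sources) == 1:
        # One contributing record: the entity was never merged with another.
        return f"single source ({sources[0]['origin']}); never merged"
    if not conflicts:
        # Merged, and every source agreed on every field value.
        return f"merged from {len(sources)} sources; no conflicts"
    disputed = sorted({c["field"] for c in conflicts})
    return f"merged from {len(sources)} sources; conflicts on {', '.join(disputed)}"
```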

resolvedBy

Records which layer of the 4-layer cascade made the final merge or no-merge decision.

| Value | Layer | What happened |
|---|---|---|
| deterministic-code | 1 | Entities share a standard code. Cheapest and most certain layer |
| nlp-normalization | 2 | Same concept, with different text or a minor variation |
| embedding-similarity | 3 | Semantically similar, with no shared codes |
| llm-adjudication | 4 | Ambiguous case decided by an LLM with clinical context |
| no-merge | none | Entity was not merged with any other and is treated as unique |

When resolvedBy is llm-adjudication, the reasoning field contains the LLM's natural-language rationale for its decision.
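When auditing a patient graph, it can help to map each resolvedBy value to its cascade layer and to pull out the LLM rationales for review. A minimal Python sketch; the layer numbers follow the table order (Layer 1 for deterministic-code through Layer 4 for llm-adjudication), and the helper names resolver_layer and llm_rationales are hypothetical:

```python
from typing import List, Optional

# Layer numbers follow the order of the resolvedBy table; "no-merge" is not a layer.
RESOLVER_LAYER = {
    "deterministic-code": 1,
    "nlp-normalization": 2,
    "embedding-similarity": 3,
    "llm-adjudication": 4,
}


def resolver_layer(resolved_by: str) -> Optional[int]:
    """Return the cascade layer for a resolvedBy value, or None for no-merge."""
    if resolved_by == "no-merge":
        return None
    if resolved_by not in RESOLVER_LAYER:
        raise ValueError(f"unknown resolvedBy value: {resolved_by!r}")
    return RESOLVER_LAYER[resolved_by]


def llm_rationales(traces: List[dict]) -> List[str]:
    """Collect the LLM's rationale for every entity resolved by llm-adjudication."""
    return [t["reasoning"] for t in traces if t["resolvedBy"] == "llm-adjudication"]
```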

Conflict resolution strategies

When sources disagree, the pipeline tries these strategies in order and uses the first one that produces a decision:

| Strategy | When used |
|---|---|
| corroboration | A majority of sources agree on one value. The majority value is chosen |
| recency | No majority. The value from the most recently dated source is preferred |
| reliability | Sources share the same date or are undated. Values are weighted by source reliability score |
| llm-judgment | Complex conflicts where all simpler strategies are inconclusive |

The strategy used and the outcome are recorded in every ConflictRecord. This means the audit trail answers not just what was chosen but by what rule and from which source.
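The fallback order of the strategies can be sketched as a single function. This is a minimal Python illustration under stated assumptions, not the pipeline's implementation: "majority" is taken to mean a strict majority, undated sources are ignored at the recency step, the reliability step sums reliability per distinct value, and llm-judgment is represented only as a deferral (no LLM is called). Candidate and resolve_field are hypothetical names.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Any, List, Optional, Tuple


@dataclass
class Candidate:
    value: Any                    # hashable value this source reports for the field
    origin: str                   # source label, e.g. a document name
    reliability: float            # source reliability score in [0, 1]
    dated: Optional[date] = None  # date of the source, if known


def resolve_field(cands: List[Candidate]) -> Tuple[str, Optional[Candidate]]:
    """Apply the strategies in order and return (strategy, winner).

    The winner is None for llm-judgment, meaning the conflict is deferred.
    """
    if not cands:
        raise ValueError("no candidate values to resolve")

    # 1. corroboration: a strict majority of sources report the same value.
    value, count = Counter(c.value for c in cands).most_common(1)[0]
    if count > len(cands) / 2:
        return "corroboration", next(c for c in cands if c.value == value)

    # 2. recency: the most recently dated source(s) agree on a single value.
    dated = [c for c in cands if c.dated is not None]
    if dated:
        latest = max(c.dated for c in dated)
        newest = [c for c in dated if c.dated == latest]
        if len({c.value for c in newest}) == 1:
            return "recency", newest[0]

    # 3. reliability: sum reliability per value; a unique top total wins.
    #    At least two distinct values exist here, or step 1 would have decided.
    totals = defaultdict(float)
    for c in cands:
        totals[c.value] += c.reliability
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    if ranked[0][1] > ranked[1][1]:
        best = ranked[0][0]
        return "reliability", max((c for c in cands if c.value == best),
                                  key=lambda c: c.reliability)

    # 4. llm-judgment: every simpler strategy was inconclusive.
    return "llm-judgment", None
```

Returning the strategy name alongside the winner mirrors the point above: the record should capture the rule that decided, not just the chosen value.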

Confidence scores

Every entity has a confidence score (0–1) derived from two inputs:

  • The reliability scores of its contributing sources
  • The confidence of the resolution layer that merged it

Higher layer numbers produce lower confidence: a pair resolved by deterministic code matching (Layer 1) scores higher than one resolved by LLM adjudication (Layer 4). A single-source entity inherits only the reliability of its one source.
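This page does not give the exact formula, so the following Python sketch only illustrates one way to satisfy the stated properties: a single-source entity returns its source's reliability, and for merged entities a higher layer number always yields lower confidence. It assumes a noisy-OR combination of source reliabilities (independent sources make it less likely that all are wrong) multiplied by a per-layer factor. Both the noisy-OR choice and the LAYER_CONFIDENCE values are hypothetical, and the sketch is not calibrated to reproduce the 0.9 in the example trace.

```python
from math import prod
from typing import Optional, Sequence

# Hypothetical per-layer confidence; only the ordering (Layer 1 highest,
# Layer 4 lowest) comes from the documentation.
LAYER_CONFIDENCE = {1: 0.95, 2: 0.90, 3: 0.80, 4: 0.70}


def entity_confidence(reliabilities: Sequence[float], layer: Optional[int]) -> float:
    """Combine source reliabilities with the confidence of the merging layer."""
    if len(reliabilities) == 1:
        # Single-source entity: inherits only its source's reliability.
        return reliabilities[0]
    if layer is None:
        raise ValueError("a multi-source entity must have a merging layer")
    # Noisy-OR: probability that at least one source is right, assuming independence.
    combined = 1 - prod(1 - r for r in reliabilities)
    return combined * LAYER_CONFIDENCE[layer]
```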

Confidence scores also extend to inferred relationships. For example, a prescribed_for edge inferred from the clinical knowledge base is emitted with one of three fixed confidence values, depending on the strength of the knowledge base entry: 0.90 (strong), 0.75 (moderate), or 0.60 (weak).
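The three tiers map naturally to an enumeration. A minimal Python sketch; the EvidenceStrength enum, the prescribed_for_edge helper and the edge dictionary's keys ("from", "to", and so on) are hypothetical illustrations. Only the tier values come from the paragraph above.

```python
from enum import Enum


class EvidenceStrength(Enum):
    """Knowledge-base strength tiers and the edge confidence each one yields."""
    STRONG = 0.90
    MODERATE = 0.75
    WEAK = 0.60


def prescribed_for_edge(medication_id: str, condition_id: str,
                        strength: EvidenceStrength) -> dict:
    """Build a prescribed_for edge whose confidence comes from the KB tier."""
    return {
        "type": "prescribed_for",
        "from": medication_id,
        "to": condition_id,
        "confidence": strength.value,
    }
```

Using an enum rather than free-form floats keeps edge confidences restricted to the three allowed values.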

Accessing provenance

Full resolution report

GET /patients/{patientId}/resolution returns every entity in the graph with its complete ProvenanceTrace. Use this to:

  • Verify that two records were (or were not) merged and understand why
  • See all field-level conflicts and how each was resolved
  • Identify which source "won" for a given field value

See Audit Entity Resolution.
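Once the resolution report has been decoded, the checks listed above reduce to a few lookups. A minimal Python sketch, assuming a hypothetical response shape of {"entities": [{"id": ..., "provenance": <ProvenanceTrace>}]}; check the endpoint's actual schema before relying on these key names. The helper names are also hypothetical.

```python
from typing import Iterator, Set, Tuple


def source_refs(entity: dict) -> Set[str]:
    """All original resource references that contributed to an entity."""
    return {s["ref"] for s in entity["provenance"]["sources"]}


def were_merged(report: dict, ref_a: str, ref_b: str) -> bool:
    """True if a single resolved entity draws on both source records."""
    return any({ref_a, ref_b} <= source_refs(e) for e in report["entities"])


def conflicts_on(report: dict, field_name: str) -> Iterator[Tuple[str, list, str]]:
    """Yield (entity id, raw values, resolution) for every conflict on a field."""
    for entity in report["entities"]:
        for conflict in entity["provenance"]["conflicts"]:
            if conflict["field"] == field_name:
                yield entity["id"], conflict["values"], conflict["resolution"]
```

For example, were_merged(report, "Condition/abc", "observation/456") answers whether those two source records ended up in one entity, and the resolution strings returned by conflicts_on show which source won each disputed field.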

Reconciliation summary

GET /patients/{patientId}/reconciliation returns a higher-level summary: merge counts, the complete list of inferred relationships, and all post-processing actions. Use this to understand what the pipeline did overall rather than for a specific entity.

See Review the Reconciliation Summary.
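Both endpoints are plain GET requests keyed by patient ID. A minimal standard-library Python sketch for fetching either view; BASE_URL is a hypothetical placeholder, and authentication, which a real deployment will almost certainly require, is not shown.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL; substitute the address of your deployment.
BASE_URL = "https://patient-memory.example.com"
VIEWS = ("resolution", "reconciliation")


def patient_view_url(patient_id: str, view: str) -> str:
    """Build the URL for GET /patients/{patientId}/resolution or /reconciliation."""
    if view not in VIEWS:
        raise ValueError(f"view must be one of {VIEWS}")
    # Percent-encode the ID so characters such as '/' cannot alter the path.
    return f"{BASE_URL}/patients/{urllib.parse.quote(patient_id, safe='')}/{view}"


def fetch_view(patient_id: str, view: str, timeout: float = 30.0) -> dict:
    """GET the view and decode its JSON body (authentication not shown)."""
    request = urllib.request.Request(
        patient_view_url(patient_id, view), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```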

Per-entity raw file

Every entity in the VFS has a _raw.json file at its slug path. This includes the entity's full field set, all its clinical codes, and its typed relationships, but not the full ProvenanceTrace. For provenance detail, use the resolution report.

Design rationale

Provenance is recorded at write time, not reconstructed later. Every merge decision writes its audit trail into the entity as the pipeline runs, so serving the resolution report only means reading traces that already exist: each entity's trace is a direct O(1) lookup, with no post-hoc reconstruction from logs.
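Recording provenance at write time means the merge step itself produces the trace. A minimal Python sketch of that idea, assuming hypothetical entity dictionaries with "fields" and "provenance" keys and a caller-supplied choose function that returns the winning value plus an explanation. It is an illustration of the principle, not the pipeline's merge code.

```python
from typing import Any, Callable, Tuple

# choose(field, value_a, value_b) -> (winning value, explanation)
Chooser = Callable[[str, Any, Any], Tuple[Any, str]]


def merge_entities(a: dict, b: dict, resolved_by: str, reasoning: str,
                   confidence: float, choose: Chooser) -> dict:
    """Merge two entities and write the ProvenanceTrace as part of the merge."""
    fields, new_conflicts = {}, []
    for name in sorted(a["fields"].keys() | b["fields"].keys()):
        va, vb = a["fields"].get(name), b["fields"].get(name)
        if va is None or vb is None or va == vb:
            # No disagreement: keep whichever value is present.
            fields[name] = va if va is not None else vb
            continue
        value, explanation = choose(name, va, vb)
        fields[name] = value
        new_conflicts.append({"field": name, "values": [va, vb], "resolution": explanation})

    pa, pb = a["provenance"], b["provenance"]
    return {
        "fields": fields,
        "provenance": {
            # Earlier sources and conflicts are carried forward, so the trace
            # stays complete without consulting logs later.
            "sources": pa["sources"] + pb["sources"],
            "conflicts": pa["conflicts"] + pb["conflicts"] + new_conflicts,
            "resolvedBy": resolved_by,
            "reasoning": reasoning,
            "confidence": confidence,
        },
    }
```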

The design also means that if the same patient is re-ingested with a corrected data source, the provenance trace reflects the new state after the next pipeline run, not the prior one. Provenance represents the current resolved state, not a history of pipeline runs.
