Entity Resolution

How Patient Memory deduplicates clinical entities across multiple sources using a 4-layer escalating pipeline.

What it is

A complex patient record typically has the same condition, medication, or allergy represented multiple times across different sources, with different codes, different display text, and sometimes contradictory details.

Entity resolution is the process of deciding: are these two records the same real-world thing?

The 4-Layer Pipeline

Patient Memory uses an escalating cascade that only moves to more expensive methods when cheaper ones can't decide.

| Layer | Method | Handles |
| --- | --- | --- |
| 1 | Deterministic code matching | Entities sharing standard codes (SNOMED, RxNorm, ICD-10, LOINC) |
| 2 | NLP normalization + fuzzy matching | Same concept with different display text or minor variation |
| 3 | Embedding similarity | Semantically similar entities with no shared codes |
| 4 | LLM adjudication | Genuinely ambiguous cases requiring clinical judgment |

A pair of entities escalates to the next layer only when the current layer cannot confidently resolve it. To keep costs bounded, Layer 4 is capped at roughly 15 pairs per patient.

In practice, ~40% of pairs resolve in Layer 1 for free, and ~70% resolve before reaching the LLM. The LLM sees only the hard cases where clinical judgment is genuinely needed.
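The escalation logic can be sketched as a list of layers tried in order of cost, where each layer either returns a verdict or abstains. This is a minimal illustration, not Patient Memory's implementation: the `Entity` class, the layer functions, the `"fuzzy-text"` label, and the 0.9 similarity threshold are all assumptions made for the example, and only the first two layers are filled in.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for a clinical entity."""
    display: str
    codes: frozenset = frozenset()  # e.g. {("SNOMED", "44054006")}


def code_match(a: Entity, b: Entity) -> Optional[bool]:
    # Layer 1: any shared standard code settles the pair as a match.
    return True if a.codes & b.codes else None


def fuzzy_match(a: Entity, b: Entity, threshold: float = 0.9) -> Optional[bool]:
    # Layer 2: normalize case and whitespace, then compare display text.
    def norm(text: str) -> str:
        return " ".join(text.lower().split())

    ratio = SequenceMatcher(None, norm(a.display), norm(b.display)).ratio()
    return True if ratio >= threshold else None


# Embedding and LLM layers would be appended here, cheapest first.
LAYERS = [("deterministic-code", code_match), ("fuzzy-text", fuzzy_match)]


def resolve(a: Entity, b: Entity, layers=LAYERS):
    """Return (layer name, verdict) from the first layer that can decide."""
    for name, layer in layers:
        verdict = layer(a, b)
        if verdict is not None:
            return name, verdict
    return "unresolved", False  # no layer was confident


diabetes = frozenset({("SNOMED", "44054006")})
print(resolve(Entity("Type 2 diabetes mellitus", diabetes), Entity("T2DM", diabetes)))
print(resolve(Entity("Type 2 Diabetes  Mellitus"), Entity("type 2 diabetes mellitus")))
```

Because each layer abstains by returning `None`, adding a more expensive layer never changes the outcome for pairs that a cheaper layer already resolved.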

Provenance

Every merged entity carries a full audit trail (see Provenance and Auditability for the complete structure):

{
  "sources": [
    { "type": "fhir", "origin": "FHIR Bundle", "reliability": 0.85, "ref": "Condition/abc" },
    {
      "type": "cda",
      "origin": "Summary_20230907.xml",
      "reliability": 0.8,
      "ref": "observation/456"
    }
  ],
  "conflicts": [],
  "resolvedBy": "deterministic-code",
  "reasoning": "Both entities share SNOMED code 44054006.",
  "confidence": 0.9
}

When sources conflict (e.g., different medication doses), the conflict is recorded with both values and the resolution strategy used. Every merge decision is fully auditable.
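As a rough illustration of consuming this audit trail, the sketch below loads the provenance record shown above and flags merges that might deserve a manual look. The field names come from that example; the `summarize` and `needs_review` helpers and the 0.8 confidence cutoff are hypothetical and not part of the product.

```python
import json

# The provenance record from the example above.
PROVENANCE = json.loads("""
{
  "sources": [
    { "type": "fhir", "origin": "FHIR Bundle", "reliability": 0.85, "ref": "Condition/abc" },
    { "type": "cda", "origin": "Summary_20230907.xml", "reliability": 0.8, "ref": "observation/456" }
  ],
  "conflicts": [],
  "resolvedBy": "deterministic-code",
  "reasoning": "Both entities share SNOMED code 44054006.",
  "confidence": 0.9
}
""")


def summarize(prov: dict) -> str:
    """One-line description of how a merge was decided and from which sources."""
    origins = ", ".join(source["origin"] for source in prov["sources"])
    return f'{prov["resolvedBy"]} ({prov["confidence"]:.2f}) from {origins}'


def needs_review(prov: dict, min_confidence: float = 0.8) -> bool:
    """Hypothetical rule: review merges with recorded conflicts or low confidence."""
    return bool(prov["conflicts"]) or prov["confidence"] < min_confidence


print(summarize(PROVENANCE))
print(needs_review(PROVENANCE))
```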

Conflict Resolution

When sources disagree on an attribute value, the pipeline uses one of these strategies:

| Strategy | Description |
| --- | --- |
| Corroboration | Select the value agreed upon by the majority of sources |
| Recency | Prefer the most recent source |
| Reliability | Weight sources by their reliability score |
| LLM judgment | For complex conflicts, an LLM evaluates both values with full patient context |
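The three non-LLM strategies can be sketched as small functions over a list of competing claims. This is an illustration under stated assumptions: the `Claim` class and function names are hypothetical, corroboration is read as a strict majority, and reliability weighting is assumed to mean summing the scores of the sources behind each value. The document does not specify these details.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Claim:
    """Hypothetical: one source's value for a disputed attribute."""
    value: str
    recorded: date
    reliability: float


def by_corroboration(claims: List[Claim]) -> Optional[str]:
    # Value asserted by a strict majority of sources, or None if there is none.
    value, count = Counter(c.value for c in claims).most_common(1)[0]
    return value if count > len(claims) / 2 else None


def by_recency(claims: List[Claim]) -> str:
    # Value from the most recently recorded source.
    return max(claims, key=lambda c: c.recorded).value


def by_reliability(claims: List[Claim]) -> str:
    # Value with the highest total reliability across its sources (assumed rule).
    weights = {}
    for c in claims:
        weights[c.value] = weights.get(c.value, 0.0) + c.reliability
    return max(weights, key=weights.get)


doses = [
    Claim("500 mg", date(2023, 1, 10), 0.85),
    Claim("500 mg", date(2023, 3, 2), 0.60),
    Claim("1000 mg", date(2023, 9, 7), 0.80),
]
print(by_corroboration(doses), by_recency(doses), by_reliability(doses))
```

In this example corroboration and reliability both pick "500 mg" while recency picks "1000 mg", which shows why the chosen strategy is worth recording alongside the conflict.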

Post-Processing

After entity deduplication, three passes run:

  1. Family history reclassification: conditions encoded as "hx of X" are moved to the family history subgraph
  2. Status inference: resolved and historical conditions are flagged based on textual cues
  3. Symptom linking: symptoms are linked to likely parent conditions via the clinical knowledge base

Auditing Resolution Decisions

The full resolution report (GET /patients/{patientId}/resolution) returns every entity in the graph with its complete provenance trace. Use this to debug unexpected merge or non-merge outcomes. See Audit Entity Resolution.

For a higher-level reconciliation summary (stats, inferred relationships, post-processing actions), use GET /patients/{patientId}/reconciliation. See Review the Reconciliation Summary.
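A minimal sketch of calling the two endpoints from Python, using only the standard library. The paths are the ones given above; the base URL, the bearer-token header, and the helper names are assumptions for illustration, so check the API reference for the actual host and authentication scheme.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.example.com"  # hypothetical host, replace with your own


def report_url(patient_id: str, kind: str = "resolution") -> str:
    """Build the URL for the resolution report or the reconciliation summary."""
    if kind not in ("resolution", "reconciliation"):
        raise ValueError(f"unknown report kind: {kind}")
    return f"{BASE_URL}/patients/{quote(patient_id, safe='')}/{kind}"


def fetch_report(patient_id: str, kind: str = "resolution", token: str = "") -> dict:
    """GET the report and decode the JSON body (bearer auth is an assumption)."""
    request = urllib.request.Request(
        report_url(patient_id, kind),
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


print(report_url("p-123"))
print(report_url("p-123", "reconciliation"))
```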
