Audit Entity Resolution

Read the resolution report to understand how entities from different sources were merged, what conflicts were found, and which resolver made each decision.

Where to start

The resolution report records every merge decision made by the 4-layer pipeline for a patient. Use it to understand why two records were (or were not) merged, what conflicts were found between sources, and how each conflict was resolved.

Fetch the report

curl "https://api.<workspace-id>.clinia.cloud/patients/{patientId}/resolution" \
  -H "X-Clinia-API-Key: <clinia-api-key>"
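
The same request can be made from a script. The following is a minimal sketch using only the Python standard library; the function names are illustrative and not part of any Clinia SDK, and the URL pattern and header are taken from the curl command above.

```python
import json
import urllib.parse
import urllib.request


def build_resolution_request(
    workspace_id: str, patient_id: str, api_key: str
) -> urllib.request.Request:
    """Build the GET request for a patient's resolution report."""
    # Percent-encode the patient ID so it is safe to place in the URL path.
    safe_id = urllib.parse.quote(patient_id, safe="")
    url = f"https://api.{workspace_id}.clinia.cloud/patients/{safe_id}/resolution"
    return urllib.request.Request(url, headers={"X-Clinia-API-Key": api_key})


def fetch_resolution_report(workspace_id: str, patient_id: str, api_key: str) -> dict:
    """Send the request and decode the JSON body (raises on HTTP errors)."""
    request = build_resolution_request(workspace_id, patient_id, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Splitting request construction from the network call keeps the URL and header logic easy to check without contacting the API.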

Report structure

{
  "summary": {
    "totalEntities": 18,
    "totalEvents": 11,
    "mergedEntities": 3,
    "warningCount": 1,
    "loadMs": 840
  },
  "merged": [ ... ],
  "warnings": [ ... ]
}
  • summary.totalEntities: total number of entities found across all sources for the patient.
  • summary.totalEvents: total number of merge or resolution events processed in the pipeline.
  • summary.mergedEntities: number of entity groups where two or more source records were deduplicated into one. If this is 0, every entity from every source was treated as unique.
  • summary.warningCount: number of warnings generated during the resolution process (see the warnings section for details).
  • summary.loadMs: time taken (in milliseconds) to generate the resolution report.
  • merged: one entry per resolved entity, with its full provenance trace.
  • warnings: entities or events that could not be processed cleanly.
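
As a quick first look, the summary fields can be condensed into a single line. This is a sketch only: `describe_summary` is a hypothetical helper, and it assumes the report has already been parsed into a Python dict with the field names shown above.

```python
def describe_summary(report: dict) -> str:
    """Condense the report's summary block into one readable line."""
    summary = report["summary"]
    if summary["mergedEntities"] == 0:
        # Per the field description: zero means nothing was deduplicated.
        merge_note = "no records deduplicated (every entity treated as unique)"
    else:
        merge_note = f"{summary['mergedEntities']} entity group(s) deduplicated"
    return (
        f"{summary['totalEntities']} entities, "
        f"{summary['totalEvents']} events, "
        f"{merge_note}, "
        f"{summary['warningCount']} warning(s), "
        f"generated in {summary['loadMs']} ms"
    )
```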

Reading a merged entity

{
  "id": "condition:snomed:44054006",
  "type": "condition",
  "display": "Type 2 Diabetes Mellitus",
  "resolvedBy": "deterministic-code",
  "confidence": 0.9,
  "reasoning": "Both entities share SNOMED code 44054006.",
  "sources": [
    { "origin": "FHIR Bundle", "type": "fhir", "reliability": 0.85, "ref": "Condition/abc" },
    {
      "origin": "Summary_20230907.xml",
      "type": "cda",
      "reliability": 0.8,
      "ref": "observation/456"
    }
  ],
  "conflicts": []
}

resolvedBy values

| Value | Layer | What happened |
| --- | --- | --- |
| deterministic-code | 1 | Entities share a standard code (SNOMED, RxNorm, ICD-10, LOINC) |
| nlp-normalization | 2 | Same concept with different display text or minor variation |
| embedding-similarity | 3 | Semantically similar with no shared code |
| llm-adjudication | 4 | Ambiguous case escalated to the LLM for clinical judgment |
| no-match | none | Entity was not merged with any other. Treated as unique |

Layer 4 fires for at most ~15 pairs per patient. Cost rises with the layer number: layer 1 is the cheapest and fastest, and layer 4 is the most expensive.
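
To see how much of a patient's resolution reached the more expensive layers, the merged entries can be tallied by resolver. The sketch below assumes `merged` is the parsed `merged` array from the report; the mapping mirrors the resolvedBy values listed above, and the helper names are illustrative.

```python
from collections import Counter

# Layer number for each resolvedBy value; None means the entity was not merged.
LAYER_BY_RESOLVER = {
    "deterministic-code": 1,
    "nlp-normalization": 2,
    "embedding-similarity": 3,
    "llm-adjudication": 4,
    "no-match": None,
}


def count_by_resolver(merged: list[dict]) -> Counter:
    """Count merged entries per resolvedBy value."""
    return Counter(entity["resolvedBy"] for entity in merged)


def escalated_to_llm(merged: list[dict]) -> list[str]:
    """Return the IDs of entities that were resolved by layer 4."""
    return [
        entity["id"]
        for entity in merged
        if LAYER_BY_RESOLVER.get(entity["resolvedBy"]) == 4
    ]
```

A patient whose tally is dominated by `llm-adjudication` or `no-match` is usually worth a closer look at the source data's coding.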

Reading conflicts

When sources disagree on an attribute value, both values are recorded along with the resolution strategy:

{
  "conflicts": [
    {
      "field": "status",
      "values": [
        { "value": "active", "source": "FHIR Bundle" },
        { "value": "resolved", "source": "Summary_20230907.xml" }
      ],
      "winner": "active",
      "strategy": "recency",
      "explanation": "FHIR Bundle is more recent (2023-09-07 vs 2021-04-12)."
    }
  ]
}

Conflict resolution strategies

| Strategy | When used |
| --- | --- |
| corroboration | Majority of sources agree on one value |
| recency | Prefer the most recently dated source |
| reliability | Weight sources by their reliability score |
| llm-judgment | Complex conflict requiring clinical evaluation |
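
When auditing many entities, it can help to flatten every conflict into one row that shows the winning value, the rejected values, and the strategy used. This is a sketch under the assumption that each merged entity carries a `conflicts` array shaped like the example above. Flagging `llm-judgment` conflicts for manual review is an illustrative policy, not something the API prescribes.

```python
def summarize_conflicts(merged: list[dict]) -> list[dict]:
    """Flatten all conflicts across merged entities into one row each."""
    rows = []
    for entity in merged:
        for conflict in entity.get("conflicts", []):
            # Values that lost, annotated with the source that reported them.
            rejected = [
                f'{candidate["value"]} ({candidate["source"]})'
                for candidate in conflict["values"]
                if candidate["value"] != conflict["winner"]
            ]
            rows.append(
                {
                    "entity": entity["id"],
                    "field": conflict["field"],
                    "winner": conflict["winner"],
                    "rejected": rejected,
                    "strategy": conflict["strategy"],
                    # Hypothetical review rule: surface LLM-decided conflicts.
                    "needs_review": conflict["strategy"] == "llm-judgment",
                }
            )
    return rows
```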

Reading warnings

{
  "warnings": [
    {
      "source": "cda",
      "id": "observation/789",
      "message": "No recognised code system. Could not match by code.",
      "severity": "low"
    }
  ]
}

Warnings are non-fatal. The entity is still ingested, but it could not be matched by the deterministic layer and may have been handled by a more expensive layer or left unmerged.
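
Grouping warnings by severity makes it easier to decide which ones to inspect first. The sketch below makes no assumption about the set of severity values beyond the `low` shown above; `group_warnings` is a hypothetical helper operating on the parsed report.

```python
from collections import defaultdict


def group_warnings(report: dict) -> dict[str, list[dict]]:
    """Group the report's warnings by their severity string."""
    grouped = defaultdict(list)
    for warning in report.get("warnings", []):
        grouped[warning["severity"]].append(warning)
    return dict(grouped)
```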

Debugging a non-merge

If you expected two records to merge but they didn't, look for the entity in merged with resolvedBy: "no-match". Check:

  1. Do the records share a standard code? If not, deterministic matching (layer 1) cannot fire.
  2. Are the display names similar? Layer 2 uses NLP normalization, so very different display text may not match.
  3. What was the embedding similarity score? Layer 3 fires only above the escalation threshold.
  4. Was it escalated to the LLM? If max_llm_calls_per_patient was reached, remaining ambiguous pairs are left unmerged rather than spending more LLM budget.
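
The first step, locating the unmerged records, can be scripted. The following sketch searches the parsed `merged` array for `no-match` entries whose display text contains a search term, then lists where each record came from so the checks above can be applied. The helper names are illustrative, and the field names follow the merged-entity example earlier on this page.

```python
def find_unmerged(merged: list[dict], text: str) -> list[dict]:
    """Return no-match entities whose display contains `text` (case-insensitive)."""
    needle = text.casefold()
    return [
        entity
        for entity in merged
        if entity["resolvedBy"] == "no-match"
        and needle in entity.get("display", "").casefold()
    ]


def source_refs(entity: dict) -> list[str]:
    """List 'origin: ref' for each source record behind an entity."""
    return [
        f'{source["origin"]}: {source["ref"]}'
        for source in entity.get("sources", [])
    ]
```

For example, `find_unmerged(report["merged"], "diabetes")` returns every unmerged entity mentioning diabetes, and `source_refs` on each result points to the original records to compare.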

See Entity Resolution for a full explanation of the 4-layer cascade.
