How Maya Delivers Your Documents

Chapter 01 · The Patient View

What you actually see.

A typical exchange. You ask a question about your own record. Maya finds the document, answers from it, and shows you the source so you can read it yourself.

You

What did Dr. Chen say about my MRI in September?

Maya

Dr. Chen's note from September 12 describes the MRI as showing stable findings compared to your June scan, with no new areas of concern. She recommended a follow-up in six months.

Source · Progress Note · Chen, K. · 2025-09-12

No clinical interpretation. No inference. Navigation back to the document, with the relevant excerpt and a citation you can verify.

If you cannot trace an answer back to a document on your own device, Maya did not give you that answer.

Chapter 02 · What Maya Looks For

Three views of the same document.

Your medical records arrive from each health system as FHIR DocumentReference resources — structured records with metadata (date, author, type, encounter) and the document content itself.
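To make the shape of such a record concrete, here is a minimal sketch of splitting one DocumentReference into metadata fields and document text. The sample resource is hand-written for illustration, and the function name `extract_views` and the output keys are assumptions chosen to match the column names used later in this page. Real resources vary by health system and may carry a URL instead of inline data.

```python
import base64

# A hand-written, minimal DocumentReference for illustration only.
sample = {
    "resourceType": "DocumentReference",
    "id": "doc-001",
    "status": "current",
    "type": {"text": "Progress Note"},
    "date": "2025-09-12T14:30:00Z",
    "author": [{"display": "Chen, K."}],
    "context": {"encounter": [{"reference": "Encounter/enc-123"}]},
    "content": [{
        "attachment": {
            "contentType": "text/plain",
            "data": base64.b64encode(b"MRI shows stable findings.").decode("ascii"),
        }
    }],
}

def extract_views(resource: dict) -> dict:
    """Split a DocumentReference into metadata fields and document text."""
    if resource.get("resourceType") != "DocumentReference":
        raise ValueError("not a DocumentReference")
    authors = resource.get("author") or [{}]
    encounters = (resource.get("context") or {}).get("encounter") or [{}]
    attachment = (resource.get("content") or [{}])[0].get("attachment", {})
    raw = attachment.get("data")  # base64-encoded when stored inline
    return {
        "doc_id": resource.get("id"),
        "document_date": (resource.get("date") or "")[:10],  # keep YYYY-MM-DD
        "document_type": (resource.get("type") or {}).get("text"),
        "author_name": authors[0].get("display"),
        "encounter_id": encounters[0].get("reference"),
        "raw_text": base64.b64decode(raw).decode("utf-8") if raw else None,
    }

row = extract_views(sample)
print(row["author_name"], row["document_date"])
```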

For every document, Maya stores three views — and queries all three in parallel when you ask a question.

01

Structured FHIR

The metadata fields — author, date, document type, encounter linkage — stored in a per-patient SQLite database (fhir_lookup.db). Sub-millisecond lookup.

02

Raw Text Index

The full extracted text of each document, indexed for exact-match search across every word in your entire record history.

03

Semantic Embedding

A 1024-dimensional vector embedding produced by BGE-M3, stored in a Qdrant collection scoped to a single patient. Used to find conceptually related content even when wording differs.

The three views compensate for each other's weaknesses. Structured FHIR is fast but only tells you what the metadata fields contain. Raw text catches exact phrases but misses synonyms. Semantic embeddings catch meaning but can surface loosely related passages. Used together, they give Maya the best chance of finding the document you are actually asking about.
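One common way to merge results from several views is reciprocal rank fusion. The sketch below is illustrative only: this page does not say how Maya combines the three result lists, and the function name, the constant `k=60`, and the document ids are all assumptions.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; doc id breaks ties so the order is stable.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results from the three views for one question.
structured = ["doc-7", "doc-3", "doc-9"]
raw_text = ["doc-3", "doc-7"]
semantic = ["doc-3", "doc-9", "doc-7"]

fused = reciprocal_rank_fusion([structured, raw_text, semantic])
print(fused)
```

A document that ranks well in several views rises above one that ranks first in only a single view, which is the compensating behavior described above.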

Chapter 03 · The Actual Logic

The queries, in full.

What follows is the structure of a real document retrieval. The schemas have been simplified for readability, but the logic is the logic.

Step 1 — Candidate selection by metadata

First, narrow the universe. If you asked about September, we don't load the rest of your history.

SELECT
  doc_id,
  document_date,
  document_type,
  author_name,
  encounter_id
FROM document_references
WHERE patient_id = :patient_id
  AND document_date BETWEEN :start_date AND :end_date
  AND (
    document_type LIKE '%' || :type_filter || '%'
    OR :type_filter IS NULL
  )
ORDER BY document_date DESC;

This query runs against a SQLite file that lives on your own hardware. There is no network call. There is no cloud lookup. The file path itself encodes patient identity — each patient has a separate database, and the application binding to one cannot read another.
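The Step 1 query can be tried end to end with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the table definition, the in-memory database, the sample rows, and the helper name `candidates` are made up for illustration, and dates are assumed to be stored as ISO 8601 text so that BETWEEN compares them correctly.

```python
import sqlite3

STEP_1 = """
SELECT doc_id, document_date, document_type, author_name, encounter_id
FROM document_references
WHERE patient_id = :patient_id
  AND document_date BETWEEN :start_date AND :end_date
  AND (document_type LIKE '%' || :type_filter || '%' OR :type_filter IS NULL)
ORDER BY document_date DESC;
"""

# In-memory stand-in for the per-patient database file (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE document_references (
    doc_id TEXT PRIMARY KEY, patient_id TEXT, document_date TEXT,
    document_type TEXT, author_name TEXT, encounter_id TEXT)""")
conn.executemany(
    "INSERT INTO document_references VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("d1", "p1", "2025-06-03", "Imaging Report", "Chen, K.", "e1"),
        ("d2", "p1", "2025-09-12", "Progress Note", "Chen, K.", "e2"),
        ("d3", "p1", "2025-09-20", "Lab Result", "Patel, R.", "e3"),
    ],
)

def candidates(start, end, type_filter=None):
    """Return candidate doc ids in the date window, newest first."""
    params = {
        "patient_id": "p1",
        "start_date": start,
        "end_date": end,
        "type_filter": type_filter,  # None disables the type filter
    }
    return [row[0] for row in conn.execute(STEP_1, params)]

september = candidates("2025-09-01", "2025-09-30")
notes_only = candidates("2025-09-01", "2025-09-30", "Progress")
print(september, notes_only)
```

The June document never enters the candidate set, which is the narrowing described above.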

Step 2 — Pull raw text for ranking

The candidates from step one get their full text loaded for keyword and semantic matching.

SELECT
  d.doc_id,
  d.document_date,
  d.author_name,
  d.document_type,
  t.raw_text
FROM document_references d
JOIN document_text t ON d.doc_id = t.doc_id
WHERE d.doc_id IN (:candidate_ids);  -- shorthand: expands to one bound parameter per candidate id
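SQLite binds one value per parameter, so a list of candidate ids has to be expanded into one placeholder per id before the query runs. A minimal sketch of that expansion follows. The helper name `fetch_text`, the table definitions, and the sample rows are assumptions for illustration.

```python
import sqlite3

def fetch_text(conn, candidate_ids):
    """Load raw text for the given doc ids, one bound parameter per id."""
    if not candidate_ids:
        return []  # nothing to load, so skip the query
    # Only "?" marks are interpolated into the SQL; the ids stay bound values.
    placeholders = ", ".join("?" for _ in candidate_ids)
    sql = f"""
        SELECT d.doc_id, d.document_date, d.author_name, d.document_type, t.raw_text
        FROM document_references d
        JOIN document_text t ON d.doc_id = t.doc_id
        WHERE d.doc_id IN ({placeholders})
    """
    return conn.execute(sql, list(candidate_ids)).fetchall()

# In-memory stand-in with an assumed, simplified schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE document_references (doc_id TEXT PRIMARY KEY, document_date TEXT,
        author_name TEXT, document_type TEXT);
    CREATE TABLE document_text (doc_id TEXT PRIMARY KEY, raw_text TEXT);
    INSERT INTO document_references VALUES ('d2', '2025-09-12', 'Chen, K.', 'Progress Note');
    INSERT INTO document_references VALUES ('d3', '2025-09-20', 'Patel, R.', 'Lab Result');
    INSERT INTO document_text VALUES ('d2', 'MRI shows stable findings.');
    INSERT INTO document_text VALUES ('d3', 'Panel within normal limits.');
""")

rows = fetch_text(conn, ["d2"])
print(rows)
```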

Step 3 — Semantic ranking against per-patient embeddings

Your question becomes a vector. We compare it against the embeddings stored in your Qdrant collection — and only yours.

# The collection name is bound to patient identity
# at the application layer. There is no shared collection.

hits = qdrant.search(
    collection_name=f"leptonx_memory_{patient_id}_v1",
    query_vector=bge_m3_embed(user_question),
    limit=10,
    score_threshold=0.65,
    with_payload=True
)

The collection name is constructed from patient identity at the application layer. There is no shared collection. There is no path by which one patient's documents could be returned to a query running in another patient's session — and per-patient isolation has been confirmed bidirectionally across every collection in the system.
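Because isolation depends on how the collection name is built, it helps to build it in exactly one place and to reject malformed ids before they reach the vector store. The sketch below shows one way to do that. The function name `collection_for` and the allowed-character rule are assumptions, not the shipped implementation.

```python
import re

# Assumed rule: patient ids are 1 to 64 ASCII letters or digits.
_PATIENT_ID = re.compile(r"[A-Za-z0-9]{1,64}")

def collection_for(patient_id: str) -> str:
    """Build the per-patient collection name, rejecting unexpected ids."""
    if not isinstance(patient_id, str) or not _PATIENT_ID.fullmatch(patient_id):
        raise ValueError("invalid patient id")
    return f"leptonx_memory_{patient_id}_v1"

print(collection_for("abc123"))
```

With a fixed prefix and suffix, two different patient ids can never map to the same collection name.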

Step 4 — Reranking and synthesis

A second model rescores the candidates and keeps the top three. Their text is given to Maya verbatim, with a strict instruction to cite or decline.

reranked = bge_reranker.rank(
    query=user_question,
    documents=[h.payload["text"] for h in hits],
    top_n=3
)

context = build_context(reranked, max_chars=8000)

# Maya is invoked with a system prompt that requires
# every claim to be grounded in `context`. If `context`
# is empty, Maya is required to say so.
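The `build_context` call above is not shown in full, so the following is only a sketch of what such a helper might do. It assumes each reranked item is a dict with hypothetical `source` and `text` keys, and it skips any document that would overflow the character budget rather than cutting it mid-sentence.

```python
def build_context(documents, max_chars=8000):
    """Join ranked documents into one context string within max_chars.

    Documents are taken in rank order; one that would overflow the budget
    is skipped whole so that no excerpt is truncated.
    """
    parts, used = [], 0
    for doc in documents:
        block = f"[{doc['source']}]\n{doc['text']}"
        cost = len(block) + (2 if parts else 0)  # 2 for the blank-line separator
        if used + cost > max_chars:
            continue
        parts.append(block)
        used += cost
    return "\n\n".join(parts)

# Hypothetical reranked output, best match first.
docs = [
    {"source": "A", "text": "x" * 10},
    {"source": "B", "text": "y" * 100},
    {"source": "C", "text": "z" * 5},
]
context = build_context(docs, max_chars=30)
print(context)
```

An empty result is meaningful here: it is the signal that Maya has nothing to cite and must say so.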

Chapter 04 · The Boundaries

What Maya will not do.

Maya's safety boundaries are not promises in marketing copy. They are code. Below are the hard rules that ship with every Maya instance — enforced at the system-prompt layer, with reinforcement at the routing layer and a three-tier hallucination guard above.

Rule 01

Never assert a clinical conclusion that is not stated in the chart.

Rule 02

Never infer progression, recurrence, or metastasis unless explicitly stated in a clinical note.

Rule 03

Never combine documents to make an inference the chart itself does not make.

Rule 04

Always cite the source document by date and author. No citation, no answer.

Rule 05

If no document supports an answer, say so. Do not invent one.

These rules layer on top of patient-specific guards (the oncology deny-list, the imaging workup-language preservation rule, and others) and a hallucination guard that compares Maya's generated answer against the source context before it ever reaches the speaker. If a claim cannot be grounded, the answer is rejected and rewritten — or Maya tells you she cannot answer.
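The simplest of these checks to illustrate is Rule 04. The sketch below covers only the citation check, not the claim-by-claim comparison described above, and it is an assumption about how such a check could look. The function name, the regular expression, and the source dict keys are hypothetical, and the citation format is taken from the example in Chapter 01.

```python
import re

# Matches citations shaped like: Source · Progress Note · Chen, K. · 2025-09-12
CITATION = re.compile(
    r"Source · (?P<type>.+?) · (?P<author>.+?) · (?P<date>\d{4}-\d{2}-\d{2})"
)

def citation_is_grounded(answer, sources):
    """Return True only if the answer cites a document present in sources."""
    match = CITATION.search(answer)
    if match is None:
        return False  # Rule 04: no citation, no answer
    cited = (match["author"], match["date"])
    return any((s["author_name"], s["document_date"]) == cited for s in sources)

sources = [{"author_name": "Chen, K.", "document_date": "2025-09-12"}]
good = "She recommended a follow-up. Source · Progress Note · Chen, K. · 2025-09-12"
print(citation_is_grounded(good, sources))
```

An answer with no citation, or with a citation that matches no retrieved document, fails the check and would be rejected before it reaches you.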

Chapter 05 · Why This Works

A navigator, not a chatbot.

This is not a model trained to sound knowledgeable about your records. It is a structured navigator over your own data, operating entirely on hardware you control.

The architecture rests on three concepts we have built the platform around:

Compiled Patient Knowledge

Every document, every fact, every embedding is compiled and indexed locally per patient. Compilation happens once; retrieval then runs against local indexes, with no network round trip.

Particle Taxonomy®

Compiled knowledge is decomposed into irreducible units — particles — each of which is independently queryable, citable, and inspectable.

First Particles™

The founding design philosophy. Every capability is decomposed to the smallest deterministic unit before any model is invoked.

You do not trust Maya because we said so. You trust Maya because you can read the queries she runs.
