Glossary

The terms we use.

Plain-English definitions of the vocabulary that shows up across the product, the docs, and the API. Every term is individually linkable — append #slug to jump straight to one.

Digest
One Markdown file combining multiple uploaded files into a single ordered document. Each source file becomes a section with a ## file: name.pdf header so the model can cite back to source. The digest is what you paste into Claude, ChatGPT, Gemini, or Cursor.
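A minimal sketch of that layout (file names here are illustrative):

```markdown
## file: report.pdf

Converted Markdown of report.pdf, in reading order: headings,
paragraphs, tables, figure references.

## file: appendix.pdf

Converted Markdown of appendix.pdf follows as the next section.
```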
Token-aware
Output is split by the context windows of common LLMs (Claude, GPT, Gemini) so a digest fits without manual trimming. We measure with the same tokenizer family the target model uses and emit segment boundaries at safe natural breaks — section headers, not mid-sentence.
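The splitting idea can be sketched in a few lines. This is illustrative only: FileDigest's real tokenizers and budgets are model-specific, so the four-characters-per-token estimate below is a stand-in, not the measurement the product uses.

```python
# Sketch: split a digest into segments under a token budget, breaking
# only at "## " section headers, never mid-sentence.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough heuristic, not a real tokenizer

def split_digest(markdown: str, budget: int) -> list[str]:
    # First, cut the document into sections at "## " headers.
    sections, current = [], []
    for line in markdown.splitlines(keepends=True):
        if line.startswith("## ") and current:
            sections.append("".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("".join(current))

    # Then greedily pack whole sections into segments under the budget.
    segments, seg, seg_tokens = [], [], 0
    for sec in sections:
        t = estimate_tokens(sec)
        if seg and seg_tokens + t > budget:
            segments.append("".join(seg))
            seg, seg_tokens = [], 0
        seg.append(sec)
        seg_tokens += t
    if seg:
        segments.append("".join(seg))
    return segments
```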
Docling
IBM-developed open-source document parser FileDigest uses under the hood. Docling preserves reading order, table structure, headings, and figure references — the things naive PDF text extraction loses. We do not modify Docling output beyond stitching multi-file results into one ordered digest.
IU72
FileDigest's cognitive snapshot module: a single self-contained HTML page that lays out up to 72 indivisible units (IUs) across 12 panels, four rows of three. It is meant for visual review of what a document actually says, alongside the linear Markdown digest you paste into your model. See /docs/iu72.
Modal
Serverless GPU compute platform FileDigest runs the Docling pipeline on. Modal cold-starts a container per job, scales to zero when idle, and lets us bill compute proportionally to actual work. A typical 4.5MB cold-start run completes in about 62 seconds; warm runs in about 17 seconds.
Signed-upload URL
Time-limited Supabase Storage URL issued by our API. The browser uploads files directly to Storage using the URL, bypassing Vercel's 4.5MB request body cap and avoiding a round-trip through our serverless functions. URLs expire after a few minutes and cannot be reused.
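The shape of the flow, sketched below, assumes a hypothetical issuing endpoint; the endpoint path and TTL are illustrative, not FileDigest's actual API.

```python
# Illustrative two-step upload (endpoint path and TTL are assumptions):
#   1. POST /api/uploads  -> returns a signed URL plus its issue time
#   2. PUT  <signed URL>  -> browser sends file bytes straight to Storage
# The API server never handles the file body, so Vercel's 4.5MB request
# cap on serverless functions does not apply.

from datetime import datetime, timedelta, timezone

def url_still_valid(issued_at: datetime, ttl_minutes: int, now: datetime) -> bool:
    """A signed URL is usable only inside its time-to-live window."""
    return now < issued_at + timedelta(minutes=ttl_minutes)
```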
BYOC
Bring Your Own Compute. Business-tier customers can point FileDigest at their own Modal account so jobs run on their compute and billing. Useful when GPU spend is already budgeted, or when residency/segmentation rules require dedicated infrastructure.
Markdown
Plain-text format with structure-preserving syntax (headings with #, emphasis with **, tables, code blocks). It is the canonical input format for LLM context windows because it's compact, unambiguous, and tokenizes efficiently.
Context window
The maximum number of tokens an LLM can hold at once across input and output. FileDigest fits to common ones — 200K (Claude), 128K (GPT), and 1M (Gemini) — so the digest you paste is guaranteed not to truncate. Larger source corpora produce multi-segment digests with clean break points.
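A worked example of the segment arithmetic. The reply headroom below is an assumed figure for illustration; in practice some share of the window must be reserved for the model's own output.

```python
import math

# Context windows named in this glossary, in tokens.
WINDOWS = {"claude": 200_000, "gpt": 128_000, "gemini": 1_000_000}

def segments_needed(corpus_tokens: int, model: str, reply_headroom: int = 8_000) -> int:
    # Leave room in the window for the model's reply, then pack the rest.
    budget = WINDOWS[model] - reply_headroom
    return math.ceil(corpus_tokens / budget)
```

So a 500K-token corpus fits Gemini's window in one segment but needs three segments for Claude.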
OCR
Optical Character Recognition. When a PDF is image-only — typical of scans, faxes, or older archival material — Docling runs OCR to recover text. OCR is automatic on Pro and Business plans; on Free, image-only files return the structural skeleton without recognized text.
RAG
Retrieval-Augmented Generation: a pattern where an LLM is grounded by retrieving relevant documents from an index at query time. FileDigest is a common ingest source — clean section-headed Markdown chunks better than raw PDF text and embeds more reliably into vector stores like pgvector, Pinecone, or Weaviate.
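One reason section-headed Markdown ingests cleanly: the ## file: headers give each chunk a source label for free. A minimal sketch, leaving the embedding and vector-store steps to your stack:

```python
# Sketch: turn a digest's "## file:" sections into (source, text) chunks
# ready for embedding. The header convention comes from the Digest entry
# above; embedding and storage are out of scope here.

def digest_to_chunks(digest: str) -> list[tuple[str, str]]:
    chunks, source, buf = [], None, []
    for line in digest.splitlines():
        if line.startswith("## file: "):
            if source is not None:
                chunks.append((source, "\n".join(buf).strip()))
            source = line[len("## file: "):].strip()
            buf = []
        elif source is not None:
            buf.append(line)
    if source is not None:
        chunks.append((source, "\n".join(buf).strip()))
    return chunks
```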
Webhook
Server-to-server HTTP callback. Stripe sends one to FileDigest when a checkout completes, a subscription renews, or a payment fails — and we use the event to flip the user's plan tier in the database. Webhooks are signed; we verify every payload before acting on it.
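The verification step can be sketched with stdlib HMAC. Stripe's real scheme (a Stripe-Signature header carrying a timestamp and v1 signatures) has more structure than this, so treat it as the general idea only:

```python
import hashlib
import hmac

def verify_webhook(payload: bytes, signature_hex: str, secret: bytes) -> bool:
    # Recompute the HMAC over the raw request body and compare in
    # constant time, so timing differences leak nothing.
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)
```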
WCAG
Web Content Accessibility Guidelines, the W3C standard for accessible web content. FileDigest meets the WCAG 2.1 AA 4.5:1 contrast ratio for normal text in both dark and light themes, plus the 3:1 ratio for large text and UI components. Verified by automated Playwright runs across Chromium, Firefox, and WebKit.
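The 4.5:1 figure comes from WCAG's relative-luminance formula, which can be checked in a few lines:

```python
def _linear(c8: int) -> float:
    # sRGB channel (0-255) to linear light, per the WCAG 2.x definition.
    c = c8 / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb: tuple[int, int, int]) -> float:
    r, g, b = (_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)
```

Black on white yields the maximum ratio of 21:1; the familiar #767676 gray on white sits just above the 4.5:1 AA threshold.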

Missing a term?

Email hello@filedigest.app and we'll add it.