Glossary
The terms we use.
Plain-English definitions of the vocabulary that shows up across the product, the docs, and the API. Every term is individually linkable — append #slug to jump straight to one.
- Digest
- One Markdown file combining multiple uploaded files into a single ordered document. Each source file becomes a section with a `## file: name.pdf` header so the model can cite back to source. The digest is what you paste into Claude, ChatGPT, Gemini, or Cursor.
- Token-aware
- Output is split by the context windows of common LLMs (Claude, GPT, Gemini) so a digest fits without manual trimming. We measure with the same tokenizer family the target model uses and emit segment boundaries at safe natural breaks — section headers, not mid-sentence.
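The idea above can be sketched in a few lines. This is illustrative only: the real pipeline measures with the target model's own tokenizer family, while this sketch substitutes a crude whitespace word count. What it does show accurately is the segmentation rule — breaks happen only at `## file:` section headers, never mid-sentence.

```python
import re

def split_digest(digest: str, budget: int) -> list[str]:
    """Split a digest into segments that each fit a token budget.

    Sketch only: counts whitespace-delimited words as a stand-in for
    real model tokens, and breaks only at `## file:` section headers
    so no segment ever starts mid-sentence.
    """
    # Keep each "## file: ..." header attached to the text that follows it.
    sections = [s for s in re.split(r"(?m)^(?=## file: )", digest) if s.strip()]

    segments, current, used = [], [], 0
    for section in sections:
        cost = len(section.split())  # crude token estimate
        if current and used + cost > budget:
            segments.append("".join(current))
            current, used = [], 0
        current.append(section)
        used += cost
    if current:
        segments.append("".join(current))
    return segments
```

With a 200K budget most digests come back as a single segment; only oversized corpora split, and each segment then starts at a file header.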
- Docling
- IBM-developed open-source document parser FileDigest uses under the hood. Docling preserves reading order, table structure, headings, and figure references — the things naive PDF text extraction loses. We do not modify Docling output beyond stitching multi-file results into one ordered digest.
- IU72
- FileDigest's cognitive snapshot module: a single self-contained HTML page that lays out up to 72 indivisible units (IUs) across 12 panels, four rows of three. It is meant for visual review of what a document actually says, alongside the linear Markdown digest you paste into your model. See /docs/iu72.
- Modal
- Serverless GPU compute platform FileDigest runs the Docling pipeline on. Modal cold-starts a container per job, scales to zero when idle, and lets us bill compute proportionally to actual work. A typical 4.5MB cold-start run completes in about 62 seconds; warm runs in about 17 seconds.
- Signed-upload URL
- Time-limited Supabase Storage URL issued by our API. The browser uploads files directly to Storage using the URL, bypassing Vercel's 4.5MB request body cap and avoiding a round-trip through our serverless functions. URLs expire after a few minutes and cannot be reused.
- BYOC
- Bring Your Own Compute. Business-tier customers can point FileDigest at their own Modal account so jobs run on their compute and billing. Useful when GPU spend is already budgeted, or when residency/segmentation rules require dedicated infrastructure.
- Markdown
- Plain-text format with structure-preserving syntax (headings with `#`, emphasis with `**`, tables, code blocks). It is the canonical input format for LLM context windows because it's compact, unambiguous, and tokenizes efficiently.
- Context window
- The maximum number of tokens an LLM can hold at once across input and output. FileDigest fits to common ones — 200K (Claude), 128K (GPT), and 1M (Gemini) — so the digest you paste is guaranteed not to truncate. Larger source corpora produce multi-segment digests with clean break points.
- OCR
- Optical Character Recognition. When a PDF is image-only — typical of scans, faxes, or older archival material — Docling runs OCR to recover text. OCR is automatic on Pro and Business plans; on Free, image-only files return the structural skeleton without recognized text.
- RAG
- Retrieval-Augmented Generation: a pattern where an LLM is grounded by retrieving relevant documents from an index at query time. FileDigest is a common ingest source — clean section-headed Markdown chunks better than raw PDF text and embeds more reliably into vector stores like pgvector, Pinecone, or Weaviate.
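Why section-headed Markdown chunks well: each `## file:` section is a self-contained unit with a source label attached, so a retrieved chunk can be cited back to its original document. A minimal sketch (field names are illustrative, not a fixed schema):

```python
import re

def digest_to_chunks(digest: str) -> list[dict]:
    """Turn a FileDigest-style Markdown digest into records ready for a
    vector store. Each `## file:` section becomes one chunk tagged with
    its source filename for citation at retrieval time."""
    chunks = []
    for section in re.split(r"(?m)^(?=## file: )", digest):
        match = re.match(r"## file: (.+)", section)
        if not match:
            continue
        body = section[match.end():].strip()
        chunks.append({"source": match.group(1).strip(), "text": body})
    return chunks
```

Each record then gets embedded and stored; at query time the `source` field travels with the hit, so the model can say which file an answer came from.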
- Webhook
- Server-to-server HTTP callback. Stripe sends one to FileDigest when a checkout completes, a subscription renews, or a payment fails — and we use the event to flip the user's plan tier in the database. Webhooks are signed; we verify every payload before acting on it.
- WCAG
- Web Content Accessibility Guidelines, the W3C standard for accessible web content. FileDigest meets the WCAG 2.1 AA 4.5:1 contrast ratio for normal text in both dark and light themes, plus the 3:1 ratio for large text and UI components. Verified by automated Playwright runs across Chromium, Firefox, and WebKit.
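The 4.5:1 and 3:1 figures come from WCAG 2.1's contrast-ratio formula, which is simple enough to compute directly — relative luminance of each color, then `(L1 + 0.05) / (L2 + 0.05)` with the lighter color on top:

```python
def relative_luminance(hex_color: str) -> float:
    """WCAG 2.1 relative luminance of an sRGB color like '#1a2b3c'."""
    def channel(c: int) -> float:
        s = c / 255
        # Piecewise sRGB linearization per the WCAG 2.1 definition.
        return s / 12.92 if s <= 0.03928 else ((s + 0.055) / 1.055) ** 2.4
    r, g, b = (int(hex_color.lstrip("#")[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * channel(r) + 0.7152 * channel(g) + 0.0722 * channel(b)

def contrast_ratio(fg: str, bg: str) -> float:
    """Contrast ratio (L1 + 0.05) / (L2 + 0.05), lighter over darker."""
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)
```

Black on white is the maximum, 21:1; a mid-gray like `#767676` on white sits just above the 4.5:1 AA floor for normal text, which is why it's a common "lightest allowed" body-text color.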
Missing a term?
Email hello@filedigest.app and we'll add it.