Glossary
The terms we use.
Plain-English definitions of the vocabulary that shows up across the product, the docs, and the API. Every term is individually linkable — append #slug to jump straight to one.
- Digest
- One Markdown file combining multiple uploaded files into a single ordered document. Each source file becomes a section with a `## file: name.pdf` header so the model can cite back to source. The digest is what you paste into Claude, ChatGPT, Gemini, or Cursor.
- Token-aware
- Output is split by the context windows of common LLMs (Claude, GPT, Gemini) so a digest fits without manual trimming. We measure with the same tokenizer family the target model uses and emit segment boundaries at safe natural breaks — section headers, not mid-sentence.
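The idea above can be sketched in a few lines. This is illustrative only: the real pipeline measures with the target model's own tokenizer family, while this sketch substitutes a crude whitespace word count. What it does show accurately is the segmentation rule — breaks happen only at `## file:` section headers, never mid-sentence.

```python
import re

def split_digest(digest: str, budget: int) -> list[str]:
    """Split a digest into segments that each fit a token budget.

    Sketch only: counts whitespace-delimited words as a stand-in for
    real model tokens, and breaks only at `## file:` section headers
    so no segment ever starts mid-sentence.
    """
    # Keep each "## file: ..." header attached to the text that follows it.
    sections = [s for s in re.split(r"(?m)^(?=## file: )", digest) if s.strip()]

    segments, current, used = [], [], 0
    for section in sections:
        cost = len(section.split())  # crude token estimate
        if current and used + cost > budget:
            segments.append("".join(current))
            current, used = [], 0
        current.append(section)
        used += cost
    if current:
        segments.append("".join(current))
    return segments
```

With a 200K budget most digests come back as a single segment; only oversized corpora split, and each segment then starts at a file header.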
- Docling
- IBM-developed open-source document parser FileDigest uses under the hood. Docling preserves reading order, table structure, headings, and figure references — the things naive PDF text extraction loses. We do not modify Docling output beyond stitching multi-file results into one ordered digest.
- IU72
- FileDigest's cognitive snapshot module: a single self-contained HTML page that lays out up to 72 indivisible units (IUs) across 12 panels, four rows of three. It is meant for visual review of what a document actually says, alongside the linear Markdown digest you paste into your model. See /docs/iu72.
- Modal
- Serverless GPU compute platform FileDigest runs the Docling pipeline on. Modal cold-starts a container per job, scales to zero when idle, and lets us bill compute proportionally to actual work. A typical 4.5MB cold-start run completes in about 62 seconds; warm runs in about 17 seconds.
- Signed-upload URL
- Time-limited Supabase Storage URL issued by our API. The browser uploads files directly to Storage using the URL, bypassing Vercel's 4.5MB request body cap and avoiding a round-trip through our serverless functions. URLs expire after a few minutes and cannot be reused.
- BYOC
- Bring Your Own Compute. Business-tier customers can point FileDigest at their own Modal account so jobs run on their compute and billing. Useful when GPU spend is already budgeted, or when residency/segmentation rules require dedicated infrastructure.
- Markdown
- Plain-text format with structure-preserving syntax (headings with `#`, emphasis with `**`, tables, code blocks). It is the canonical input format for LLM context windows because it's compact, unambiguous, and tokenizes efficiently.
- Context window
- The maximum number of tokens an LLM can hold at once across input and output. FileDigest fits to common ones — 200K (Claude), 128K (GPT), and 1M (Gemini) — so the digest you paste is guaranteed not to truncate. Larger source corpora produce multi-segment digests with clean break points.
- OCR
- Optical Character Recognition. When a PDF is image-only — typical of scans, faxes, or older archival material — Docling runs OCR to recover text. OCR is automatic on Pro and Business plans; on Free, image-only files return the structural skeleton without recognized text.
- RAG
- Retrieval-Augmented Generation: a pattern where an LLM is grounded by retrieving relevant documents from an index at query time. FileDigest is a common ingest source — clean section-headed Markdown chunks better than raw PDF text and embeds more reliably into vector stores like pgvector, Pinecone, or Weaviate.
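Why section-headed Markdown chunks well: each `## file:` section is a self-contained unit with a source label attached, so a retrieved chunk can be cited back to its original document. A minimal sketch (field names are illustrative, not a fixed schema):

```python
import re

def digest_to_chunks(digest: str) -> list[dict]:
    """Turn a FileDigest-style Markdown digest into records ready for a
    vector store. Each `## file:` section becomes one chunk tagged with
    its source filename for citation at retrieval time."""
    chunks = []
    for section in re.split(r"(?m)^(?=## file: )", digest):
        match = re.match(r"## file: (.+)", section)
        if not match:
            continue
        body = section[match.end():].strip()
        chunks.append({"source": match.group(1).strip(), "text": body})
    return chunks
```

Each record then gets embedded and stored; at query time the `source` field travels with the hit, so the model can say which file an answer came from.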
- Webhook
- Server-to-server HTTP callback. Stripe sends one to FileDigest when a checkout completes, a subscription renews, or a payment fails — and we use the event to flip the user's plan tier in the database. Webhooks are signed; we verify every payload before acting on it.
- WCAG
- Web Content Accessibility Guidelines, the W3C standard for accessible web content. FileDigest meets the WCAG 2.1 AA 4.5:1 contrast ratio for normal text in both dark and light themes, plus the 3:1 ratio for large text and UI components. Verified by automated Playwright runs across Chromium, Firefox, and WebKit.
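The 4.5:1 and 3:1 figures come from WCAG 2.1's contrast-ratio formula, which is simple enough to compute directly — relative luminance of each color, then `(L1 + 0.05) / (L2 + 0.05)` with the lighter color on top:

```python
def relative_luminance(hex_color: str) -> float:
    """WCAG 2.1 relative luminance of an sRGB color like '#1a2b3c'."""
    def channel(c: int) -> float:
        s = c / 255
        # Piecewise sRGB linearization per the WCAG 2.1 definition.
        return s / 12.92 if s <= 0.03928 else ((s + 0.055) / 1.055) ** 2.4
    r, g, b = (int(hex_color.lstrip("#")[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * channel(r) + 0.7152 * channel(g) + 0.0722 * channel(b)

def contrast_ratio(fg: str, bg: str) -> float:
    """Contrast ratio (L1 + 0.05) / (L2 + 0.05), lighter over darker."""
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)
```

Black on white is the maximum, 21:1; a mid-gray like `#767676` on white sits just above the 4.5:1 AA floor for normal text, which is why it's a common "lightest allowed" body-text color.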
Missing a term?
Email hello@filedigest.app and we'll add it.