Frequently Asked Questions
Everything you need to know about getting your data RAG-ready.
Conversions
What is LLM Prep?
What tools are used to convert documents?
What happens to images during conversion?
All images are analyzed and described to ensure complete context for your LLM. Our AI uses Google Gemini Vision to generate detailed descriptions of every visual element in your document.
- Diagrams & Charts: Structure, labels, relationships, and data points are captured in detail.
- Photos & Portraits: Subject matter, visual details, and contextual information are described.
- Logos & Branding: Visual elements are preserved because they may be essential context (e.g., brand guidelines documents).
We describe everything because context matters: a mint leaf photo might be critical in an essential oils guide, and a logo description could be essential in a branding document.
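The following minimal Python sketch shows how a text description can stand in for an image in Markdown output. It assumes the converted document references images with standard Markdown image syntax. `describe` is a hypothetical callback that stands in for the vision model, and the `[Image description: ...]` format is also an assumption, not necessarily what LLM Prep emits.

```python
import re
from typing import Callable

# Markdown image syntax: ![alt text](path); group 1 captures the path.
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def inline_image_descriptions(markdown: str, describe: Callable[[str], str]) -> str:
    """Replace every Markdown image reference with a text description.

    `describe` is a hypothetical callback that receives an image path and
    returns a description; in a real pipeline a vision model fills this role.
    """
    def _replace(match: re.Match) -> str:
        return f"[Image description: {describe(match.group(1))}]"

    return IMAGE_PATTERN.sub(_replace, markdown)


# Usage with a stubbed describer instead of a real vision model.
descriptions = {
    "fig1.png": "Bar chart of quarterly revenue rising from Q1 to Q4.",
}
doc = "Revenue grew every quarter.\n\n![chart](fig1.png)\n"
print(inline_image_descriptions(doc, lambda path: descriptions[path]))
```

Because the description is plain text, it is embedded and retrieved like any other sentence in the chunk.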
What does LLM Prep remove from a document?
LLM Prep automatically removes content that adds noise to your knowledge base and confuses LLMs during retrieval. We strip out:
- Table of Contents: TOC pages are navigation aids for people reading PDFs. Your LLM doesn't need them, because the actual headings already appear in the document body.
- Page Numbers: References like "Page 1 of 10" or standalone numbers fragment awkwardly across chunks and add no semantic value.
- Headers & Footers: Repetitive elements like copyright notices, document titles in margins, and legal disclaimers that appear on every page.
This cleaning happens automatically during conversion, so your markdown is optimized for vector databases and RAG retrieval right out of the box.
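The following sketch shows a simplified version of page-number and header/footer removal. It is an illustrative heuristic, not LLM Prep's actual implementation. The input is assumed to be one list of text lines per page, and the 80% repetition threshold (`min_share`) is an arbitrary choice. The bare-number rule would also drop a line that legitimately contains only a number, which is one reason layout-aware models handle this better. TOC removal is not covered.

```python
import math
import re
from collections import Counter

# Lines that are only a page number: "Page 3 of 10", "Page 3", or a bare "3".
PAGE_NUMBER = re.compile(r"^\s*(?:page\s+\d+(?:\s+of\s+\d+)?|\d+)\s*$", re.IGNORECASE)


def clean_pages(pages: list[list[str]], min_share: float = 0.8) -> list[list[str]]:
    """Drop page-number lines and header/footer lines repeated across pages.

    A non-blank line is treated as a header or footer when it appears on at
    least `min_share` of the pages (an assumed threshold for this sketch).
    Blank lines are kept so paragraph breaks survive.
    """
    counts = Counter()
    for page in pages:
        # Count each distinct line once per page.
        counts.update({line.strip() for line in page if line.strip()})

    # Require at least two pages so a single-page document keeps its text.
    threshold = max(2, math.ceil(min_share * len(pages)))
    repeated = {line for line, n in counts.items() if n >= threshold}

    return [
        [
            line
            for line in page
            if not PAGE_NUMBER.match(line)
            and (not line.strip() or line.strip() not in repeated)
        ]
        for page in pages
    ]


pages = [
    ["ACME Handbook", "Introduction", "Welcome to the team.", "Page 1 of 3"],
    ["ACME Handbook", "Benefits", "Coverage starts on day one.", "Page 2 of 3"],
    ["ACME Handbook", "Contacts", "Email hr@example.com.", "Page 3 of 3"],
]
for page in clean_pages(pages):
    print(page)
```

On this three-page sample, only each page's section heading and body text remain.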
How does LLM Prep convert documents?
When LLM Prep converts a complex document, it performs the following steps:
- Layout Analysis: It segments the page into semantic regions: Title, Section Header, Paragraph, Table, Figure, Footer, etc.
- Reading Order Inference: In multi-column PDFs, it calculates the correct human reading path so sentences don't get jumbled.
- Table Structure Recovery: It rebuilds tables that exist only as visual layout into structured dataframes or Markdown tables, with rows and columns intact.
- Metadata Extraction: It extracts document metadata (authors, titles, references) and tracks provenance.
- Unified Representation: It standardizes everything across different source file types.
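Of these steps, reading order inference is the easiest to illustrate with code. The sketch below uses a simple geometric heuristic, not the model LLM Prep uses. It assumes at most two columns, block coordinates with the origin at the top-left, and a hypothetical `Block` type that holds each region's text and bounding box.

```python
from dataclasses import dataclass


@dataclass
class Block:
    text: str
    x0: float  # left edge
    y0: float  # top edge (origin at top-left, y grows downward)
    x1: float  # right edge
    y1: float  # bottom edge


def reading_order(blocks: list[Block], page_width: float) -> list[Block]:
    """Order blocks on a two-column page: left column, then right column.

    Blocks that straddle the page midline (titles, full-width figures) act as
    separators: columns above them are emitted before the spanning block.
    """
    mid = page_width / 2
    ordered: list[Block] = []
    left: list[Block] = []
    right: list[Block] = []
    for block in sorted(blocks, key=lambda b: b.y0):  # top to bottom
        if block.x0 < mid < block.x1:
            # Spanning block: flush the columns collected so far.
            ordered += left + right
            left, right = [], []
            ordered.append(block)
        elif block.x1 <= mid:
            left.append(block)
        else:
            right.append(block)
    return ordered + left + right


blocks = [
    Block("Right column, para 1", 310, 120, 580, 300),
    Block("Title", 40, 40, 580, 80),
    Block("Left column, para 2", 40, 320, 290, 500),
    Block("Left column, para 1", 40, 120, 290, 300),
    Block("Right column, para 2", 310, 320, 580, 500),
]
print([b.text for b in reading_order(blocks, page_width=612)])
# → ['Title', 'Left column, para 1', 'Left column, para 2', 'Right column, para 1', 'Right column, para 2']
```

A block that spans the midline, such as the title, splits the page into bands, so a two-column section under a full-width heading is still read left column first. Layouts with three or more columns, sidebars, or floating figures need a model-based approach instead.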
How does LLM Prep convert PDFs?
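PDFs are difficult because they are essentially 'digital paper': the file records where text and graphics are drawn on the page, not which parts are headings, paragraphs, or tables. We use computer vision AI models (specifically RT-DETR trained on the DocLayNet dataset) to 'look' at the page and identify elements.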
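- Visual Analysis: The layout model draws bounding boxes around page elements and labels each one (for example, title, text, table, or picture).
- Table Reconstruction: Specialized models such as TableFormer predict each table's row, column, and cell structure for precise reconstruction.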
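After a model such as TableFormer has recovered a table's cell grid, serializing it to Markdown is straightforward. This sketch assumes the grid arrives as a list of rows of cell strings, with the header row first. The function name is hypothetical, and merged (spanning) cells are ignored.

```python
def to_markdown_table(rows: list[list[str]]) -> str:
    """Render a recovered table grid as a Markdown (GFM) table.

    The first row is treated as the header. Short rows are padded with empty
    cells, and pipes are escaped and line breaks flattened so each row stays
    on one line. Merged (spanning) cells are not modeled in this sketch.
    """
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def fmt(row: list[str]) -> str:
        cells = [cell.replace("|", "\\|").replace("\n", " ").strip() for cell in row]
        cells += [""] * (width - len(cells))
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    lines = [fmt(header), "| " + " | ".join(["---"] * width) + " |"]
    lines += [fmt(row) for row in body]
    return "\n".join(lines)


grid = [
    ["Region", "Q1", "Q2"],
    ["North", "120", "135"],
    ["South", "98", "110"],
]
print(to_markdown_table(grid))
```

The hard part is recovering the grid from pixels. Once that is done, Markdown keeps each value next to its header, which helps retrieval.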
How does LLM Prep handle tables?
What is the 'RAG Readiness' score?
Why does processing a file take so long?
How do I view my conversion history?
Why did my conversion fail?
Conversions can fail for several reasons:
- Server Disruption: Temporary server issues or maintenance.
- Corrupted or Invalid File: Damaged or incomplete files.
- Unsupported Format: Specialized or heavily encrypted files.
- File Size Issues: Large or complex files may time out during processing.
- Password Protection: Encrypted files cannot be processed.
If your conversion fails, your credits are automatically returned to your account.
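You can catch some of these problems before uploading. The Python sketch below runs a few quick heuristic checks on a PDF: file size, the `%PDF-` header, the `%%EOF` end marker, and an `/Encrypt` entry. The 50 MB limit is a placeholder, not LLM Prep's documented limit. The function names are hypothetical. The `/Encrypt` check also flags files that open without a password but carry permission restrictions, so treat a hit as a warning rather than a verdict.

```python
import os

# Placeholder limit for this example only; LLM Prep's real limit may differ.
MAX_BYTES = 50 * 1024 * 1024


def check_pdf_bytes(data: bytes, max_bytes: int = MAX_BYTES) -> list[str]:
    """Return likely problems with raw PDF bytes (heuristic, not a validator)."""
    if not data:
        return ["File is empty."]
    if len(data) > max_bytes:
        return [f"File is {len(data)} bytes, over the assumed {max_bytes}-byte limit."]
    problems = []
    # PDFs start with a "%PDF-" header (readers tolerate a little leading junk).
    if b"%PDF-" not in data[:1024]:
        problems.append("Missing %PDF- header: not a PDF, or the file is corrupted.")
    # A complete PDF ends with an %%EOF marker; its absence suggests truncation.
    if b"%%EOF" not in data[-1024:]:
        problems.append("Missing %%EOF marker: the file may be truncated.")
    # An /Encrypt entry in the trailer means the document is encrypted.
    if b"/Encrypt" in data:
        problems.append("Document appears to be encrypted (possibly password-protected).")
    return problems


def preflight_pdf(path: str, max_bytes: int = MAX_BYTES) -> list[str]:
    """Check a PDF on disk without reading oversized files into memory."""
    size = os.path.getsize(path)
    if size > max_bytes:
        return [f"File is {size} bytes, over the assumed {max_bytes}-byte limit."]
    with open(path, "rb") as f:
        return check_pdf_bytes(f.read(), max_bytes)
```

An empty list means none of these checks found a problem. It does not guarantee the conversion will succeed.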
