LLM Prep
Just the facts

Frequently Asked Questions

Everything you need to know about getting your data RAG-ready.

Conversions

What is LLM Prep?
LLM Prep is a document processing platform that converts PDFs, Word docs, and other files into clean, structured Markdown and JSON. We optimize your data for RAG (Retrieval-Augmented Generation) so your AI applications retrieve accurate context and are less prone to hallucination.
What tools are used to convert documents?
We take a Vision First approach built on Docling, an AI-powered document conversion tool created by IBM. Rather than just extracting text, Docling uses computer vision to "see" your document like a human would: it analyzes the visual layout, identifies document elements (headers, tables, images, etc.), and converts them into structured formats. Images in your documents are also analyzed and converted into detailed, descriptive text, which makes visual content searchable and accessible to RAG systems.
What happens to images during conversion?

All images are analyzed and described to ensure complete context for your LLM. Our AI uses Google Gemini Vision to generate detailed descriptions of every visual element in your document.

  • Diagrams & Charts: Structure, labels, relationships, and data points are captured in detail.
  • Photos & Portraits: Subject matter, visual details, and contextual information are described.
  • Logos & Branding: Visual elements are preserved because they may be essential context (e.g., brand guidelines documents).

We describe everything because context matters—a mint leaf photo might be critical in an essential oils guide, and a logo description could be essential in a branding document.

What does LLM Prep remove from a document?

LLM Prep automatically removes content that adds noise to your knowledge base and confuses LLMs during retrieval. We strip out:

  • Table of Contents: TOC pages are navigation aids for humans reading PDFs—your LLM doesn't need them since the actual headings exist in the document body.
  • Page Numbers: References like "Page 1 of 10" or standalone numbers fragment awkwardly across chunks and add no semantic value.
  • Headers & Footers: Repetitive elements like copyright notices, document titles in margins, and legal disclaimers that appear on every page.

This cleaning happens automatically during conversion, so your markdown is optimized for vector databases and RAG retrieval right out of the box.
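
As a rough illustration of this kind of cleaning, the sketch below drops standalone page numbers and lines that repeat on most pages. It is not LLM Prep's actual implementation: the function name `strip_page_noise`, the `repeat_ratio` threshold, and the page-number pattern are all assumptions made for the example.

```python
import re
from collections import Counter

# Matches standalone page markers such as "7", "Page 1 of 10", or "- 3 -".
PAGE_NUMBER = re.compile(
    r"^\s*(?:page\s+)?-?\s*\d+(?:\s+of\s+\d+)?\s*-?\s*$", re.IGNORECASE
)


def strip_page_noise(pages, repeat_ratio=0.6):
    """Drop page numbers and lines repeated on most pages.

    `pages` is a list of pages, each a list of text lines.
    `repeat_ratio` is a hypothetical threshold: a line seen on at least
    this fraction of pages (and on at least two pages) is treated as a
    running header or footer.
    """
    # Count each distinct non-blank line once per page.
    seen_on = Counter()
    for lines in pages:
        for line in {l.strip() for l in lines if l.strip()}:
            seen_on[line] += 1

    min_pages = max(2, int(len(pages) * repeat_ratio))
    repeated = {line for line, n in seen_on.items() if n >= min_pages}

    cleaned = []
    for lines in pages:
        kept = [
            l for l in lines
            if l.strip() not in repeated and not PAGE_NUMBER.match(l)
        ]
        cleaned.append(kept)
    return cleaned


pages = [
    ["ACME Confidential", "Intro text", "Page 1 of 3"],
    ["ACME Confidential", "More text", "Page 2 of 3"],
    ["ACME Confidential", "Final text", "3"],
]
cleaned = strip_page_noise(pages)
```

A heuristic like this can also remove a legitimate line that happens to be a bare number, which is one reason layout-aware models are generally preferred over plain text rules.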

How does LLM Prep convert documents?

When LLM Prep converts a complex document, it performs the following tasks:

  • Layout Analysis: It segments the page into semantic regions: Title, Section Header, Paragraph, Table, Figure, Footer, etc.
  • Reading Order Inference: In multi-column PDFs, it calculates the correct human reading path so sentences don't get jumbled.
  • Table Structure Recovery: It converts visual tables into dataframes or Markdown tables accurately.
  • Metadata Extraction: It extracts document metadata (authors, titles, references) and tracks provenance.
  • Unified Representation: It standardizes everything across different source file types.
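
To make the reading order step more concrete, here is a minimal geometric sketch, assuming each detected region comes with a bounding box and that the page has a fixed number of equal-width columns. The `Block` class and `reading_order` function are hypothetical names for this example. The real pipeline infers reading order with trained models, not with this heuristic.

```python
from dataclasses import dataclass


@dataclass
class Block:
    text: str
    x0: float  # left edge
    y0: float  # top edge (smaller means higher on the page)
    x1: float  # right edge


def reading_order(blocks, page_width, columns=2):
    """Order blocks column by column, top to bottom within each column."""
    column_width = page_width / columns

    def sort_key(block):
        # Assign the block to a column by its horizontal center.
        center = (block.x0 + block.x1) / 2
        column = min(int(center // column_width), columns - 1)
        return (column, block.y0)

    return sorted(blocks, key=sort_key)


blocks = [
    Block("right top", 320, 50, 580),
    Block("left bottom", 20, 400, 280),
    Block("left top", 20, 50, 280),
    Block("right bottom", 320, 400, 580),
]
ordered = [b.text for b in reading_order(blocks, page_width=600)]
```

Reading the same blocks strictly top to bottom would interleave the two columns, which is the jumbling that reading order inference is meant to prevent.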
How does LLM Prep convert PDFs?

PDFs are difficult because they are essentially 'digital paper'. We use computer vision AI models (specifically RT-DETR trained on a dataset called DocLayNet) to 'look' at the page and identify elements.

  • Visual Analysis: Drawing bounding boxes around elements using AI models.
  • Table Reconstruction: Specialized models like TableFormer align grid lines for precise reconstruction.
How does LLM Prep handle tables?
Tables are notoriously difficult for LLMs. LLM Prep preserves table structure by converting them into Markdown tables or structured JSON objects, ensuring that row/column relationships are maintained.
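
The two output shapes can be sketched as follows, assuming a table has already been recovered as a header row plus data rows. The helper names `to_markdown_table` and `to_json_records` and the sample data are illustrative only. LLM Prep's actual JSON schema is not described here and may differ.

```python
import json


def to_markdown_table(header, rows):
    """Render a header and rows as a pipe-delimited Markdown table."""
    def line(cells):
        # Escape pipes so cell text cannot break the column layout.
        return "| " + " | ".join(str(c).replace("|", "\\|") for c in cells) + " |"

    separator = "| " + " | ".join("---" for _ in header) + " |"
    return "\n".join([line(header), separator] + [line(r) for r in rows])


def to_json_records(header, rows):
    """Render each row as an object keyed by column name."""
    return json.dumps([dict(zip(header, r)) for r in rows], indent=2)


header = ["Region", "Units"]
rows = [["North", 12], ["South", 7]]
markdown = to_markdown_table(header, rows)
records = to_json_records(header, rows)
```

Keying each JSON record by column name keeps the row/column relationship explicit even when a single row is retrieved on its own.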
What is the 'RAG Readiness' score?
Every document you process gets a score based on how well it can be understood by an AI. We check for issues like broken text, poor formatting, or missing context, so you know if your data is ready for production.
Why does processing a file take so long?
Processing time depends on the complexity of your document: generating AI image descriptions, converting tables, and structuring pages all take time. We also generate two formats (Markdown and JSON) simultaneously for every file.
How do I view my conversion history?
You can view it in the 'Conversion History' tab in the left sidebar. It shows the input file, output format, and time. Expired files will only show the record, not the file itself.
Why did my conversion fail?

Conversions can fail for several reasons:

  • Server Disruption: Temporary server issues or maintenance.
  • Corrupted or Invalid File: Damaged or incomplete files.
  • Unsupported Format: Specialized or heavily encrypted files.
  • File Size Issues: Large or complex files may time out.
  • Password Protection: Encrypted files cannot be processed.

If your conversion fails, your credits are automatically returned to your account.

Chunking

What is chunking?
Chunking splits your document into optimal-sized pieces for AI, preserving context and adding headers to make it easier for AI to retrieve relevant information.
What method does LLM Prep use for chunking?
We use Recursive Character Text Splitting because it performs well and adapts to most use cases.
What is Recursive Character Text Splitting?
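Recursive Character Text Splitting breaks text at the largest natural boundary first (typically paragraph breaks) and falls back to smaller boundaries (line breaks, then sentences or words) only when a piece is still larger than the target chunk size. This keeps related text together wherever possible.
Is there a free tier?
Yes! We offer a free tier with daily credits so you can test our quality. No credit card required. If you need more volume, check out our Pricing page or contact us at contact@llmprep.com for enterprise needs.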
Why can't I chunk without converting a file first?
Conversion creates the clean, structured foundation (removing noise like page numbers/headers) that chunking needs to work effectively.
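
For readers curious about the chunking method named in this section, here is a minimal sketch of recursive character text splitting. It is a simplified illustration, not LLM Prep's implementation: the function name `recursive_split`, the default `chunk_size`, and the separator list are assumptions, and production splitters usually add chunk overlap and header handling.

```python
def recursive_split(text, chunk_size=200, separators=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most chunk_size characters.

    Separators are tried from coarsest to finest. Pieces that fit are
    merged back together; a piece that is still too long is split again
    with the next, finer separator. The separator is dropped at chunk
    boundaries in this simplified version.
    """
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No boundary left to try: fall back to a hard cut.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


text = "Alpha paragraph one.\n\nBeta paragraph two is a little longer.\n\nGamma."
chunks = recursive_split(text, chunk_size=40)
```

Because paragraph breaks are tried first, each of the three paragraphs in the sample ends up in its own chunk instead of being cut mid-sentence.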

Credits & Plans

What is a credit?
One credit equals processing 100 pages for one action (either converting or chunking).
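As a rough way to estimate usage, the arithmetic can be sketched like this. The rounding rule is an assumption: this example assumes a partial block of pages rounds up to a full credit, which the FAQ does not state, so treat the numbers as an estimate and check your balance for actual charges. The function name `credits_needed` is hypothetical.

```python
import math

PAGES_PER_CREDIT = 100  # from the FAQ: one credit covers 100 pages for one action


def credits_needed(pages, actions=1):
    """Estimate credits, ASSUMING partial blocks of pages round up."""
    if pages <= 0 or actions <= 0:
        return 0
    return math.ceil(pages / PAGES_PER_CREDIT) * actions


# A 250-page document that is converted and then chunked (two actions).
estimate = credits_needed(250, actions=2)
```

Under that rounding assumption, 250 pages need three credits per action, so converting and chunking together come to six.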
Do credits expire?
No. Paid credits never expire. Daily free credits reset at 12:00 AM UTC (midnight).
Why is converting and chunking each a credit?
They are separate, resource-intensive processes requiring significant computational power to ensure high accuracy and structure.
Can I buy multiple credit packs?
Yes, credits are stackable and can be purchased on the Pricing page.
Why is there a file size limit?
Limits (30 MB for the free tier) keep processing fast and share resources fairly across all users.
What happens if a conversion fails?
Credits are automatically re-added to your balance if the process is not completed successfully.
What happens if I cancel a conversion?
Credits are automatically re-added to your balance.
Can I get a refund?
Refunds are generally not offered for package purchases unless there was a purchase issue; failed conversions result in credit returns.
Do you have an Enterprise plan?
Reach out to contact@llmprep.com for enterprise inquiries regarding higher limits or custom implementations.

Storage & Security

Why do files expire?
For privacy and security. We do not permanently store documents; they are purged automatically after processing to protect your data.
Why can't I chunk after a file expires?
Once purged, the file is permanently removed. You would need to re-upload and convert to chunk it again.
What happens if I forgot to download my files?
You will need to use credits to convert again, as we do not store files due to privacy regulations.
What security do we use and why?
We use enterprise-grade encryption in transit and at rest, secure isolated processing environments, and strict data removal protocols so that your sensitive business data remains private.

Still have questions?

Can't find the answers you're looking for? You can reach out to our friendly team!