/ocr

| Feature | Supported |
| --- | --- |
| Cost Tracking | ✅ |
| Logging | ✅ (Basic Logging not supported) |
| Load Balancing | ✅ |
| Supported Providers | mistral, azure_ai, vertex_ai |

LiteLLM Python SDK Usage

Quick Start

from litellm import ocr
import os

os.environ["MISTRAL_API_KEY"] = "sk-.."

response = ocr(
    model="mistral/mistral-ocr-latest",
    document={
        "type": "document_url",
        "document_url": "https://arxiv.org/pdf/2201.04234"
    }
)

# Access extracted text
for page in response.pages:
    print(f"Page {page.index}:")
    print(page.markdown)
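
When the whole document is needed as a single string rather than page by page, the per-page Markdown can be joined. The following is a minimal sketch, assuming the response exposes `pages` with `index` and `markdown` attributes as in the loop above. `join_pages` is a hypothetical helper name, not part of LiteLLM, and the stand-in object only mimics the response shape so the example runs offline.

```python
from types import SimpleNamespace


def join_pages(response, separator="\n\n"):
    """Concatenate page Markdown in page order (hypothetical helper, not a LiteLLM API)."""
    ordered = sorted(response.pages, key=lambda page: page.index)
    return separator.join(page.markdown for page in ordered)


# Stand-in object shaped like the OCR response, so the sketch runs without an API key.
fake_response = SimpleNamespace(
    pages=[
        SimpleNamespace(index=1, markdown="Second page"),
        SimpleNamespace(index=0, markdown="# Title"),
    ]
)
full_text = join_pages(fake_response)
```

With a real call, `join_pages(response)` could be applied directly to the object returned by `ocr(...)`.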

Async Usage

from litellm import aocr
import os, asyncio

os.environ["MISTRAL_API_KEY"] = "sk-.."

async def test_async_ocr():
    response = await aocr(
        model="mistral/mistral-ocr-latest",
        document={
            "type": "document_url",
            "document_url": "https://arxiv.org/pdf/2201.04234"
        }
    )

    # Access extracted text
    for page in response.pages:
        print(f"Page {page.index}:")
        print(page.markdown)

asyncio.run(test_async_ocr())
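
The async interface is most useful when several documents are processed at once. The sketch below shows one possible pattern using `asyncio.gather` with a semaphore. `ocr_many`, its `max_concurrency` parameter, and `fake_aocr` are hypothetical names introduced for illustration; the OCR coroutine is passed in as an argument so the example can run without credentials.

```python
import asyncio


async def ocr_many(urls, ocr_fn, model="mistral/mistral-ocr-latest", max_concurrency=4):
    """Run OCR on several document URLs concurrently (hypothetical helper).

    ocr_fn is any coroutine function that accepts the same keyword arguments
    as aocr. max_concurrency is an assumed limit; tune it to your rate limits.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(url):
        async with semaphore:
            return await ocr_fn(
                model=model,
                document={"type": "document_url", "document_url": url},
            )

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(run_one(url) for url in urls))


# Offline stand-in for aocr so the sketch can run without an API key.
async def fake_aocr(model, document):
    await asyncio.sleep(0)
    return {"model": model, "source": document["document_url"]}


results = asyncio.run(
    ocr_many(["https://example.com/a.pdf", "https://example.com/b.pdf"], fake_aocr)
)
```

With real credentials, `aocr` from `litellm` would be passed as `ocr_fn` in place of `fake_aocr`.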

Using Local Files

LiteLLM can read local files directly, so no manual base64 encoding is needed:

from litellm import ocr

# OCR with a local PDF file path
response = ocr(
    model="mistral/mistral-ocr-latest",
    document={
        "type": "file",
        "file": "/path/to/document.pdf"
    }
)

# OCR with a file object (the context manager closes the handle afterwards)
with open("document.pdf", "rb") as f:
    response = ocr(
        model="mistral/mistral-ocr-latest",
        document={
            "type": "file",
            "file": f
        }
    )

# OCR with raw bytes
with open("document.pdf", "rb") as f:
    pdf_bytes = f.read()

response = ocr(
    model="mistral/mistral-ocr-latest",
    document={
        "type": "file",
        "file": pdf_bytes,
        "mime_type": "application/pdf"  # recommended for raw bytes (auto-detected from extension for file paths)
    }
)

The file field accepts:

  • File path (str or pathlib.Path) — LiteLLM reads the file and detects the MIME type from the extension
  • File object (binary file-like object) — e.g. open("doc.pdf", "rb")
  • Raw bytes (bytes) — use mime_type to specify the content type

LiteLLM converts file inputs to base64 data URIs internally, so the same call works with every supported provider.
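
To make the conversion concrete, here is a minimal sketch of how a path, bytes, or file object could be turned into a base64 data URI. `to_data_uri` is a hypothetical helper written for illustration, and the `application/octet-stream` fallback is an assumption; LiteLLM's internal implementation may differ.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_uri(file, mime_type=None):
    """Build a base64 data URI from a path, bytes, or binary file object.

    Hypothetical helper illustrating the idea; not LiteLLM's actual code.
    """
    if isinstance(file, (str, Path)):
        path = Path(file)
        data = path.read_bytes()
        # Guess the MIME type from the file extension when none is given.
        mime_type = mime_type or mimetypes.guess_type(path.name)[0]
    elif isinstance(file, (bytes, bytearray)):
        data = bytes(file)
    else:
        data = file.read()  # binary file-like object
    mime_type = mime_type or "application/octet-stream"  # assumed fallback
    encoded = base64.b64encode(data).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


uri = to_data_uri(b"%PDF-1.4", mime_type="application/pdf")
```

The resulting string has the same `data:<mime>;base64,<payload>` form that the `document_url` field accepts.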

Using Base64 Encoded Documents

import base64
from litellm import ocr

# Encode PDF to base64
with open("document.pdf", "rb") as f:
    base64_pdf = base64.b64encode(f.read()).decode('utf-8')

response = ocr(
    model="mistral/mistral-ocr-latest",
    document={
        "type": "document_url",
        "document_url": f"data:application/pdf;base64,{base64_pdf}"
    }
)

Optional Parameters

from litellm import ocr

response = ocr(
    model="mistral/mistral-ocr-latest",
    document={
        "type": "document_url",
        "document_url": "https://example.com/doc.pdf"
    },
    # Optional Mistral parameters
    pages=[0, 1, 2],            # Only process specific pages (0-indexed)
    include_image_base64=True,  # Include extracted images
    image_limit=10,             # Max images to return
    image_min_size=100          # Min image size to include
)

LiteLLM Proxy Usage

LiteLLM provides a Mistral-API-compatible /ocr endpoint for OCR calls.

Setup

Add this to your LiteLLM proxy config.yaml:

model_list:
  - model_name: mistral-ocr
    litellm_params:
      model: mistral/mistral-ocr-latest
      api_key: os.environ/MISTRAL_API_KEY

Start the LiteLLM proxy:

litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000

Test request — JSON body

curl http://0.0.0.0:4000/v1/ocr \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-ocr",
    "document": {
      "type": "document_url",
      "document_url": "https://arxiv.org/pdf/2201.04234"
    }
  }'
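
The same JSON request can be issued from Python. The sketch below uses only the standard library and builds the request without sending it, so it runs even when no proxy is up. `build_ocr_request` is a hypothetical helper; the base URL, key, and model name are the placeholder values from the curl example above.

```python
import json
import urllib.request


def build_ocr_request(document_url, base_url="http://0.0.0.0:4000", api_key="sk-1234"):
    """Build (but do not send) a JSON POST to the proxy's /v1/ocr endpoint."""
    payload = {
        "model": "mistral-ocr",  # model_name from config.yaml
        "document": {"type": "document_url", "document_url": document_url},
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/ocr",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_ocr_request("https://arxiv.org/pdf/2201.04234")

# To send it against a running proxy:
# with urllib.request.urlopen(request) as resp:
#     result = json.load(resp)
```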

Test request — multipart file upload

Upload a file directly using multipart form data. No need to base64-encode the file yourself.

curl http://0.0.0.0:4000/v1/ocr \
  -H "Authorization: Bearer sk-1234" \
  -F "model=mistral-ocr" \
  -F "file=@/path/to/document.pdf"

You can also pass optional parameters as additional form fields:

curl http://0.0.0.0:4000/v1/ocr \
  -H "Authorization: Bearer sk-1234" \
  -F "model=mistral-ocr" \
  -F "file=@screenshot.png" \
  -F 'pages=[0,1,2]' \
  -F "include_image_base64=true"

Request/Response Format

Info: LiteLLM follows the Mistral OCR API specification. See the official Mistral OCR documentation for complete details.

Example Request

{
  "model": "mistral/mistral-ocr-latest",
  "document": {
    "type": "document_url",
    "document_url": "https://arxiv.org/pdf/2201.04234"
  },
  "pages": [0, 1, 2],
  "include_image_base64": true,
  "image_limit": 10,
  "image_min_size": 100
}

The pages, include_image_base64, image_limit, and image_min_size fields are optional.

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | The OCR model to use (e.g., "mistral/mistral-ocr-latest") |
| document | object | Yes | Document to process. Must contain type and the corresponding field |
| document.type | string | Yes | "document_url" for PDFs/docs, "image_url" for images, or "file" for local files |
| document.document_url | string | Conditional | URL or data URI to the document (required if type is "document_url") |
| document.image_url | string | Conditional | URL or data URI to the image (required if type is "image_url") |
| document.file | string/bytes/file | Conditional | File path, bytes, or file-like object (required if type is "file") |
| document.mime_type | string | No | Explicit MIME type for file inputs (auto-detected from extension if not provided) |
| pages | array | No | List of specific page indices to process (0-indexed) |
| include_image_base64 | boolean | No | Whether to include extracted images as base64 strings |
| image_limit | integer | No | Maximum number of images to return |
| image_min_size | integer | No | Minimum size (in pixels) for images to include |
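
The conditional rules in the table (each `type` requires its matching field) can be checked on the client before a request is sent. This is a minimal sketch of such a check; `validate_document` and `REQUIRED_FIELD` are hypothetical names, and LiteLLM and the provider perform their own validation regardless.

```python
# Maps each document type to the field it requires, per the parameter table.
REQUIRED_FIELD = {
    "document_url": "document_url",
    "image_url": "image_url",
    "file": "file",
}


def validate_document(document):
    """Return a list of problems with a document dict (empty list means it looks valid).

    Hypothetical client-side check, not part of LiteLLM.
    """
    problems = []
    doc_type = document.get("type")
    if doc_type not in REQUIRED_FIELD:
        problems.append(f"unknown type: {doc_type!r}")
        return problems
    field = REQUIRED_FIELD[doc_type]
    if not document.get(field):
        problems.append(f"type {doc_type!r} requires the {field!r} field")
    return problems


ok = validate_document({"type": "image_url", "image_url": "https://example.com/image.png"})
bad = validate_document({"type": "document_url"})
```

Here `ok` is an empty list, while `bad` reports that the `document_url` field is missing.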

Document Format Examples

For PDFs and documents (URL):

{
  "type": "document_url",
  "document_url": "https://example.com/document.pdf"
}

For images (URL):

{
  "type": "image_url",
  "image_url": "https://example.com/image.png"
}

For base64-encoded content:

{
  "type": "document_url",
  "document_url": "data:application/pdf;base64,JVBERi0xLjQKJ..."
}

For local files (SDK):

{"type": "file", "file": "/path/to/document.pdf"}
{"type": "file", "file": open("image.png", "rb")}
{"type": "file", "file": pdf_bytes, "mime_type": "application/pdf"}

For file uploads (Proxy — multipart form):

curl http://0.0.0.0:4000/v1/ocr \
  -H "Authorization: Bearer sk-1234" \
  -F "model=mistral-ocr" \
  -F "file=@document.pdf"

Response Format

The response follows Mistral's OCR format with the following structure:

{
  "pages": [
    {
      "index": 0,
      "markdown": "# Document Title\n\nExtracted text content...",
      "dimensions": {
        "dpi": 200,
        "height": 2200,
        "width": 1700
      },
      "images": [
        {
          "image_base64": "base64string...",
          "bbox": {
            "x": 100,
            "y": 200,
            "width": 300,
            "height": 400
          }
        }
      ]
    }
  ],
  "model": "mistral-ocr-2505-completion",
  "usage_info": {
    "pages_processed": 29,
    "doc_size_bytes": 3002783
  },
  "document_annotation": null,
  "object": "ocr"
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| pages | array | List of processed pages with extracted content |
| pages[].index | integer | Page number (0-indexed) |
| pages[].markdown | string | Extracted text in Markdown format |
| pages[].dimensions | object | Page dimensions (dpi, height, width in pixels) |
| pages[].images | array | Extracted images from the page (if include_image_base64=true) |
| model | string | The model used for OCR processing |
| usage_info | object | Processing statistics (pages processed, document size) |
| document_annotation | object | Optional document-level annotations |
| object | string | Always "ocr" for OCR responses |
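
As an illustration of working with these fields, the sketch below collects decoded image bytes from a response in the JSON shape shown above (for example, the body returned by the proxy). `decode_images` is a hypothetical helper. Stripping a leading `data:...;base64,` prefix is an assumption added for safety, since this page does not state whether providers include one.

```python
import base64


def decode_images(response):
    """Collect (page_index, image_bytes) pairs from an OCR response dict.

    Hypothetical helper; assumes the pages/images/image_base64 layout
    documented in the response fields table.
    """
    decoded = []
    for page in response.get("pages", []):
        for image in page.get("images") or []:
            payload = image.get("image_base64")
            if not payload:
                continue  # skip entries that carry no image data
            # Assumed: tolerate an optional data URI prefix before the payload.
            if payload.startswith("data:") and "," in payload:
                payload = payload.split(",", 1)[1]
            decoded.append((page["index"], base64.b64decode(payload)))
    return decoded


sample = {
    "pages": [
        {"index": 0, "markdown": "# Title", "images": [{"image_base64": "aGVsbG8="}]},
        {"index": 1, "markdown": "No images here", "images": []},
    ],
    "usage_info": {"pages_processed": 2, "doc_size_bytes": 1024},
}
images = decode_images(sample)
```

Each returned tuple pairs the 0-indexed page number with the raw bytes, which can then be written to disk or passed to an image library.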

Supported Providers

| Provider | Link to Usage |
| --- | --- |
| Mistral AI | Usage |
| Azure AI | Usage |
| Vertex AI | Usage |