Does hybrid mode need a GPU? #180

wyh · 2026-02-04T11:14:22Z


How long does it take to process a page in hybrid mode?

Does the system automatically decide which mode each page goes to?

hnc-jglee · 2026-03-17T04:13:28Z

Maintainer

Hi @wyh, great questions! Let me answer each one.

1. Does hybrid mode need a GPU?

No, a GPU is not required. The hybrid server runs on CPU by default. If a GPU is available, the server detects it automatically and uses it for faster processing, but it is entirely optional.

| Resource | Requirement |
|---|---|
| RAM | ~2–4 GB (docling models loaded into memory) |
| Disk | ~1–2 GB (model downloads, cached after first run) |
| GPU | Optional; CPU-only works fine, GPU accelerates OCR and table detection |

2. How long does it take to process a page?

It depends on the processing path:

| Path | Avg. per document |
|---|---|
| Java-only (simple pages) | ~0.05s |
| Backend / hybrid (complex pages) | ~0.7s |

The backend processing averages about 0.685 seconds per document, with a range of 0.19s to 4.24s depending on page complexity.
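For a rough sense of scale, the two averages can be combined into a back-of-the-envelope estimate. This is only a sketch: it assumes the quoted averages apply to each routed page (the figures above are labeled per document), and the function name and page counts are hypothetical.

```shell
# Rough estimate only, built from the averages quoted in this thread.
# Assumption: ~0.05s per Java-routed page and ~0.685s per backend-routed page.
estimate_seconds() {
  # $1 = number of simple (Java) pages, $2 = number of complex (backend) pages
  LC_ALL=C awk -v j="$1" -v b="$2" \
    'BEGIN { printf "%.2f\n", j * 0.05 + b * 0.685 }'
}

estimate_seconds 90 10   # 90 simple pages and 10 complex pages
```

Real timings will vary with hardware and page complexity, so treat the result as an order-of-magnitude guide.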

3. Does the system automatically decide which page goes to which mode?

Yes, fully automatic. The built-in TriageProcessor analyzes each page and routes it to the optimal path:

→ Backend: pages with table borders, vector graphics grids, text-based table patterns, large images (potential charts/tables), or high line-chunk ratios
→ Java: simple text-only pages (the majority of pages)

This means most pages are processed lightning-fast in Java, while only complex pages (tables, charts, etc.) are sent to the backend for deeper analysis.
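The routing rule described above can be pictured with a small sketch. This is not the actual TriageProcessor code: the function name, the argument order, and the 0/1 encoding of each signal are assumptions made purely for illustration.

```shell
# Illustrative triage sketch; NOT the real TriageProcessor logic.
# Each argument is 1 if the signal was seen on the page, otherwise 0:
#   $1 table borders, $2 vector-graphics grid, $3 text-based table pattern,
#   $4 large image, $5 high line-chunk ratio
route_page() {
  if [ $(( $1 + $2 + $3 + $4 + $5 )) -gt 0 ]; then
    echo backend   # any complexity signal sends the page to the backend
  else
    echo java      # plain text-only page stays on the fast Java path
  fi
}

route_page 0 0 0 0 0   # text-only page
route_page 1 0 0 0 0   # page with a bordered table
```

The point of the sketch is that a single positive signal is enough to route a page to the backend; the real processor may weigh these signals differently.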

How to run

Step 1. Install with hybrid support

pip install -U "opendataloader-pdf[hybrid]"

Step 2. Start the backend server (first terminal)

opendataloader-pdf-hybrid --port 5002

Step 3. Process PDFs with hybrid mode (second terminal)

# Basic hybrid mode (auto triage)
opendataloader-pdf --hybrid docling-fast input.pdf

# With custom server URL and timeout
opendataloader-pdf --hybrid docling-fast --hybrid-url http://localhost:5002 --hybrid-timeout 60000 input.pdf

# With fallback to Java on backend error
opendataloader-pdf --hybrid docling-fast --hybrid-fallback input.pdf

# Full mode — send ALL pages to backend (required for enrichments)
opendataloader-pdf --hybrid docling-fast --hybrid-mode full input.pdf

Tip: If you use enrichments like --enrich-formula or --enrich-picture-description, make sure to add --hybrid-mode full. Otherwise, enrichments on Java-processed pages will be silently skipped.
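One way to avoid forgetting that flag is a small wrapper that adds it whenever an enrichment is requested. The helper below is a hypothetical sketch; only the flags themselves come from the commands shown in this thread.

```shell
# Hypothetical helper: prints the arguments for a hybrid run.
# It appends --hybrid-mode full whenever an --enrich-* flag is passed,
# so enrichments are not skipped on Java-routed pages.
hybrid_args() {
  args="--hybrid docling-fast"
  needs_full=0
  for flag in "$@"; do
    case "$flag" in
      --enrich-*) needs_full=1 ;;
    esac
    args="$args $flag"
  done
  if [ "$needs_full" -eq 1 ]; then
    args="$args --hybrid-mode full"
  fi
  printf '%s\n' "$args"
}

hybrid_args --enrich-formula    # gains --hybrid-mode full
hybrid_args --hybrid-fallback   # left unchanged
```

It could then be used as `opendataloader-pdf $(hybrid_args --enrich-formula) input.pdf`, relying on the shell's word splitting to pass each flag separately.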

For more details, see the Hybrid Mode documentation.

Hope this helps! Feel free to ask if you have more questions.

1 reply

Tellyang7 Mar 26, 2026

I'm wondering if there's a way to specify where downloaded models are saved, and where to find the download links. Neither is visible right now. All I see is: WARNING - Downloading detection model, please wait. This may take several minutes depending upon your network connection.

wyh · 2026-03-17T04:21:03Z

Author

Thanks for such a detailed reply. I actually tried it on both GPU and CPU, and it works like a charm.

Thanks again.


hnc-jglee · 2026-04-02T23:21:58Z

Maintainer

Hi @Tellyang7!

Model storage location

By default, models are saved to ~/.cache/docling/models/.

To change this, you have two options:

Option 1 — Environment variable (applies to all Docling operations):

export DOCLING_CACHE_DIR=/path/to/your/cache
opendataloader-pdf-hybrid --port 5002

Option 2 — Pre-download with a custom path using docling-tools:

docling-tools models download -o /path/to/your/models
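To check which directory will be used before starting the server, a quick sketch like the following may help. It assumes that models sit in a `models` subdirectory of the cache directory, mirroring the default path quoted above; the function name is hypothetical.

```shell
# Assumption: models live in "<cache dir>/models", matching the default
# ~/.cache/docling/models/ quoted in this thread.
docling_model_dir() {
  printf '%s\n' "${DOCLING_CACHE_DIR:-$HOME/.cache/docling}/models"
}

docling_model_dir                                        # default location
DOCLING_CACHE_DIR=/data/docling-cache docling_model_dir  # with the override
```

For a directory populated with `docling-tools models download -o`, Docling's own documentation describes a `DOCLING_ARTIFACTS_PATH` variable that points at pre-downloaded models; whether the hybrid server honors it is not confirmed in this thread.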

Pre-downloading models (offline / air-gapped)

You can download all models ahead of time so the hybrid server starts instantly with no network access needed:

# Download the default model set
docling-tools models download

# Download ALL available models (including optional VLM, OCR, etc.)
docling-tools models download --all

# Download to a custom directory
docling-tools models download -o /data/docling-models

What gets downloaded and from where

All models come from Hugging Face Hub (huggingface.co). The default set includes:

| Model | Hugging Face Repo | Purpose |
|---|---|---|
| Layout detection | `docling-project/docling-layout-heron` | Page layout analysis |
| Table structure | `docling-project/docling-models` | Table cell/row/column detection |
| Picture classifier | (bundled in docling-models) | Image vs chart classification |
| Code/formula | (bundled in docling-models) | LaTeX formula extraction |
| RapidOCR | (bundled in docling-models) | Default OCR engine |

Models are cached after the first download — the warning only appears once.
