Commit 3a3d9c8

docs(web-search): add loader engine guidance
1 parent 8122eaa commit 3a3d9c8

3 files changed

Lines changed: 99 additions & 3 deletions

Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
---
sidebar_position: 2
title: "Web Loader Engines"
---

# Web Loader Engines

After your search engine returns URLs, Open WebUI still needs to fetch the page content. The **Web Loader Engine** controls how that content is retrieved for traditional web search, URL fetching, and features like [Save Search Results to Knowledge](./save-to-knowledge).

You can configure the loader in **Admin Panel → Settings → Web Search → Loader** or with [`WEB_LOADER_ENGINE`](/reference/env-configuration#web_loader_engine).

## Which loader should you use?

| Loader | Best for | JavaScript support | Extra setup | Speed / cost profile |
| --- | --- | --- | --- | --- |
| `safe_web` | Static docs, blogs, and simple HTML pages | No browser rendering | None | Fastest and lightest |
| `playwright` | Single-page apps and JavaScript-heavy sites | Yes | Browser install or remote Playwright endpoint | Slower and heavier than `safe_web` |
| `firecrawl` | Cleaner extracted content from difficult or noisy pages | Usually yes, via Firecrawl service | Firecrawl API key and service access | External service, may add cost and network dependency |

## `safe_web`

`safe_web` is the default loader. It fetches the raw page HTML directly, retries failed requests, and extracts plain text from the page.

Use it when:

- You want the simplest setup with no external service.
- The target site already renders most content in the initial HTML.
- You care about speed and low overhead.

Tradeoffs:

- It does not run page JavaScript, so SPAs and client-rendered sites may return incomplete or empty content.
- Extraction works on the raw page HTML, so the result can be noisier than output from a dedicated extraction service.

Useful settings:

- [`WEB_LOADER_TIMEOUT`](/reference/env-configuration#web_loader_timeout) to keep slow pages from hanging a fetch.
- [`WEB_SEARCH_TRUST_ENV`](/reference/env-configuration#web_search_trust_env) if Open WebUI must honor `http_proxy` or `https_proxy`.

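To make the fetch, retry, and extract steps concrete, here is a minimal standard-library sketch of that style of loader. It is not Open WebUI's implementation: `fetch_html`, `fetch_with_retries`, and `html_to_text` are hypothetical names, and the retry count and timeout defaults are assumptions chosen for illustration.

```python
import time
import urllib.request
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and ignores <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Reduce raw HTML to plain text. No JavaScript is executed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fetch_html(url, timeout=10.0):
    """Fetch the raw HTML of a page (hypothetical default timeout)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_with_retries(fetch, url, attempts=3, delay=0.0):
    """Call fetch(url) until it succeeds or the attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return fetch(url)
        except OSError as exc:  # URLError and timeouts are OSError subclasses
            last_error = exc
            time.sleep(delay)
    raise last_error


static_page = "<html><body><h1>Docs</h1><p>Install with pip.</p></body></html>"
spa_page = '<html><body><div id="root"></div><script>render("Hello")</script></body></html>'

print(repr(html_to_text(static_page)))  # 'Docs Install with pip.'
print(repr(html_to_text(spa_page)))     # ''
```

The second page shows the tradeoff described above: the text only exists after the script runs, so an HTML-only loader extracts nothing from it.
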
## `playwright`

`playwright` opens the page in a real browser, waits for it to render, and then extracts the content. This makes it the best built-in choice for modern web apps that depend on JavaScript.

Use it when:

- `safe_web` returns partial content, placeholders, or empty pages.
- The site requires client-side rendering before the content exists.
- You need browser-like fetching without relying on an external extraction API.

Tradeoffs:

- It is slower and uses more CPU and memory than `safe_web`.
- If you do not provide a remote browser, Open WebUI installs Chromium dependencies on startup.
- Browser navigation timeouts matter more here than with the default loader.

Useful settings:

- [`PLAYWRIGHT_WS_URL`](/reference/env-configuration#playwright_ws_url) to connect to a remote Playwright browser.
- [`PLAYWRIGHT_TIMEOUT`](/reference/env-configuration#playwright_timeout) to control how long page navigation can take.

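For readers who want to see what browser-based loading involves, the sketch below uses Playwright's Python sync API directly. It is an illustration, not Open WebUI's loader code: `browser_settings` and `render_page_text` are hypothetical helpers, and treating `PLAYWRIGHT_TIMEOUT` as milliseconds is an assumption made here because Playwright's own `page.goto` takes its timeout in milliseconds.

```python
# Playwright's documented default navigation timeout is 30 seconds.
DEFAULT_TIMEOUT_MS = 30_000


def browser_settings(env):
    """Read the two settings above from a mapping such as os.environ.

    Assumption: PLAYWRIGHT_TIMEOUT holds a whole number of milliseconds.
    """
    ws_url = env.get("PLAYWRIGHT_WS_URL") or None
    timeout_ms = int(env.get("PLAYWRIGHT_TIMEOUT") or DEFAULT_TIMEOUT_MS)
    return ws_url, timeout_ms


def render_page_text(url, ws_url=None, timeout_ms=DEFAULT_TIMEOUT_MS):
    """Return the visible text of `url` after the browser has rendered it."""
    # Imported lazily so this module still loads where Playwright is absent.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        if ws_url:
            # Remote browser: nothing needs to be installed locally.
            browser = p.chromium.connect(ws_url)
        else:
            # Local browser: requires Chromium to be installed.
            browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, timeout=timeout_ms, wait_until="load")
            return page.inner_text("body")
        finally:
            browser.close()
```

A call such as `render_page_text("https://example.com", *browser_settings(os.environ))` would need `import os` and a working Playwright installation or a reachable remote endpoint.
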
## `firecrawl`

`firecrawl` sends the URL list to a Firecrawl service, which scrapes the pages and returns the extracted Markdown to Open WebUI.

Use it when:

- You want cleaner extracted content than plain HTML-to-text conversion.
- You are scraping pages that come out incomplete, noisy, or inconsistent with the default loader.
- You are comfortable depending on an external service for extraction.

Tradeoffs:

- It requires Firecrawl connectivity and usually an API key.
- Availability, latency, and cost depend on the Firecrawl service you use.
- Because extraction happens outside Open WebUI, it adds an external network dependency.

Useful settings:

- [`FIRECRAWL_API_BASE_URL`](/reference/env-configuration#firecrawl_api_base_url)
- [`FIRECRAWL_API_KEY`](/reference/env-configuration#firecrawl_api_key)
- [`FIRECRAWL_TIMEOUT`](/reference/env-configuration#firecrawl_timeout)

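Because this loader depends on several settings being present together, a small pre-deployment check can catch obvious gaps. The sketch below is a hypothetical helper, not part of Open WebUI: `firecrawl_preflight` is an invented name, and the rules it applies (an HTTP(S) base URL, a numeric timeout) are assumptions about reasonable values rather than documented constraints.

```python
def firecrawl_preflight(env):
    """Return a list of likely problems with the Firecrawl settings.

    `env` is any mapping of variable names to strings, e.g. os.environ.
    """
    problems = []

    if env.get("WEB_LOADER_ENGINE") != "firecrawl":
        problems.append("WEB_LOADER_ENGINE is not set to 'firecrawl'")

    base_url = env.get("FIRECRAWL_API_BASE_URL", "")
    if base_url and not base_url.startswith(("http://", "https://")):
        problems.append("FIRECRAWL_API_BASE_URL does not look like an HTTP(S) URL")

    # The key is "usually" required, so an empty value is only a warning.
    if not env.get("FIRECRAWL_API_KEY"):
        problems.append("FIRECRAWL_API_KEY is empty; fine only if your service needs no key")

    timeout = env.get("FIRECRAWL_TIMEOUT")
    if timeout:
        try:
            float(timeout)  # assumption: the value should be numeric
        except ValueError:
            problems.append("FIRECRAWL_TIMEOUT is not a number")

    return problems
```

An empty list means none of these particular checks fired; it does not prove the service is reachable.
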
## Quick recommendations

- Start with `safe_web` for general-purpose web search.
- Switch to `playwright` when pages depend on JavaScript rendering.
- Switch to `firecrawl` when you want cleaner extraction and do not mind using an external service.

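The recommendations above amount to a short decision rule. The function below is one way to write it down; `recommend_loader` and its parameter names are hypothetical, and the order of the checks is a judgment call rather than something Open WebUI enforces.

```python
def recommend_loader(needs_javascript=False,
                     wants_cleaner_extraction=False,
                     allow_external_service=False):
    """Map a few yes/no requirements to a WEB_LOADER_ENGINE value."""
    # Firecrawl only makes sense if an external dependency is acceptable.
    if wants_cleaner_extraction and allow_external_service:
        return "firecrawl"
    # JavaScript-dependent pages need a real browser.
    if needs_javascript:
        return "playwright"
    # Otherwise stay with the default.
    return "safe_web"


print(recommend_loader())                       # safe_web
print(recommend_loader(needs_javascript=True))  # playwright
```
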
## Troubleshooting loader choice

If web search quality is poor:

- Empty or incomplete pages usually mean the site needs `playwright` or `firecrawl`.
- Slow or hanging fetches with `safe_web` usually mean you should set `WEB_LOADER_TIMEOUT`.
- Deployments behind a proxy should enable `WEB_SEARCH_TRUST_ENV` so that `http_proxy` and `https_proxy` are honored.

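To check whether the first symptom applies to a particular page, you can inspect its raw HTML. The heuristic below is a rough sketch under stated assumptions: `looks_client_rendered` is a hypothetical helper, and the 200-character threshold is an arbitrary starting point, not a value taken from Open WebUI.

```python
from html.parser import HTMLParser


class _PageStats(HTMLParser):
    """Counts <script> tags and visible text characters in raw HTML."""

    def __init__(self):
        super().__init__()
        self.script_tags = 0
        self.text_chars = 0
        self._hidden_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
        if tag in ("script", "style"):
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if not self._hidden_depth:
            self.text_chars += len(data.strip())


def looks_client_rendered(html, min_text_chars=200):
    """Guess whether a page needs JavaScript to show its content.

    True when the HTML ships scripts but almost no visible text.
    """
    stats = _PageStats()
    stats.feed(html)
    stats.close()
    return stats.script_tags > 0 and stats.text_chars < min_text_chars
```

A `True` result suggests trying `playwright` or `firecrawl`; a `False` result does not rule out other causes such as timeouts or proxy settings.
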
For broader debugging steps, see the [Web Search Troubleshooting Guide](/troubleshooting/web-search).

docs/features/chat-conversations/web-search/save-to-knowledge.mdx

Lines changed: 1 addition & 1 deletion
@@ -60,5 +60,5 @@ Set your **Default Knowledge Base** and enable **Skip Confirmation** in your Use

 ## Troubleshooting

-- **Content Quality**: The quality of the saved content depends on your **Web Loader Engine** settings (Admin > Settings > Documents). For JavaScript-heavy sites, consider using **Firecrawl** or **Playwright**.
+- **Content Quality**: The quality of the saved content depends on your **Web Loader Engine** settings (Admin > Settings > Web Search). For JavaScript-heavy sites, consider using **Firecrawl** or **Playwright**. See [Web Loader Engines](./loaders) for guidance on when to use each option.
 - **No URLs Found**: This action works with web search results that return structured citations. If no URLs are detected, ensure web search is properly enabled and returning results.

docs/troubleshooting/web-search.mdx

Lines changed: 1 addition & 2 deletions
@@ -68,7 +68,7 @@ If web search returns empty content or poor quality results, the issue is often

 - **Check result count**: Adjust `WEB_SEARCH_RESULT_COUNT` to control how many results are fetched.

-- **Try different loaders**: Configure `WEB_LOADER_ENGINE` to use `playwright` for JavaScript-heavy sites or `firecrawl`/`tavily` for better extraction.
+- **Try different loaders**: Configure `WEB_LOADER_ENGINE` to use `playwright` for JavaScript-heavy sites or `firecrawl`/`tavily` for better extraction. See [Web Loader Engines](/features/chat-conversations/web-search/loaders) for a side-by-side comparison.

 For more details on context window issues, see the [RAG Troubleshooting Guide](./rag).

@@ -103,4 +103,3 @@ Key variables:
 | `WEB_LOADER_ENGINE` | Content extraction engine |

 ---
-
