| 1 | +--- |
| 2 | +sidebar_position: 2 |
| 3 | +title: "Web Loader Engines" |
| 4 | +--- |
| 5 | + |
| 6 | +# Web Loader Engines |
| 7 | + |
| 8 | +After your search engine returns URLs, Open WebUI still needs to fetch the page content. The **Web Loader Engine** controls how that content is retrieved for traditional web search, URL fetching, and features like [Save Search Results to Knowledge](./save-to-knowledge). |
| 9 | + |
| 10 | +You can configure the loader in **Admin Panel → Settings → Web Search → Loader** or with [`WEB_LOADER_ENGINE`](/reference/env-configuration#web_loader_engine). |
| 11 | + |
| 12 | +## Which loader should you use? |
| 13 | + |
| 14 | +| Loader | Best for | JavaScript support | Extra setup | Speed / cost profile | |
| 15 | +| --- | --- | --- | --- | --- | |
| 16 | +| `safe_web` | Static docs, blogs, and simple HTML pages | No browser rendering | None | Fastest and lightest | |
| 17 | +| `playwright` | Single-page apps and JavaScript-heavy sites | Yes | Browser install or remote Playwright endpoint | Slower and heavier than `safe_web` | |
| 18 | +| `firecrawl` | Cleaner extracted content from difficult or noisy pages | Usually yes, via the Firecrawl service | Firecrawl API key and service access | External service; may add cost and a network dependency |
| 19 | + |
| 20 | +## `safe_web` |
| 21 | + |
| 22 | +`safe_web` is the default loader. It fetches the raw page HTML directly, retries failed requests, and extracts plain text from the page. |
| 23 | + |
| 24 | +Use it when: |
| 25 | + |
| 26 | +- You want the simplest setup with no external service. |
| 27 | +- The target site already renders most content in the initial HTML. |
| 28 | +- You care about speed and low overhead. |
| 29 | + |
| 30 | +Tradeoffs: |
| 31 | + |
| 32 | +- It does not run page JavaScript, so SPAs and client-rendered sites may return incomplete or empty content. |
| 33 | +- Extraction is based on the page HTML, so the result can be noisier than a dedicated extraction service. |
| 34 | + |
| 35 | +Useful settings: |
| 36 | + |
| 37 | +- [`WEB_LOADER_TIMEOUT`](/reference/env-configuration#web_loader_timeout) to prevent slow pages from hanging too long. |
| 38 | +- [`WEB_SEARCH_TRUST_ENV`](/reference/env-configuration#web_search_trust_env) if Open WebUI must honor `http_proxy` or `https_proxy`. |
| 39 | + |
| 40 | +## `playwright` |
| 41 | + |
| 42 | +`playwright` opens the page in a real browser, waits for it to render, and then extracts the content. This makes it the best built-in choice for modern web apps that depend on JavaScript. |
| 43 | + |
| 44 | +Use it when: |
| 45 | + |
| 46 | +- `safe_web` returns partial content, placeholders, or empty pages. |
| 47 | +- The site requires client-side rendering before the content exists. |
| 48 | +- You need browser-like fetching without relying on an external extraction API. |
| 49 | + |
| 50 | +Tradeoffs: |
| 51 | + |
| 52 | +- It is slower and uses more CPU and memory than `safe_web`. |
| 53 | +- If you do not provide a remote browser, Open WebUI installs Chromium dependencies on startup. |
| 54 | +- Browser navigation timeouts matter more here than with the default loader. |
| 55 | + |
| 56 | +Useful settings: |
| 57 | + |
| 58 | +- [`PLAYWRIGHT_WS_URL`](/reference/env-configuration#playwright_ws_url) to connect to a remote Playwright browser. |
| 59 | +- [`PLAYWRIGHT_TIMEOUT`](/reference/env-configuration#playwright_timeout) to control how long page navigation can take. |
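As a rough illustration of the settings above, the following sketch assembles the environment variables for a `playwright` deployment. The helper name `playwright_env`, the example endpoint, and the `ws`/`wss` scheme check are assumptions made for this example and are not part of Open WebUI. The timeout is passed through unchanged because its unit is not stated here; check the linked reference before choosing a value.

```python
from typing import Dict, Optional
from urllib.parse import urlparse


def playwright_env(ws_url: Optional[str] = None,
                   timeout: Optional[int] = None) -> Dict[str, str]:
    """Build the environment variables for the playwright loader (hypothetical helper)."""
    env = {"WEB_LOADER_ENGINE": "playwright"}
    if ws_url is not None:
        # Assumption: a remote Playwright endpoint is a WebSocket URL.
        if urlparse(ws_url).scheme not in ("ws", "wss"):
            raise ValueError(f"expected a ws:// or wss:// URL, got {ws_url!r}")
        env["PLAYWRIGHT_WS_URL"] = ws_url
    if timeout is not None:
        # Passed through as-is; the unit is defined by the env reference.
        env["PLAYWRIGHT_TIMEOUT"] = str(timeout)
    return env


if __name__ == "__main__":
    # "playwright:3000" is a placeholder host and port, not a documented default.
    for name, value in playwright_env("ws://playwright:3000").items():
        print(f"{name}={value}")
```

Omitting `ws_url` leaves `PLAYWRIGHT_WS_URL` unset, which corresponds to the local-browser case described in the tradeoffs.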
| 60 | + |
| 61 | +## `firecrawl` |
| 62 | + |
| 63 | +`firecrawl` sends the URL list to a Firecrawl service, which scrapes the pages and returns the extracted Markdown to Open WebUI. |
| 64 | + |
| 65 | +Use it when: |
| 66 | + |
| 67 | +- You want cleaner extracted content than plain HTML-to-text conversion. |
| 68 | +- You are scraping pages that give difficult, noisy, or inconsistent results with the default loader. |
| 69 | +- You are comfortable depending on an external service for extraction. |
| 70 | + |
| 71 | +Tradeoffs: |
| 72 | + |
| 73 | +- It requires Firecrawl connectivity and usually an API key. |
| 74 | +- Availability, latency, and cost depend on the Firecrawl service you use. |
| 75 | +- Because extraction happens outside Open WebUI, it adds an external network dependency. |
| 76 | + |
| 77 | +Useful settings: |
| 78 | + |
| 79 | +- [`FIRECRAWL_API_BASE_URL`](/reference/env-configuration#firecrawl_api_base_url) |
| 80 | +- [`FIRECRAWL_API_KEY`](/reference/env-configuration#firecrawl_api_key) |
| 81 | +- [`FIRECRAWL_TIMEOUT`](/reference/env-configuration#firecrawl_timeout) |
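A pre-flight check along these lines can catch a half-configured Firecrawl setup before the first search. This is only a sketch: `missing_firecrawl_settings` is a hypothetical helper, and treating the API key as mandatory is an assumption, since a key is usually but not always needed.

```python
import os
from typing import List, Mapping

# Assumption: both values are required. Drop FIRECRAWL_API_KEY from this
# tuple if your Firecrawl service does not need a key.
REQUIRED_FIRECRAWL_VARS = ("FIRECRAWL_API_BASE_URL", "FIRECRAWL_API_KEY")


def missing_firecrawl_settings(env: Mapping[str, str]) -> List[str]:
    """Return the Firecrawl variables that are unset or empty (hypothetical helper)."""
    if env.get("WEB_LOADER_ENGINE") != "firecrawl":
        return []  # Another loader is selected, so nothing to check.
    return [name for name in REQUIRED_FIRECRAWL_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_firecrawl_settings(os.environ)
    if missing:
        print("Missing Firecrawl settings: " + ", ".join(missing))
```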
| 82 | + |
| 83 | +## Quick recommendations |
| 84 | + |
| 85 | +- Start with `safe_web` for general-purpose web search. |
| 86 | +- Switch to `playwright` when pages depend on JavaScript rendering. |
| 87 | +- Switch to `firecrawl` when you want cleaner extraction and do not mind using an external service. |
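The recommendations above can be written as a small decision function. This is a minimal sketch for illustration only: `recommend_loader` and its parameters are hypothetical and not part of Open WebUI. The returned strings are the `WEB_LOADER_ENGINE` values described on this page.

```python
def recommend_loader(needs_javascript: bool = False,
                     wants_clean_extraction: bool = False,
                     external_service_ok: bool = False) -> str:
    """Map site requirements to a WEB_LOADER_ENGINE value (hypothetical helper)."""
    # Prefer firecrawl only when cleaner extraction is wanted and an
    # external service is acceptable.
    if wants_clean_extraction and external_service_ok:
        return "firecrawl"
    # JavaScript-dependent pages need a real browser.
    if needs_javascript:
        return "playwright"
    # Default: the fastest and lightest loader.
    return "safe_web"
```

For example, `recommend_loader(needs_javascript=True)` returns `"playwright"`, and calling it with no arguments returns `"safe_web"`.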
| 88 | + |
| 89 | +## Troubleshooting loader choice |
| 90 | + |
| 91 | +If web search quality is poor: |
| 92 | + |
| 93 | +- Empty or incomplete pages usually mean the site needs `playwright` or `firecrawl`. |
| 94 | +- Slow or hanging fetches with `safe_web` usually mean you should set `WEB_LOADER_TIMEOUT`. |
| 95 | +- Proxy-based deployments should enable `WEB_SEARCH_TRUST_ENV`. |
| 96 | + |
| 97 | +For broader debugging steps, see the [Web Search Troubleshooting Guide](/troubleshooting/web-search). |