---
title: Troubleshooting
description: This guide provides solutions to common issues encountered when using html2rss.
---
import { Code } from "@astrojs/starlight/components";
This guide provides solutions to common issues encountered when using html2rss.
Your browser's developer tools are essential for troubleshooting. Use them to inspect the HTML structure of a webpage and find the correct CSS selectors.
- To open: Right-click an element on a webpage and select "Inspect" or "Inspect Element."
The `auto` flow is sensitive to the kind of URL (the "surface") you give it.
- Higher success inputs:
  - newsroom/press listing URLs
  - category/tag/listing/archive URLs
  - changelog/release/update listing URLs
- Lower success inputs:
  - generic homepages
  - search result pages
  - client-rendered app-shell entrypoints
If extraction quality is poor, switch to a more specific listing/update URL before tuning selectors.
If your feed is empty, check the following:
- URL: Ensure the `url` in your configuration is correct and accessible.
- `items.selector`: Verify that the `items.selector` matches the elements on the page.
- Website Changes: Websites change their HTML structure frequently. Your selectors may be outdated.
- JavaScript Content: If the content is loaded via JavaScript, use a browser-based rendering strategy.
- Authentication: Some sites require authentication — check if you need to add headers or use a different strategy.
`auto` classifies no-scraper failures with actionable hints:
- Blocked surface likely (anti-bot or interstitial):
  - try a more specific public listing URL
- App-shell surface detected:
  - target a direct listing/update page instead of homepage/shell entrypoint
- Unsupported extraction surface for auto mode:
  - switch to listing/changelog/category URLs
  - or use explicit selectors in YAML config
Known anti-bot interstitial patterns (for example Cloudflare challenge pages) are surfaced as blocked-surface errors instead of silent empty extraction results.
When all `auto` fallback tiers complete but still extract zero items, html2rss raises `No RSS feed items extracted after auto fallback ...`.
If failures continue after URL/surface fixes, retry with an explicit browser-based override (`--strategy browserless`), or `--strategy botasaurus` when `BOTASAURUS_SCRAPER_URL` is configured.
If you receive `Browserless connection failed (...)`:
- Confirm Browserless is running and reachable from the machine running `html2rss`.
- Confirm `BROWSERLESS_IO_WEBSOCKET_URL` points at that running service.
- Confirm `BROWSERLESS_IO_API_TOKEN` matches the Browserless `TOKEN`.
Example local startup:
<Code
code={`docker run --rm -p 3000:3000 -e "CONCURRENT=10" -e "TOKEN=6R0W53R135510" ghcr.io/browserless/chromium`}
lang="bash"
/>
Then run with:
<Code
code={`BROWSERLESS_IO_WEBSOCKET_URL="ws://127.0.0.1:3000" BROWSERLESS_IO_API_TOKEN="6R0W53R135510" html2rss auto https://example.com/updates --strategy browserless`}
lang="bash"
/>
For custom WebSocket endpoints, `BROWSERLESS_IO_API_TOKEN` is required.
Common configuration-related errors:
- `UnsupportedResponseContentType`: The website returned content that html2rss can't parse (not HTML or JSON).
- `UnsupportedStrategy`: The specified strategy is not available. Use `auto`, `faraday`, `browserless`, or `botasaurus`.
- `BOTASAURUS_SCRAPER_URL is required for strategy=botasaurus.`: Set `BOTASAURUS_SCRAPER_URL` to your Botasaurus scrape API base URL when using `--strategy botasaurus`.
- `BOTASAURUS_SCRAPER_URL is invalid`: Fix the URL format and retry.
- `Configuration must include at least 'selectors' or 'auto_source'`: You need to specify either manual selectors or enable auto-source.
- `stylesheet.type invalid`: Only `text/css` and `text/xsl` are supported for stylesheets.
If parts of your items (e.g., title, link) are missing, check the following:
- Selector: Ensure the selector for the missing part is correct and relative to the `items.selector`.
- Extractor: Verify that you are using the correct `extractor` (e.g., `text`, `href`, `attribute`).
- Dynamic Content: `faraday` does not render JavaScript. If content loads dynamically, run with `--strategy browserless` (with Browserless available) or `--strategy botasaurus` (with `BOTASAURUS_SCRAPER_URL` configured) so the page can be rendered before extraction.
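A quick way to tell whether a missing part is rendered by JavaScript is to check whether text you can see in the browser also appears in the raw HTML. The following is a minimal sketch, assuming `curl` and `grep` are installed; `html_contains` is a hypothetical helper name (not part of html2rss), and the URL and headline are placeholders.

```shell
# html_contains is a hypothetical helper (not part of html2rss).
# It reads HTML on stdin and succeeds when the literal text in $1 occurs in it.
html_contains() {
  grep -qF -- "$1"
}

# Example (placeholder URL and text):
#   curl -sL "https://example.com/updates" | html_contains "Expected headline" \
#     && echo "present in static HTML" \
#     || echo "absent: likely rendered by JavaScript"
```

If the text is absent from the raw response, a selector fix is unlikely to help and a rendering strategy is the more promising next step.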
If you are having issues with date/time parsing, check the following:
- Date Format: The `parse_time` post-processor automatically detects common date formats using Ruby's `Time.parse`. Ensure your date strings are in a recognizable format.
- `time_zone`: Specify the correct `time_zone` if the website uses a specific time zone.
If you are getting a "command not found" error, try the following:
- Re-install: Re-install `html2rss` to ensure it is installed correctly: `gem install html2rss`.
- Check `PATH`: Ensure that the directory where Ruby gems are installed is in your system's `PATH`.
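The `PATH` check can be scripted. This is a rough sketch, assuming a POSIX shell and a `ruby` executable with RubyGems loaded; `path_has_dir` is a hypothetical helper name introduced only for illustration.

```shell
# path_has_dir is a hypothetical helper: it succeeds when directory $1 is an
# entry of the colon-separated list $2 (defaults to the current PATH).
path_has_dir() {
  case ":${2:-$PATH}:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Ask RubyGems for its executable directory; the check is skipped if ruby is missing.
if gem_bin="$(ruby -e 'print Gem.bindir' 2>/dev/null)"; then
  if path_has_dir "$gem_bin"; then
    echo "OK: $gem_bin is on PATH"
  else
    echo "Missing: add $gem_bin to PATH"
  fi
fi
```

The comparison is a plain string match, so a trailing slash or a symlinked directory can produce a false "Missing" result. Gems installed with `--user-install` place executables in a different directory, so treat the output as a hint rather than a verdict.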
- Verify Docker is installed and running:
  <Code code={`docker --version`} lang="bash" />
- Check logs for errors:
  <Code code={`docker compose logs`} lang="bash" />
- Ensure the app port (default compose binding: 4000) isn’t already in use:
  <Code code={`lsof -i :4000`} lang="bash" />
- If the app exits immediately in production, check that `HTML2RSS_SECRET_KEY` is set.
- Confirm your firewall allows traffic on port 4000 or your reverse-proxy ports
- Try accessing via the server’s IP instead of a domain name
- Double-check that containers are running:
  <Code code={`docker compose ps`} lang="bash" />
- 401 Unauthorized when creating feeds: The create-feed API expects a bearer token. Re-enter a valid access token in the UI or send `Authorization: Bearer ...` to `POST /api/v1/feeds`.
- 403 Forbidden when creating feeds: Automatic feed generation may be disabled (`AUTO_SOURCE_ENABLED=false`) or the requested URL may not be allowed for the authenticated account.
- 500 Internal Server Error: Check the application logs for detailed error information.
- Health endpoint failures: Use `GET /api/v1/health/live`, `GET /api/v1/health/ready`, or authenticated `GET /api/v1/health` depending on which probe you are testing.
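The health probes can be checked from the command line. This is a minimal sketch, assuming `curl` is installed and the app listens on `http://localhost:4000` (the default compose binding mentioned above); `health_url` and `probe_status` are hypothetical helper names. How the authenticated endpoint expects credentials is also an assumption here: the example passes the same bearer header used for feed creation.

```shell
# Assumption: the app is reachable on the default compose port from this guide.
BASE_URL="${BASE_URL:-http://localhost:4000}"

# health_url is a hypothetical helper: prints the URL for a probe name.
# "live" and "ready" map to their sub-paths; anything else maps to /api/v1/health.
health_url() {
  case "$1" in
    live|ready) printf '%s/api/v1/health/%s\n' "$BASE_URL" "$1" ;;
    *)          printf '%s/api/v1/health\n' "$BASE_URL" ;;
  esac
}

# probe_status prints only the HTTP status code returned for a probe.
# Extra arguments are passed through to curl (for example an auth header).
probe_status() {
  probe="$1"; shift
  curl -s -o /dev/null -w '%{http_code}\n' "$@" "$(health_url "$probe")"
}

# Example:
#   probe_status live
#   probe_status ready
#   probe_status full -H "Authorization: Bearer ..."   # auth scheme assumed
```

Printing only the status code makes it easy to see which of the three probes fails before digging into the logs.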
- Some sites may require JavaScript rendering; ensure the `browserless` service is running
- Check the feed configuration in `feeds.yml` for typos or invalid selectors
- Look for parsing errors in the logs:
  <Code code={`docker compose logs html2rss-web`} lang="bash" />
- Mobile Redirects: Check that the channel URL does not redirect to a mobile page with a different markup structure.
- `curl` and `pup`: For static sites, use `curl` and `pup` to quickly find selectors: `curl URL | pup`.
- CSS Selectors: For a comprehensive overview of CSS selectors, see the W3C documentation.
- Join our community discussions
- Review the deployment guide for production best practices