---
title: CLI Reference
description: Complete reference for the html2rss command-line interface
---

import { Code } from "@astrojs/starlight/components";

This page documents the html2rss command-line interface (CLI).

For detailed documentation on the Ruby API, please refer to the official YARD documentation.

📚 View the Ruby API Docs on rubydoc.info

## Commands

The html2rss executable is the primary way to interact with the gem from your terminal.

### Auto

Automatically discovers items from a page and prints the generated RSS feed to stdout.

<Code code={`html2rss auto https://example.com/articles

html2rss auto https://example.com/app --strategy browserless --max-redirects 5 --max-requests 6

BOTASAURUS_SCRAPER_URL="http://localhost:4010" \\
  html2rss auto https://example.com/protected --strategy botasaurus

html2rss auto https://example.com/articles --items_selector ".post-card"`} lang="bash" />

Command: html2rss auto URL

The default is `--strategy auto`, which tries `faraday`, then `botasaurus`, then `browserless`.

#### URL Surface Guidance For auto

auto works best when the input URL already exposes a server-rendered list of entries.

- High-success surfaces:
  - newsroom or press listing pages
  - blog/category/tag listing pages
  - changelog/release notes/update listing pages
  - paginated archive/list views
- Low-success surfaces:
  - generic homepages with heavy promo/navigation chrome
  - search results pages
  - client-rendered app shells (`#app`, `#root`, `#__next`, etc.)

When possible, pass a direct listing/update URL instead of a top-level homepage or app entrypoint.
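The app-shell case can be spotted before running `auto` at all. The check below is a heuristic sketch, not part of html2rss: it greps fetched HTML for the common SPA mount-point ids listed above.

```bash
# Heuristic sketch (assumption, not an html2rss feature): flag pages whose
# markup is a client-rendered app shell rather than a server-rendered list.
looks_like_app_shell() {
  # Matches common SPA mount points: #app, #root, #__next
  printf '%s' "$1" | grep -Eq 'id="(app|root|__next)"'
}

spa_html='<body><div id="__next"></div></body>'
listing_html='<body><ul class="posts"><li><a href="/p/1">Post</a></li></ul></body>'

looks_like_app_shell "$spa_html" && echo "prefer a direct listing/update URL"
looks_like_app_shell "$listing_html" || echo "listing markup present"
```

In practice you would feed this the output of `curl -s URL`; if it fires, look for a listing or changelog URL on the same site instead.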

#### Failure Outcomes You Should Expect

When no extractable items are found, auto classifies likely causes instead of only returning a generic message:

- blocked surface likely (anti-bot or interstitial):
  - try a more specific public listing URL
- app-shell surface detected:
  - switch to a direct listing/update URL
- unsupported extraction surface for auto mode:
  - switch to listing/changelog/category URLs
  - use explicit selectors in a feed config

Known anti-bot interstitial responses (for example Cloudflare challenge pages) are surfaced explicitly as blocked-surface errors.

If all fallback tiers run but still extract zero items, html2rss raises:

- `No RSS feed items extracted after auto fallback ...`

If failures continue after URL/surface fixes, retry with an explicit browser-based override (--strategy browserless), or --strategy botasaurus when BOTASAURUS_SCRAPER_URL is configured.

Start by changing the input URL to a direct listing/update page, then move to explicit selectors if needed.
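The retry escalation described above can be sketched as a loop. This is an illustration only: `run_feed` is a hypothetical stand-in for `html2rss auto "$url" --strategy "$strategy"`, stubbed here so the sketch is self-contained.

```bash
# Fallback sketch (assumption): try strategies in order until one produces
# a feed. `run_feed` stands in for an html2rss invocation.
run_with_fallback() {
  for strategy in "$@"; do
    if run_feed "$strategy"; then
      echo "succeeded with --strategy $strategy"
      return 0
    fi
  done
  echo "all strategies failed" >&2
  return 1
}

# Demo stub: pretend only the browser-based strategy extracts items.
run_feed() { [ "$1" = "browserless" ]; }

run_with_fallback auto browserless
# → succeeded with --strategy browserless
```

With a real `run_feed`, the same loop would capture stdout to a file and stop at the first strategy that exits zero.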

#### Browserless Setup And Diagnostics (CLI)

browserless is an explicit override for CLI usage.

<Code code={`# 1) Start Browserless in the background
docker run -d --rm --name html2rss-browserless \\
  -p 3000:3000 \\
  -e "CONCURRENT=10" \\
  -e "TOKEN=6R0W53R135510" \\
  ghcr.io/browserless/chromium

# 2) Run html2rss against Browserless
BROWSERLESS_IO_WEBSOCKET_URL="ws://127.0.0.1:3000" \\
BROWSERLESS_IO_API_TOKEN="6R0W53R135510" \\
  html2rss auto https://example.com/updates --strategy browserless

# 3) Stop Browserless when done
docker stop html2rss-browserless`} lang="bash" />

If you see Browserless connection failed, check:

- `BROWSERLESS_IO_WEBSOCKET_URL` points to a reachable Browserless endpoint
- `BROWSERLESS_IO_API_TOKEN` matches the Browserless `TOKEN`
- the Browserless service is running and reachable from your shell environment

For custom Browserless endpoints, BROWSERLESS_IO_API_TOKEN is required.
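The first two checks can be run as a pre-flight step before invoking html2rss. This is a sketch under the assumptions above, not a built-in command:

```bash
# Pre-flight sketch (assumption): fail fast when the Browserless env vars
# are missing or the websocket URL has the wrong scheme.
check_browserless_env() {
  case "${BROWSERLESS_IO_WEBSOCKET_URL:-}" in
    ws://?*|wss://?*) ;;  # must be a ws:// or wss:// endpoint
    *) echo "BROWSERLESS_IO_WEBSOCKET_URL must be a ws:// or wss:// URL" >&2
       return 1 ;;
  esac
  [ -n "${BROWSERLESS_IO_API_TOKEN:-}" ] || {
    echo "BROWSERLESS_IO_API_TOKEN is required" >&2
    return 1
  }
}

export BROWSERLESS_IO_WEBSOCKET_URL="ws://127.0.0.1:3000"
export BROWSERLESS_IO_API_TOKEN="6R0W53R135510"
check_browserless_env && echo "Browserless env ok"
```

It cannot verify the third point (that the service is actually up); for that, run the `docker` commands above and retry.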

#### Botasaurus Environment Requirement (CLI)

botasaurus is an explicit override for CLI usage and requires BOTASAURUS_SCRAPER_URL:

<Code code={`BOTASAURUS_SCRAPER_URL="http://localhost:4010" \\
  html2rss auto https://example.com/updates --strategy botasaurus`} lang="bash" />

If you see a Botasaurus configuration error, check:

- `BOTASAURUS_SCRAPER_URL` is set
- `BOTASAURUS_SCRAPER_URL` is a valid URL
- the Botasaurus scrape API is reachable from the shell environment running html2rss

### Feed

Loads a YAML config, builds the feed, and prints the RSS XML to stdout.

<Code code={`html2rss feed single.yml

html2rss feed feeds.yml my-first-feed

html2rss feed single.yml --strategy auto

html2rss feed single.yml --strategy browserless

BOTASAURUS_SCRAPER_URL="http://localhost:4010" \\
  html2rss feed single.yml --strategy botasaurus

html2rss feed single.yml --max-redirects 5 --max-requests 6

html2rss feed single.yml --params id:42 foo:bar`} lang="bash" />

Command: html2rss feed YAML_FILE [feed_name]

The CLI keeps `strategy` as a top-level override and writes the runtime request limits (`--max-redirects`, `--max-requests`) into the generated config under `request`.
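For orientation, a minimal `single.yml` might look like the sketch below. The URL and selector values are illustrative, assuming the gem's standard `channel`/`selectors` config shape; consult the configuration reference for the authoritative schema.

```yaml
# Illustrative config sketch; values are placeholders.
channel:
  url: https://example.com/articles
  title: Example Articles
selectors:
  items:
    selector: ".post-card"
  title:
    selector: "h2"
  link:
    selector: "a"
    extractor: "href"
```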

### Schema

Prints the exported JSON Schema for the current gem version.

<Code code={`html2rss schema

html2rss schema --no-pretty

html2rss schema --write tmp/html2rss-config.schema.json`} lang="bash" />

Command: html2rss schema

### Validate

Validates a config with the runtime validator without generating a feed.

<Code code={`html2rss validate single.yml

html2rss validate feeds.yml my-first-feed`} lang="bash" />

Command: html2rss validate YAML_FILE [feed_name]

### Help

Displays the help message with available commands and options.

Command: html2rss help

### Version

Displays the installed version of html2rss.

Command: html2rss --version