Commit 6043ee6

docs(ruby-gem): align reference pages with current config surface
1 parent 249ba37 commit 6043ee6

7 files changed

Lines changed: 148 additions & 30 deletions

src/content/docs/ruby-gem/reference/auto-source.mdx

Lines changed: 5 additions & 1 deletion
@@ -33,9 +33,11 @@ You can customize `auto_source` to improve its accuracy.
3333

3434
### Scraper Options
3535

36-
Enable or disable specific scrapers and adjust their settings:
36+
Enable or disable specific scrapers and adjust their settings in a complete feed config:
3737

3838
```yaml
39+
channel:
40+
url: https://example.com
3941
auto_source:
4042
scraper:
4143
schema:
@@ -55,6 +57,8 @@ auto_source:
5557
Remove unwanted items from the results:
5658

5759
```yaml
60+
channel:
61+
url: https://example.com
5862
auto_source:
5963
cleanup:
6064
keep_different_domain: false # default: true

src/content/docs/ruby-gem/reference/channel.mdx

Lines changed: 18 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,9 @@ title: Channel
33
description: "Learn about the channel configuration block for RSS feed metadata. Configure feed title, description, author, and other RSS channel properties."
44
---
55

6-
The `channel` configuration block defines the metadata for your RSS feed.
6+
The `channel` configuration block defines your feed metadata.
7+
8+
This example is a complete feed config so you can see the `channel` block in context:
79

810
```yaml
911
channel:
@@ -12,8 +14,16 @@ channel:
1214
description: "A feed of the latest news from Example.com"
1315
author: "jane.doe@example.com (Jane Doe)"
1416
ttl: 60
15-
language: "en-us"
17+
language: "en"
1618
time_zone: "Europe/Berlin"
19+
selectors:
20+
items:
21+
selector: "article"
22+
title:
23+
selector: "h2"
24+
url:
25+
selector: "a"
26+
extractor: "href"
1727
```
1828
1929
## Options
@@ -28,6 +38,12 @@ channel:
2838
| `language` | Optional | The language of the feed. Defaults to the `lang` attribute of the `<html>` tag. |
2939
| `time_zone` | Optional | The time zone for parsing dates. See the [list of tz database time zones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones). |
3040

41+
## Notes
42+
43+
- `language` is runtime-validated. Use a valid language code such as `en`, not an arbitrary string.
44+
- `author` should follow the RSS-style `email (Name)` format when you set it explicitly.
45+
- `time_zone` must be a known TZ database identifier such as `UTC` or `Europe/Berlin`.
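
The `email (Name)` shape described in the notes above can be checked before a config is handed to the gem. The following is a minimal sketch using only the Ruby standard library; the `AUTHOR_FORMAT` pattern and the `author_warning` helper are hypothetical names introduced here and are not html2rss's own validator, which may accept or reject different inputs.

```ruby
# Hypothetical pre-flight check; this is NOT the validator html2rss runs.
# Assumed shape: "<email> (<display name>)", as described in the notes.
AUTHOR_FORMAT = /\A[^@\s]+@[^@\s]+ \(.+\)\z/

# Returns a warning string, or nil when the author looks fine or is absent.
def author_warning(channel)
  author = channel['author']
  return nil if author.nil?
  return nil if author.match?(AUTHOR_FORMAT)

  "author #{author.inspect} does not look like 'email (Name)'"
end

good = { 'author' => 'jane.doe@example.com (Jane Doe)' }
bad  = { 'author' => 'Jane Doe' }

puts author_warning(good).inspect
puts author_warning(bad).inspect
```

A check like this only catches obvious formatting slips early; `html2rss validate` remains the authoritative test.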
46+
3147
---
3248

3349
For detailed documentation on the Ruby API, see the [official YARD documentation](https://www.rubydoc.info/gems/html2rss).

src/content/docs/ruby-gem/reference/cli-reference.mdx

Lines changed: 62 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -3,50 +3,92 @@ title: CLI Reference
33
description: Complete reference for the html2rss command-line interface
44
---
55

6-
This section provides a reference for the `html2rss` command-line interface (CLI).
6+
This page documents the `html2rss` command-line interface (CLI).
77

88
For detailed documentation on the Ruby API, please refer to the official YARD documentation.
99

1010
[**📚 View the Ruby API Docs on rubydoc.info**](https://www.rubydoc.info/gems/html2rss)
1111

12-
---
12+
## Commands
13+
14+
The `html2rss` executable is the primary way to interact with the gem from your terminal.
15+
16+
### Auto
1317

14-
### Command-Line Interface (CLI)
18+
Automatically discovers items from a page and prints the generated RSS feed to stdout.
19+
20+
```bash
21+
# Use the default faraday strategy
22+
html2rss auto https://example.com/articles
1523

16-
The `html2rss` executable provides the primary way to interact with the tool from your terminal.
24+
# Force browserless for JavaScript-heavy pages
25+
html2rss auto https://example.com/app --strategy browserless
1726

18-
#### `html2rss auto <URL>`
27+
# Hint the item selector while keeping auto enhancement
28+
html2rss auto https://example.com/articles --items_selector ".post-card"
29+
```
1930

20-
Automatically generates an RSS feed from the provided URL.
31+
Command: `html2rss auto URL`
2132

22-
- `<URL>` (Required): The URL of the website to generate a feed from.
33+
### Feed
2334

24-
**Example:**
35+
Loads a YAML config, builds the feed, and prints the RSS XML to stdout.
2536

2637
```bash
27-
html2rss auto https://unmatchedstyle.com/
38+
# Single-feed config
39+
html2rss feed single.yml
40+
41+
# Multi-feed config under the `feeds:` key
42+
html2rss feed feeds.yml my-first-feed
43+
44+
# Override the request strategy at runtime
45+
html2rss feed single.yml --strategy browserless
46+
47+
# Pass dynamic parameters into %<param>s placeholders
48+
html2rss feed single.yml --params id:42 foo:bar
2849
```
2950

30-
#### `html2rss feed <CONFIG_FILE>`
51+
Command: `html2rss feed YAML_FILE [feed_name]`
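
The `%<param>s` placeholder syntax matches Ruby's named format references. As a rough illustration, and assuming the substitution behaves like `Kernel#format`, the sketch below turns `key:value` pairs such as `id:42 foo:bar` into a hash and fills a template. The `parse_params` helper and the example URL are hypothetical and are not part of the html2rss CLI.

```ruby
# Hypothetical helper: convert CLI-style "key:value" pairs into a hash
# with symbol keys, which is what Kernel#format expects for %<name>s.
def parse_params(pairs)
  pairs.to_h do |pair|
    key, value = pair.split(':', 2) # split on the first colon only
    [key.to_sym, value]
  end
end

params = parse_params(%w[id:42 foo:bar])

# Made-up template; each %<name>s is replaced by the matching hash value.
url = format('https://example.com/posts/%<id>s?tag=%<foo>s', params)
puts url
```

With `Kernel#format`, a placeholder that has no matching key raises a `KeyError` rather than being left in place, so a missing parameter shows up immediately.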
3152

32-
Generates an RSS feed based on the provided YAML configuration file.
53+
### Schema
3354

34-
- `<CONFIG_FILE>` (Required): Path to your YAML configuration file.
55+
Prints the exported JSON Schema for the current gem version.
3556

36-
**Examples:**
57+
```bash
58+
# Pretty-printed JSON (default)
59+
html2rss schema
60+
61+
# Compact JSON
62+
html2rss schema --no-pretty
63+
64+
# Write the schema to a file
65+
html2rss schema --write tmp/html2rss-config.schema.json
66+
```
67+
68+
Command: `html2rss schema`
69+
70+
### Validate
71+
72+
Validates a config with the runtime validator without generating a feed.
3773

3874
```bash
39-
# Generate and print to console
40-
html2rss feed my_feed.yml
75+
# Validate a single-feed file
76+
html2rss validate single.yml
4177

42-
# Generate and save to an XML file
43-
html2rss feed my_feed.yml > my_feed.xml
78+
# Validate one feed from a multi-feed file
79+
html2rss validate feeds.yml my-first-feed
4480
```
4581

46-
#### `html2rss help`
82+
Command: `html2rss validate YAML_FILE [feed_name]`
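
The optional `feed_name` argument picks one entry from a multi-feed file. The sketch below shows one way such a lookup could work with Ruby's standard `yaml` library, assuming each entry under `feeds:` is a complete feed config. The `pick_feed` helper and the sample YAML are illustrative only; html2rss performs its own loading and validation.

```ruby
require 'yaml'

# Hypothetical multi-feed file: each entry under `feeds:` is one feed config.
MULTI_FEED_YAML = <<~YAML
  feeds:
    my-first-feed:
      channel:
        url: "https://example.com/articles"
      selectors:
        items:
          selector: "article"
YAML

# Illustrative lookup: without a name, treat the file as a single-feed config;
# with a name, return the matching entry under `feeds:` or raise.
def pick_feed(yaml_text, feed_name = nil)
  config = YAML.safe_load(yaml_text)
  return config if feed_name.nil?

  config.fetch('feeds').fetch(feed_name) do
    raise ArgumentError, "feed '#{feed_name}' not found under `feeds:`"
  end
end

feed = pick_feed(MULTI_FEED_YAML, 'my-first-feed')
puts feed.dig('channel', 'url')
```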
83+
84+
### Help
4785

4886
Displays the help message with available commands and options.
4987

50-
#### `html2rss --version`
88+
Command: `html2rss help`
89+
90+
### Version
91+
92+
Displays the installed version of `html2rss`.
5193

52-
Displays the currently installed version of `html2rss`.
94+
Command: `html2rss --version`

src/content/docs/ruby-gem/reference/headers.mdx

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,13 +7,22 @@ The `headers` key allows you to set custom HTTP headers for your requests. This
77

88
## Configuration
99

10-
You can add any number of headers to your configuration:
10+
You can add any number of headers to your configuration. This example is a complete, valid feed config:
1111

1212
```yaml
1313
headers:
1414
User-Agent: "Mozilla/5.0 (compatible; html2rss/1.0)"
1515
Authorization: "Bearer YOUR_TOKEN"
1616
Accept: "application/json"
17+
channel:
18+
url: "https://api.example.com/posts"
19+
selectors:
20+
items:
21+
selector: "array > object"
22+
title:
23+
selector: "title"
24+
url:
25+
selector: "url"
1726
```
1827
1928
## Dynamic Parameters

src/content/docs/ruby-gem/reference/selectors.mdx

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -48,6 +48,32 @@ Available options:
4848
- `"reverse"`: Reverses the order of items (useful when the website shows oldest items first)
4949
- Default: Items appear in the order they are found on the page
5050

51+
## Paginated Feeds
52+
53+
`html2rss` can follow a single `rel="next"` pagination chain when you configure `selectors.items.pagination.max_pages`.
54+
55+
```yml
56+
channel:
57+
url: "https://example.com/news"
58+
selectors:
59+
items:
60+
selector: "article"
61+
pagination:
62+
max_pages: 3
63+
title:
64+
selector: "h1"
65+
url:
66+
selector: "a"
67+
extractor: "href"
68+
```
69+
70+
Behavior:
71+
72+
- `max_pages` is the total page budget for the item selector chain, including the initial page.
73+
- Pagination follows strict `link[rel~="next"]` or `a[rel~="next"]` targets only.
74+
- Pagination stops when there is no next link, a page repeats, or the shared request budget is exhausted.
75+
- The same request safeguards apply to pagination and Browserless navigation, including timeout limits, redirect limits, response-size guards, and private-network denial.
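
The stopping rules listed above can be sketched as a small loop. This is a simplified model, not the gem's implementation: the `PAGES` hash stands in for real HTTP responses, the URLs are made up, and the shared request budget and network safeguards are left out.

```ruby
# Simulated site: each page lists items and an optional rel="next" target.
# All URLs and items here are invented for illustration.
PAGES = {
  '/news' => { items: %w[a b], next: '/news?page=2' },
  '/news?page=2' => { items: %w[c], next: '/news?page=3' },
  '/news?page=3' => { items: %w[d], next: '/news?page=4' },
  '/news?page=4' => { items: %w[e], next: nil }
}.freeze

def collect_items(start_url, max_pages:)
  visited = []
  items = []
  url = start_url

  # max_pages includes the initial page, so at most max_pages pages are read.
  while url && visited.size < max_pages
    break if visited.include?(url) # a repeated page ends the chain

    page = PAGES.fetch(url)
    visited << url
    items.concat(page[:items])
    url = page[:next] # nil when the page has no next link
  end

  items
end

puts collect_items('/news', max_pages: 3).inspect
```

With `max_pages: 3`, the loop reads the initial page plus two follow-up pages and then stops, even though a fourth page exists.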
76+
5177
## RSS 2.0 Selectors
5278

5379
While you can define any named selector, only the following are used in the final RSS feed:

src/content/docs/ruby-gem/reference/strategy.mdx

Lines changed: 16 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -27,10 +27,20 @@ docker run \
2727

2828
### Configuration
2929

30-
Set the `strategy` to `browserless` in your feed configuration:
30+
Set the `strategy` at the top level of your feed configuration:
3131

3232
```yml
3333
strategy: browserless
34+
channel:
35+
url: "https://example.com/app"
36+
selectors:
37+
items:
38+
selector: ".article"
39+
title:
40+
selector: "h2"
41+
url:
42+
selector: "a"
43+
extractor: "href"
3444
```
3545
3646
### Command-Line Usage
@@ -39,11 +49,12 @@ You can also specify the strategy on the command line:
3949
4050
```sh
4151
# Set environment variables for your Browserless.io instance
42-
BROWSERLESS_IO_WEBSOCKET_URL="ws://127.0.0.1:3000"
43-
BROWSERLESS_IO_API_TOKEN="6R0W53R135510"
52+
BROWSERLESS_IO_WEBSOCKET_URL="ws://127.0.0.1:3000" \
53+
BROWSERLESS_IO_API_TOKEN="6R0W53R135510" \
54+
html2rss feed my_config.yml --strategy browserless
4455

45-
# Use the browserless strategy
46-
html2rss feed --strategy=browserless my_config.yml
56+
# Or rely on the strategy stored in the YAML config
57+
html2rss feed my_config.yml
4758
```
4859

4960
---

src/content/docs/ruby-gem/reference/stylesheets.mdx

Lines changed: 11 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -16,7 +16,7 @@ Styling your RSS feed provides several benefits:
1616

1717
## Configuration
1818

19-
You can add multiple stylesheets to your configuration:
19+
You can add multiple stylesheets to a normal feed configuration:
2020

2121
```yaml
2222
stylesheets:
@@ -26,6 +26,16 @@ stylesheets:
2626
- href: "https://example.com/rss.css"
2727
media: "all"
2828
type: "text/css"
29+
channel:
30+
url: "https://example.com/articles"
31+
selectors:
32+
items:
33+
selector: "article"
34+
title:
35+
selector: "h2"
36+
url:
37+
selector: "a"
38+
extractor: "href"
2939
```
3040
3141
## Further Reading
