Commit 02dede1

docs: align onboarding and auto-source docs
1 parent 04f9629 commit 02dede1

8 files changed

Lines changed: 38 additions & 73 deletions

src/content/docs/getting-started.mdx

Lines changed: 3 additions & 3 deletions
@@ -1,6 +1,6 @@
 ---
 title: "Getting Started"
-description: "Start html2rss-web locally, verify the web interface, generate your first feed URL, and decide when to move to custom configs."
+description: "Start html2rss-web locally, verify a working included feed from your self-hosted instance, and decide when to enable automatic generation or move to custom configs."
 sidebar:
   order: 1
 ---
@@ -17,14 +17,14 @@ That guide is the canonical setup flow for:

 - running `html2rss-web` locally
 - confirming the interface is working
-- generating a first feed URL
+- opening a first included feed URL
 - deciding when to use automatic generation or custom configs

 ## Quick Shortcuts

 - **[Run html2rss-web with Docker](/web-application/getting-started)**: recommended first step
-- **[Use automatic feed generation](/web-application/how-to/use-automatic-feed-generation/)**: create a feed directly from a page URL
 - **[Browse working feed examples](/feed-directory/)**: see what successful outputs look like
+- **[Use automatic feed generation](/web-application/how-to/use-automatic-feed-generation/)**: enable direct feed creation from a page URL when you want that workflow
 - **[Create Custom Feeds](/creating-custom-feeds)**: write configs when you need more control
 - **[Troubleshooting Guide](/troubleshooting/troubleshooting)**: fix startup or extraction problems

src/content/docs/index.mdx

Lines changed: 4 additions & 4 deletions
@@ -1,9 +1,9 @@
 ---
 title: "Turn Any Website Into an RSS Feed"
-description: "Run html2rss-web with Docker, open a working included feed from your own instance, and move to feature enablement or custom configs only when you need more control."
+description: "Run html2rss-web with Docker, verify a working included feed from your self-hosted instance, then consciously enable automatic generation or move to custom configs when you need more control."
 ---

-Run `html2rss-web` with Docker, open a working included feed from your own instance, and move to direct generation or custom configs only when you need more control.
+Run `html2rss-web` with Docker, verify a working included feed from your self-hosted instance, and only then decide whether to enable automatic generation or move to custom configs.

 ## Start Here

@@ -14,7 +14,7 @@ That guide is the canonical onboarding flow for:
 - starting a local instance
 - verifying the web interface
 - opening a first included feed URL
-- deciding when to use automatic generation or custom configs
+- deciding when to consciously enable automatic generation or move to custom configs

 ## How It Works

@@ -63,7 +63,7 @@ Most people should start with the web application:

 - Start with Docker, not a public instance.
 - Use an included feed to verify the deployment first.
-- Enable automatic generation only when you want the direct page-URL workflow.
+- Enable automatic generation only when you want the direct page-URL workflow and are ready to allow it on your self-hosted instance.
 - Move to custom configs when you need a stable, reviewable setup.

 **Need help?** Continue to the [troubleshooting guide](/troubleshooting/troubleshooting) or join [GitHub Discussions](https://github.com/orgs/html2rss/discussions).

src/content/docs/ruby-gem/how-to/advanced-features.mdx

Lines changed: 2 additions & 10 deletions
@@ -7,13 +7,7 @@ This guide covers advanced features and performance optimizations for html2rss.

 ## Parallel Processing

-html2rss uses parallel processing in auto-source discovery to improve performance when multiple scrapers inspect the same page. This happens automatically and doesn't require any configuration.
-
-### How It Works
-
-- **Auto-source scraping:** Multiple scrapers run in parallel to analyze the same response body
-- **Selectors and pagination:** Selector extraction and `rel="next"` pagination stay sequential and share the same request budget
-- **Performance benefit:** Faster auto-discovery without changing selector semantics
+html2rss uses parallel processing in auto-source discovery. This happens automatically and doesn't require any configuration.

 ### Performance Tips

@@ -75,8 +69,6 @@ selectors:
 extractor: "href"
 ```

-When you use the Browserless strategy, Chromium rejects transport-level headers such as `Host`, `Connection`, `Content-Length`, and `Transfer-Encoding`. html2rss filters those headers before navigation and logs the filtered header names at `info` level.
-
 ## Monitoring and Debugging

 ### Enable Debug Logging
@@ -90,7 +82,7 @@ LOG_LEVEL=debug html2rss feed config.yml
 Use the health check endpoint to monitor feed generation:

 ```bash
-curl -u username:password http://localhost:3000/health_check.txt
+curl -u username:password http://localhost:4000/health_check.txt
 ```

 ## Article Validation

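The `curl` call on the `+` side of the last hunk above (Basic auth against port 4000) can also be scripted. The following is only a minimal sketch, not part of the commit: the function names are hypothetical, the placeholder credentials are the ones from the diff, and treating HTTP 200 as "healthy" is an assumption about the endpoint rather than documented behavior.

```python
import base64
import urllib.request


def build_health_check_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a Basic-auth GET request for /health_check.txt."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(f"{base_url.rstrip('/')}/health_check.txt")
    request.add_header("Authorization", f"Basic {token}")
    return request


def health_check_ok(request: urllib.request.Request, timeout: float = 5.0) -> bool:
    """Send the request; assume (hypothetically) that HTTP 200 means healthy."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses: unreachable host,
        # refused connection, or a non-2xx response all count as unhealthy.
        return False


if __name__ == "__main__":
    req = build_health_check_request("http://localhost:4000", "username", "password")
    print("healthy" if health_check_ok(req) else "unhealthy")
```

Building the request separately from sending it keeps the URL and header logic checkable without a running instance.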
src/content/docs/ruby-gem/how-to/handling-dynamic-content.mdx

Lines changed: 0 additions & 3 deletions
@@ -91,9 +91,6 @@ request:

 These preload steps can be combined in a single config when a site needs several interactions before all items appear.

-If a click or scroll step causes a real navigation, html2rss returns the final document metadata, not the original page-load
-metadata. That keeps extracted relative links anchored to the rendered page.
-
 ## Performance Considerations

 The `browserless` strategy is slower than the default `faraday` strategy because it:

src/content/docs/ruby-gem/reference/wordpress-api.mdx

Lines changed: 20 additions & 42 deletions
@@ -1,25 +1,11 @@
 ---
 title: "WordPress API"
-description: "Use html2rss auto_source to read WordPress sites through their REST API instead of scraping article HTML."
+description: "Use the WordPress API scraper inside auto_source to read WordPress posts through the site's REST API."
 ---

-The `wordpress_api` scraper is part of `auto_source`. It detects WordPress sites that advertise a REST API in the page `<head>` and then fetches structured post data directly from that API.
+The `wordpress_api` scraper is part of `auto_source`. When a WordPress site exposes its public REST API, `html2rss` can read posts from that API instead of scraping article HTML.

-This is usually more reliable than HTML scraping because the response already contains fields such as title, content, excerpt, permalink, publish date, and category IDs.
-
-## Detection
-
-The scraper activates when the page contains:
-
-```html
-<link rel="https://api.w.org/" href="https://example.com/wp-json/" />
-```
-
-When that tag is present, `html2rss` resolves the API root and requests:
-
-```text
-wp/v2/posts?per_page=100&_fields=id,title,excerpt,content,link,date,categories
-```
+This usually gives cleaner results because WordPress already exposes fields such as the title, content, excerpt, permalink, publish date, and category IDs.

 ## Basic Usage

@@ -31,11 +17,21 @@ channel:
 auto_source: {}
 ```

-If the target is a standard WordPress site with a public API, no selector configuration is required.
+If the target is a standard WordPress site with a public API, no selectors are required.
+
+## Requirements
+
+The scraper works when the page exposes the standard WordPress API link in its `<head>`:
+
+```html
+<link rel="https://api.w.org/" href="https://example.com/wp-json/" />
+```
+
+If that link is missing or the API is blocked, `auto_source` falls back to its other discovery strategies.

-## Configure The Scraper
+## Disable It

-You can disable the WordPress scraper while keeping the rest of `auto_source` enabled:
+You can disable `wordpress_api` while keeping the rest of `auto_source` enabled:

 ```yml
 channel:
@@ -46,11 +42,9 @@ auto_source:
 enabled: false
 ```

-This is useful if a site exposes the API link but you prefer another auto-source strategy.
-
 ## What Gets Extracted

-The current scraper maps the WordPress post payload into `html2rss` article fields like this:
+The scraper maps the WordPress response into `html2rss` article fields like this:

 | WordPress field | html2rss article field |
 | ------------------ | ---------------------- |
@@ -63,26 +57,10 @@ The current scraper maps the WordPress post payload into `html2rss` article fiel

 If `content.rendered` is blank, the scraper falls back to `excerpt.rendered`.

-## Behavior Notes
-
-- The scraper uses the shared request session, so it participates in the same request safety model as the rest of the feed build.
-- It resolves relative API links against `channel.url`.
-- It keeps WordPress category IDs as strings; category-name resolution is not implemented yet.
-- It does not resolve `featured_media` into an image URL.
-
-## When To Use It
-
-Prefer `wordpress_api` when:
-
-- The page is clearly powered by WordPress
-- The REST API is public
-- You want a more stable source than CSS selectors or heuristic HTML scraping
-
-Prefer manual selectors when:
+## Notes

-- The site blocks or customizes the API heavily
-- You need fields that are not exposed by the post endpoint
-- You need full control over filtering or presentation
+- Categories stay as WordPress category IDs. Category names are not resolved yet.
+- Featured images are not pulled from `featured_media` yet.

 ## Related Docs

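The detection rule and the content fallback described in the hunks above can be illustrated in a few lines. This is a hedged sketch in Python, not html2rss's actual Ruby implementation: `ApiLinkFinder`, `find_api_root`, and `pick_body` are hypothetical names, and only the `<link rel="https://api.w.org/">` check, the resolution of relative links against the page URL, and the `content.rendered` to `excerpt.rendered` fallback are taken from the diff.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ApiLinkFinder(HTMLParser):
    """Remember the href of the first <link rel="https://api.w.org/"> tag."""

    def __init__(self):
        super().__init__()
        self.api_root = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags are routed here by HTMLParser as well.
        if tag != "link" or self.api_root is not None:
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").split()
        if "https://api.w.org/" in rels and attributes.get("href"):
            self.api_root = attributes["href"]


def find_api_root(html, page_url):
    """Return the absolute API root, or None when the page does not advertise one."""
    finder = ApiLinkFinder()
    finder.feed(html)
    if finder.api_root is None:
        return None
    # Relative hrefs are resolved against the page URL.
    return urljoin(page_url, finder.api_root)


def pick_body(post):
    """Prefer content.rendered; fall back to excerpt.rendered when it is blank."""
    content = (post.get("content") or {}).get("rendered") or ""
    excerpt = (post.get("excerpt") or {}).get("rendered") or ""
    return content if content.strip() else excerpt
```

A `None` result from `find_api_root` corresponds to the case where, per the new "Requirements" text, `auto_source` falls back to its other discovery strategies.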
src/content/docs/web-application/getting-started.mdx

Lines changed: 1 addition & 1 deletion
@@ -78,7 +78,7 @@ Start with an included config from your own instance:
 2. copy that feed URL into your reader
 3. confirm your reader can subscribe successfully

-That proves the core path before you invest in custom configs or feature enablement.
+That proves the core path before you invest in automatic generation or custom configs.

 <AutoGenerationOptional />

src/content/docs/web-application/how-to/deployment.mdx

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ import DockerComposeSnippet from "../../../../components/docs/DockerComposeSnipp

 html2rss-web ships on Docker Hub, so you can launch this self-hosted service wherever Docker runs. Start with the official [`docker-compose.yml`](https://github.com/html2rss/html2rss-web/blob/master/docker-compose.yml) from the [Installation Guide](/web-application/getting-started) as your baseline.

-If you have not yet created a local instance, complete the [Getting Started guide](/web-application/getting-started) first. It walks through the one-time project directory setup, downloading the reference compose file, and confirming the application locally—steps we will build upon here.
+If you have not yet created a local instance, complete the [Getting Started guide](/web-application/getting-started) first. It walks through the one-time project directory setup, creating a minimal compose file, and confirming the application locally, which gives you the right baseline before exposing a self-hosted instance publicly.

 Already running html2rss-web on your workstation? Great! The sections below focus on what changes when you take that setup to production.

src/content/docs/web-application/how-to/use-automatic-feed-generation.mdx

Lines changed: 7 additions & 9 deletions
@@ -3,20 +3,18 @@ title: "Use automatic feed generation"
 description: "Enable the web UI flow that generates a feed directly from a page URL."
 ---

-Automatic feed generation is the direct web-interface workflow: paste a page URL, create a feed, then copy the generated feed URL.
+Automatic feed generation is a standout `html2rss-web` feature: paste a page URL, create a feed, then copy the generated feed URL.

-> **Note:** This feature is disabled by default for security reasons.
+> **Note:** This feature is disabled by default. Enabling it should be a conscious decision on your self-hosted instance.

 ## How to Enable It

-Edit your `docker-compose.yml` and enable the automatic generation environment variables:
+Edit your `docker-compose.yml` and enable automatic feed generation:

 ```yaml
 environment:
   AUTO_SOURCE_ENABLED: "true"
-  AUTO_SOURCE_USERNAME: your-username
-  AUTO_SOURCE_PASSWORD: your-secure-password
-  AUTO_SOURCE_ALLOWED_ORIGINS: 127.0.0.1:3000
+  AUTO_SOURCE_ALLOWED_ORIGINS: 127.0.0.1:4000
 ```

 Then restart:
@@ -28,10 +26,10 @@ docker compose up -d

 ## How to Use It

-1. Open your instance at `http://localhost:3000`
+1. Open your instance at `http://localhost:4000`
 2. Paste a page URL into `Create a feed`
 3. Submit the form
-4. If access is required, provide the configured access token
+4. If the instance requires access, provide a configured access token
 5. Copy the generated feed URL or open it directly

 ## What Success Looks Like
@@ -43,7 +41,7 @@ When the flow works, you should see:
 - an open-feed action
 - a preview of recent entries when available

-That is enough to confirm the endpoint is live.
+That is enough to confirm the self-hosted flow is working.

 ## When to Stop and Switch
