docs: align onboarding and auto-source docs

gildesmarais · gildesmarais · commit 02dede117fcd · 2026-03-19T22:59:08.000+01:00
diff --git a/src/content/docs/getting-started.mdx b/src/content/docs/getting-started.mdx
@@ -1,6 +1,6 @@
 ---
 title: "Getting Started"
-description: "Start html2rss-web locally, verify the web interface, generate your first feed URL, and decide when to move to custom configs."
+description: "Start html2rss-web locally, verify a working included feed from your self-hosted instance, and decide when to enable automatic generation or move to custom configs."
 sidebar:
   order: 1
 ---
@@ -17,14 +17,14 @@ That guide is the canonical setup flow for:
 
 - running `html2rss-web` locally
 - confirming the interface is working
-- generating a first feed URL
+- opening a first included feed URL
 - deciding when to use automatic generation or custom configs
 
 ## Quick Shortcuts
 
 - **[Run html2rss-web with Docker](/web-application/getting-started)**: recommended first step
-- **[Use automatic feed generation](/web-application/how-to/use-automatic-feed-generation/)**: create a feed directly from a page URL
 - **[Browse working feed examples](/feed-directory/)**: see what successful outputs look like
+- **[Use automatic feed generation](/web-application/how-to/use-automatic-feed-generation/)**: enable direct feed creation from a page URL when you want that workflow
 - **[Create Custom Feeds](/creating-custom-feeds)**: write configs when you need more control
 - **[Troubleshooting Guide](/troubleshooting/troubleshooting)**: fix startup or extraction problems
 
diff --git a/src/content/docs/index.mdx b/src/content/docs/index.mdx
@@ -1,9 +1,9 @@
 ---
 title: "Turn Any Website Into an RSS Feed"
-description: "Run html2rss-web with Docker, open a working included feed from your own instance, and move to feature enablement or custom configs only when you need more control."
+description: "Run html2rss-web with Docker, verify a working included feed from your self-hosted instance, then consciously enable automatic generation or move to custom configs when you need more control."
 ---
 
-Run `html2rss-web` with Docker, open a working included feed from your own instance, and move to direct generation or custom configs only when you need more control.
+Run `html2rss-web` with Docker, verify a working included feed from your self-hosted instance, and only then decide whether to enable automatic generation or move to custom configs.
 
 ## Start Here
 
@@ -14,7 +14,7 @@ That guide is the canonical onboarding flow for:
 - starting a local instance
 - verifying the web interface
 - opening a first included feed URL
-- deciding when to use automatic generation or custom configs
+- deciding when to consciously enable automatic generation or move to custom configs
 
 ## How It Works
 
@@ -63,7 +63,7 @@ Most people should start with the web application:
 
 - Start with Docker, not a public instance.
 - Use an included feed to verify the deployment first.
-- Enable automatic generation only when you want the direct page-URL workflow.
+- Enable automatic generation only when you want the direct page-URL workflow and are ready to allow it on your self-hosted instance.
 - Move to custom configs when you need a stable, reviewable setup.
 
 **Need help?** Continue to the [troubleshooting guide](/troubleshooting/troubleshooting) or join [GitHub Discussions](https://github.com/orgs/html2rss/discussions).
diff --git a/src/content/docs/ruby-gem/how-to/advanced-features.mdx b/src/content/docs/ruby-gem/how-to/advanced-features.mdx
@@ -7,13 +7,7 @@ This guide covers advanced features and performance optimizations for html2rss.
 
 ## Parallel Processing
 
-html2rss uses parallel processing in auto-source discovery to improve performance when multiple scrapers inspect the same page. This happens automatically and doesn't require any configuration.
-
-### How It Works
-
-- **Auto-source scraping:** Multiple scrapers run in parallel to analyze the same response body
-- **Selectors and pagination:** Selector extraction and `rel="next"` pagination stay sequential and share the same request budget
-- **Performance benefit:** Faster auto-discovery without changing selector semantics
+html2rss uses parallel processing in auto-source discovery. This happens automatically and doesn't require any configuration.
 
 ### Performance Tips
 
@@ -75,8 +69,6 @@ selectors:
     extractor: "href"
 ```
 
-When you use the Browserless strategy, Chromium rejects transport-level headers such as `Host`, `Connection`, `Content-Length`, and `Transfer-Encoding`. html2rss filters those headers before navigation and logs the filtered header names at `info` level.
-
 ## Monitoring and Debugging
 
 ### Enable Debug Logging
@@ -90,7 +82,7 @@ LOG_LEVEL=debug html2rss feed config.yml
 Use the health check endpoint to monitor feed generation:
 
 ```bash
-curl -u username:password http://localhost:3000/health_check.txt
+curl -u username:password http://localhost:4000/health_check.txt
 ```
 
 ## Article Validation
diff --git a/src/content/docs/ruby-gem/how-to/handling-dynamic-content.mdx b/src/content/docs/ruby-gem/how-to/handling-dynamic-content.mdx
@@ -91,9 +91,6 @@ request:
 
 These preload steps can be combined in a single config when a site needs several interactions before all items appear.
 
-If a click or scroll step causes a real navigation, html2rss returns the final document metadata, not the original page-load
-metadata. That keeps extracted relative links anchored to the rendered page.
-
 ## Performance Considerations
 
 The `browserless` strategy is slower than the default `faraday` strategy because it:
diff --git a/src/content/docs/ruby-gem/reference/wordpress-api.mdx b/src/content/docs/ruby-gem/reference/wordpress-api.mdx
@@ -1,25 +1,11 @@
 ---
 title: "WordPress API"
-description: "Use html2rss auto_source to read WordPress sites through their REST API instead of scraping article HTML."
+description: "Use the WordPress API scraper inside auto_source to read WordPress posts through the site's REST API."
 ---
 
-The `wordpress_api` scraper is part of `auto_source`. It detects WordPress sites that advertise a REST API in the page `<head>` and then fetches structured post data directly from that API.
+The `wordpress_api` scraper is part of `auto_source`. When a WordPress site exposes its public REST API, `html2rss` can read posts from that API instead of scraping article HTML.
 
-This is usually more reliable than HTML scraping because the response already contains fields such as title, content, excerpt, permalink, publish date, and category IDs.
-
-## Detection
-
-The scraper activates when the page contains:
-
-```html
-<link rel="https://api.w.org/" href="https://example.com/wp-json/" />
-```
-
-When that tag is present, `html2rss` resolves the API root and requests:
-
-```text
-wp/v2/posts?per_page=100&_fields=id,title,excerpt,content,link,date,categories
-```
+This usually gives cleaner results because WordPress already exposes fields such as the title, content, excerpt, permalink, publish date, and category IDs.
 
 ## Basic Usage
 
@@ -31,11 +17,21 @@ channel:
 auto_source: {}
 ```
 
-If the target is a standard WordPress site with a public API, no selector configuration is required.
+If the target is a standard WordPress site with a public API, no selectors are required.
+
+## Requirements
+
+The scraper works when the page exposes the standard WordPress API link in its `<head>`:
+
+```html
+<link rel="https://api.w.org/" href="https://example.com/wp-json/" />
+```
+
+If that link is missing or the API is blocked, `auto_source` falls back to its other discovery strategies.
 
-## Configure The Scraper
+## Disable It
 
-You can disable the WordPress scraper while keeping the rest of `auto_source` enabled:
+You can disable `wordpress_api` while keeping the rest of `auto_source` enabled:
 
 ```yml
 channel:
@@ -46,11 +42,9 @@ auto_source:
       enabled: false
 ```
 
-This is useful if a site exposes the API link but you prefer another auto-source strategy.
-
 ## What Gets Extracted
 
-The current scraper maps the WordPress post payload into `html2rss` article fields like this:
+The scraper maps the WordPress response into `html2rss` article fields like this:
 
 | WordPress field    | html2rss article field |
 | ------------------ | ---------------------- |
@@ -63,26 +57,10 @@ The current scraper maps the WordPress post payload into `html2rss` article fiel
 
 If `content.rendered` is blank, the scraper falls back to `excerpt.rendered`.
 
-## Behavior Notes
-
-- The scraper uses the shared request session, so it participates in the same request safety model as the rest of the feed build.
-- It resolves relative API links against `channel.url`.
-- It keeps WordPress category IDs as strings; category-name resolution is not implemented yet.
-- It does not resolve `featured_media` into an image URL.
-
-## When To Use It
-
-Prefer `wordpress_api` when:
-
-- The page is clearly powered by WordPress
-- The REST API is public
-- You want a more stable source than CSS selectors or heuristic HTML scraping
-
-Prefer manual selectors when:
+## Notes
 
-- The site blocks or customizes the API heavily
-- You need fields that are not exposed by the post endpoint
-- You need full control over filtering or presentation
+- Categories stay as WordPress category IDs. Category names are not resolved yet.
+- Featured images are not pulled from `featured_media` yet.
 
 ## Related Docs
 
diff --git a/src/content/docs/web-application/getting-started.mdx b/src/content/docs/web-application/getting-started.mdx
@@ -78,7 +78,7 @@ Start with an included config from your own instance:
 2. copy that feed URL into your reader
 3. confirm your reader can subscribe successfully
 
-That proves the core path before you invest in custom configs or feature enablement.
+That proves the core path before you invest in automatic generation or custom configs.
 
 <AutoGenerationOptional />
 
diff --git a/src/content/docs/web-application/how-to/deployment.mdx b/src/content/docs/web-application/how-to/deployment.mdx
@@ -7,7 +7,7 @@ import DockerComposeSnippet from "../../../../components/docs/DockerComposeSnipp
 
 html2rss-web ships on Docker Hub, so you can launch this self-hosted service wherever Docker runs. Start with the official [`docker-compose.yml`](https://github.com/html2rss/html2rss-web/blob/master/docker-compose.yml) from the [Installation Guide](/web-application/getting-started) as your baseline.
 
-If you have not yet created a local instance, complete the [Getting Started guide](/web-application/getting-started) first. It walks through the one-time project directory setup, downloading the reference compose file, and confirming the application locally—steps we will build upon here.
+If you have not yet created a local instance, complete the [Getting Started guide](/web-application/getting-started) first. It walks through the one-time project directory setup, creating a minimal compose file, and confirming the application locally, which gives you the right baseline before exposing a self-hosted instance publicly.
 
 Already running html2rss-web on your workstation? Great! The sections below focus on what changes when you take that setup to production.
 
diff --git a/src/content/docs/web-application/how-to/use-automatic-feed-generation.mdx b/src/content/docs/web-application/how-to/use-automatic-feed-generation.mdx
@@ -3,20 +3,18 @@ title: "Use automatic feed generation"
 description: "Enable the web UI flow that generates a feed directly from a page URL."
 ---
 
-Automatic feed generation is the direct web-interface workflow: paste a page URL, create a feed, then copy the generated feed URL.
+Automatic feed generation is a standout `html2rss-web` feature: paste a page URL, create a feed, then copy the generated feed URL.
 
-> **Note:** This feature is disabled by default for security reasons.
+> **Note:** This feature is disabled by default. Enabling it should be a conscious decision on your self-hosted instance.
 
 ## How to Enable It
 
-Edit your `docker-compose.yml` and enable the automatic generation environment variables:
+Edit your `docker-compose.yml` and enable automatic feed generation:
 
 ```yaml
 environment:
   AUTO_SOURCE_ENABLED: "true"
-  AUTO_SOURCE_USERNAME: your-username
-  AUTO_SOURCE_PASSWORD: your-secure-password
-  AUTO_SOURCE_ALLOWED_ORIGINS: 127.0.0.1:3000
+  AUTO_SOURCE_ALLOWED_ORIGINS: 127.0.0.1:4000
 ```
 
 Then restart:
@@ -28,10 +26,10 @@ docker compose up -d
 
 ## How to Use It
 
-1. Open your instance at `http://localhost:3000`
+1. Open your instance at `http://localhost:4000`
 2. Paste a page URL into `Create a feed`
 3. Submit the form
-4. If access is required, provide the configured access token
+4. If the instance requires access, provide a configured access token
 5. Copy the generated feed URL or open it directly
 
 ## What Success Looks Like
@@ -43,7 +41,7 @@ When the flow works, you should see:
 - an open-feed action
 - a preview of recent entries when available
 
-That is enough to confirm the endpoint is live.
+That is enough to confirm the self-hosted flow is working.
 
 ## When to Stop and Switch