title: Handling Dynamic Content
description: Learn how to handle JavaScript-heavy websites and dynamic content with html2rss. Use the browserless strategy for sites that load content dynamically.

import Code from "astro/components/Code.astro";

Some websites load their content dynamically using JavaScript. The default html2rss strategy, faraday, fetches the HTML without executing JavaScript, so it might not see this content.

Solution

Use the browserless strategy to render JavaScript-heavy websites with a headless browser.

Keep the strategy at the top level and put request-specific options under request:

<Code
  code={`strategy: browserless
request:
  max_redirects: 5
  max_requests: 6
  browserless:
    preload:
      wait_after_ms: 5000
channel:
  url: https://example.com/app
selectors:
  items:
    selector: .article
  title:
    selector: h2
  url:
    selector: a
    extractor: href`}
  lang="yaml"
/>

When to Use Browserless

The browserless strategy is necessary in situations such as:

  • Content that loads after the initial page load: JavaScript fetches data from APIs
  • Single Page Applications (SPAs): React, Vue, or Angular apps
  • Infinite scroll: content loads as you scroll
  • Dynamic forms: content changes based on user interaction

Preload Actions

For dynamic sites, rendering once is often not enough. Use request.browserless.preload to wait, click, or scroll before the HTML snapshot is taken.

Wait Before Capturing Dynamic Content

strategy: browserless
request:
  browserless:
    preload:
      wait_after_ms: 4000

Click "Load More" Buttons

strategy: browserless
request:
  browserless:
    preload:
      wait_after_ms: 3000
      click_selectors:
        - selector: ".load-more"
          max_clicks: 3
          wait_after_ms: 250

Scroll Infinite Lists

strategy: browserless
request:
  browserless:
    preload:
      scroll_down:
        iterations: 5
        wait_after_ms: 200
      wait_after_ms: 2500

These preload steps can be combined in a single config when a site needs several interactions before all items appear.
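
As a rough sketch of such a combination, the config below merges the three preload examples from this section into one. It uses only the keys shown there; the `.load-more` selector and all timing values are placeholders to adjust for the target site, and the order in which html2rss runs the steps is not specified here.

```yaml
strategy: browserless
request:
  browserless:
    preload:
      # Overall wait, as in the "Wait Before Capturing" example (placeholder value).
      wait_after_ms: 3000
      # Click a "Load More" button; ".load-more" is a placeholder selector.
      click_selectors:
        - selector: ".load-more"
          max_clicks: 3
          wait_after_ms: 250
      # Scroll down several times, pausing after each iteration.
      scroll_down:
        iterations: 5
        wait_after_ms: 200
```

Starting with small values for `max_clicks` and `iterations` and raising them only if items are still missing should keep render time down.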

Performance Considerations

The browserless strategy is slower than the default faraday strategy because it:

  • Launches a headless Chrome browser
  • Renders the full page with JavaScript
  • Uses more memory and CPU resources

Use faraday for static content and only switch to browserless when necessary.
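
For comparison, a static page would need only the channel and selectors from the first example. This sketch assumes the strategy can be named explicitly as `faraday`, in the same way `browserless` is set above. Because faraday is the default, the `strategy` line can presumably be omitted altogether.

```yaml
# Assumption: "faraday" is accepted as an explicit strategy value,
# mirroring how "browserless" is set elsewhere in this guide.
strategy: faraday
channel:
  url: https://example.com/app
selectors:
  items:
    selector: .article
  title:
    selector: h2
  url:
    selector: a
    extractor: href
```

If the resulting feed comes back empty for a page that clearly shows articles in a browser, the content is probably rendered by JavaScript and browserless is warranted.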

Related Topics