---
title: "Handling Dynamic Content"
description: "Learn how to handle JavaScript-heavy websites and dynamic content with html2rss. Use browserless strategy for sites that load content dynamically."
---

import Code from "astro/components/Code.astro";

Some websites load their content dynamically using JavaScript. The default faraday strategy fetches the HTML as the server delivers it and does not execute JavaScript, so it might not see this content.

Solution

Use the browserless strategy to render JavaScript-heavy websites with a headless browser.

Keep the strategy at the top level and put request-specific options under request:

<Code
  code={`strategy: browserless
request:
  max_redirects: 5
  max_requests: 6
  browserless:
    preload:
      wait_for_network_idle:
        timeout_ms: 5000
channel:
  url: https://example.com/app
selectors:
  items:
    selector: .article
  title:
    selector: h2
  url:
    selector: a
    extractor: href`}
  lang="yaml"
/>

When to Use Browserless

The browserless strategy is necessary when:

  • Content loads after page load - JavaScript fetches data from APIs
  • Single Page Applications (SPAs) - React, Vue, Angular apps
  • Infinite scroll - Content loads as you scroll
  • Dynamic forms - Content changes based on user interaction

Preload Actions

For dynamic sites, rendering once is often not enough. Use request.browserless.preload to wait, click, or scroll before the HTML snapshot is taken.

Wait for JavaScript Requests

strategy: browserless
request:
  browserless:
    preload:
      wait_for_network_idle:
        timeout_ms: 4000

Click "Load More" Buttons

strategy: browserless
request:
  browserless:
    preload:
      click_selectors:
        - selector: ".load-more"
          max_clicks: 3
          delay_ms: 250
          wait_for_network_idle:
            timeout_ms: 3000

Scroll Infinite Lists

strategy: browserless
request:
  browserless:
    preload:
      scroll_down:
        iterations: 5
        delay_ms: 200
        wait_for_network_idle:
          timeout_ms: 2500

These preload steps can be combined in a single config when a site needs several interactions before all items appear.
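
As a rough sketch of such a combination, the config below places the three preload keys from the individual examples side by side under request.browserless.preload. It assumes the keys can coexist exactly as they appear on their own. The order in which html2rss runs them is not stated here, so treat the ordering as illustrative. The `.load-more` selector and all timing values are placeholders.

```yaml
strategy: browserless
request:
  browserless:
    preload:
      # Wait for the initial JavaScript requests to settle.
      wait_for_network_idle:
        timeout_ms: 4000
      # Click a "Load More" button (placeholder selector) up to three times.
      click_selectors:
        - selector: ".load-more"
          max_clicks: 3
          delay_ms: 250
          wait_for_network_idle:
            timeout_ms: 3000
      # Scroll down to trigger lazily loaded items.
      scroll_down:
        iterations: 5
        delay_ms: 200
        wait_for_network_idle:
          timeout_ms: 2500
```

Starting with a single step and adding more only when items are still missing keeps render time down.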

Performance Considerations

The browserless strategy is slower than the default faraday strategy because it:

  • Launches a headless Chrome browser
  • Renders the full page with JavaScript
  • Takes more memory and CPU resources

Use faraday for static content and only switch to browserless when necessary.
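
For comparison, a static page might use the same kind of feed definition without any browserless options. This sketch assumes that `strategy: faraday` can be written explicitly. Since faraday is described as the default, omitting the key should behave the same way. The URL and selectors are placeholders.

```yaml
# Static page: no headless browser needed.
strategy: faraday # assumed explicit form of the default strategy
channel:
  url: https://example.com/blog # placeholder URL
selectors:
  items:
    selector: .article
  title:
    selector: h2
  url:
    selector: a
    extractor: href
```

If the resulting feed comes back empty or is missing items, that is a reasonable signal to try browserless.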

Related Topics