| title | Scraping JSON Responses |
|---|---|
| description | Learn how to scrape JSON APIs and responses with html2rss. Convert JSON to XML and use CSS selectors for data extraction. |
When a website returns a JSON response (i.e., with a Content-Type of application/json), html2rss converts the JSON to XML, allowing you to use CSS selectors for data extraction.
Note: The JSON response must be an Array or a Hash (that is, a JSON array or object at the top level) for the conversion to work.
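
As a rough illustration of this requirement, the Ruby sketch below checks whether a response body parses to a top-level Array or Hash. The helper name `convertible_json?` is hypothetical and not part of html2rss; it relies only on Ruby's standard `json` library.

```ruby
require 'json'

# Hypothetical helper (not an html2rss API): returns true when the body
# parses to a top-level Array or Hash, and false for scalars or invalid JSON.
def convertible_json?(body)
  parsed = JSON.parse(body)
  parsed.is_a?(Array) || parsed.is_a?(Hash)
rescue JSON::ParserError
  false
end

puts convertible_json?('[{ "title": "Headline" }]')
puts convertible_json?('"just a string"')
```

With a recent `json` library, the first call should report `true` and the second `false`, because a bare string is valid JSON but is neither an Array nor a Hash.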
A JSON object like this:

    {
      "data": [{ "title": "Headline", "url": "https://example.com" }]
    }

is converted to this XML structure:

    <object>
      <data>
        <array>
          <object>
            <title>Headline</title>
            <url>https://example.com</url>
          </object>
        </array>
      </data>
    </object>

You would use array > object as your items selector.

A JSON array like this:

    [{ "title": "Headline", "url": "https://example.com" }]

is converted to this XML structure:

    <array>
      <object>
        <title>Headline</title>
        <url>https://example.com</url>
      </object>
    </array>

You would use array > object as your items selector.
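
The mapping shown in the two examples can be sketched in a few lines of Ruby. This is only an illustration of the rule (Hash becomes `<object>`, Array becomes `<array>`, each key becomes an element), not html2rss's actual implementation. The method name `json_to_xml` and the constant `XML_ESCAPES` are hypothetical, and the sketch assumes every key is a valid XML element name.

```ruby
require 'json'

# Characters that must be escaped inside XML text nodes.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

# Hypothetical sketch of the mapping described above (not html2rss internals).
# Assumes all Hash keys are valid XML element names.
def json_to_xml(value)
  case value
  when Hash
    # Each key becomes an element wrapping its converted value.
    children = value.map { |key, child| "<#{key}>#{json_to_xml(child)}</#{key}>" }
    "<object>#{children.join}</object>"
  when Array
    "<array>#{value.map { |child| json_to_xml(child) }.join}</array>"
  else
    # Scalars become escaped text content.
    value.to_s.gsub(/[&<>]/, XML_ESCAPES)
  end
end

json = '[{ "title": "Headline", "url": "https://example.com" }]'
puts json_to_xml(JSON.parse(json))
# prints <array><object><title>Headline</title><url>https://example.com</url></object></array>
```

Because every item ends up as an `<object>` directly inside an `<array>`, the same items selector works whether the array is the top-level value or nested under a key such as `data`.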

A complete configuration in Ruby:

    Html2rss.feed(
      headers: {
        Accept: 'application/json'
      },
      channel: {
        url: 'https://domainname.tld/whatever.json'
      },
      selectors: {
        items: { selector: 'array > object' },
        title: { selector: 'title' },
        url: { selector: 'url' }
      }
    )

The same configuration in YAML:

    headers:
      Accept: application/json
    channel:
      url: "https://domainname.tld/whatever.json"
    selectors:
      items:
        selector: "array > object"
      title:
        selector: "title"
      url:
        selector: "url"