| Event | Description |
| --- | --- |
| `onopentag(name, attribs, isImplied)` | Opening tag. `attribs` is an object mapping attribute names to values. `isImplied` is `true` when the tag was opened implicitly (HTML mode only). |
| `onopentagname(name)` | Emitted for the tag name as soon as it is available (before attributes are parsed). |
| `onattribute(name, value, quote)` | Attribute. `quote` is `"` / `'` / `null` (unquoted) / `undefined` (no value, e.g. `disabled`). |
| `onclosetag(name, isImplied)` | Closing tag. `isImplied` is `true` when the tag was closed implicitly (HTML mode only). |
| `ontext(data)` | Text content. May fire multiple times for a single text node. |
| `oncomment(data)` | Comment (content between `<!--` and `-->`). |
| `oncdatastart()` | Opening of a CDATA section (`<![CDATA[`). |
| `oncdataend()` | End of a CDATA section (`]]>`). |

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `xmlMode` | `boolean` | `false` | Treat the document as XML. This affects entity decoding, self-closing tags, CDATA handling, and more. Set this to `true` for XML, RSS, Atom and RDF feeds. |
| `decodeEntities` | `boolean` | `true` | Decode HTML entities (e.g. `&amp;` -> `&`). |
| `lowerCaseTags` | `boolean` | `!xmlMode` | Lowercase tag names. |
For CSS selector queries, use [`css-select`](https://github.com/fb55/css-select):

```js
import { selectAll, selectOne } from "css-select";

// `dom` is assumed to be a previously parsed document (e.g. from `parseDocument`).
const results = selectAll("ul#fruits > li", dom);
const first = selectOne("li.apple", dom);
```

Or, if you'd prefer a jQuery-like API, use [`cheerio`](https://github.com/cheeriojs/cheerio).

The `DomHandler`, while still bundled with this module, was moved to its [own module](https://github.com/fb55/domhandler); have a look at that for further information.

### Modifying and serializing the DOM
Use `DomUtils` to modify the tree, and [`dom-serializer`](https://github.com/cheeriojs/dom-serializer) (also available as `DomUtils.getOuterHTML`) to serialize it back to HTML:
Other manipulation helpers include `appendChild`, `prependChild`, `append`, `prepend`, and `replaceElement` -- see the [`domutils` docs](https://github.com/fb55/domutils) for the full API.
## Parsing feeds
`htmlparser2` makes it easy to parse RSS, RDF and Atom feeds, by providing a `parseFeed` method:
This returns an object with `type`, `title`, `link`, `description`, `updated`, `author`, and `items` (an array of feed entries), or `null` if the document isn't a recognized feed format.

The `xmlMode` option is enabled by default for `parseFeed`. If you pass custom options, make sure to include `xmlMode: true`.
## Performance

After relying on artificial benchmarks for some time, **@AndreasMadsen** published his [`htmlparser-benchmark`](https://github.com/AndreasMadsen/htmlparser-benchmark), which benchmarks HTML parsers based on real-world websites.
## How does this module differ from [node-htmlparser](https://github.com/tautologistics/node-htmlparser)?

In 2011, this module started as a fork of the `htmlparser` module.
`htmlparser2` was rewritten multiple times and, while it maintains an API that's mostly compatible with `htmlparser`, the projects no longer share any code.

The parser now provides a callback interface inspired by [sax.js](https://github.com/isaacs/sax-js) (originally targeted at [readabilitySAX](https://github.com/fb55/readabilitysax)).
As a result, old handlers won't work anymore.

The `DefaultHandler` was renamed to `DomHandler` to clarify its purpose. The old name is still available when requiring `htmlparser2`, so existing code should work as expected.

The `RssHandler` was replaced with a `getFeed` function that takes a `DomHandler` DOM and returns a feed object. There is also a `parseFeed` helper function that parses a feed directly from a string.

## Security contact information
To report a security vulnerability, please use the [Tidelift security contact](https://tidelift.com/security).