Commit b679ff7

Merge pull request #137 from Darkheir/fix/decode_bytes_if_needed
fix: Decodes bytes if needed in get_body
2 parents 022d711 + 15f3692 commit b679ff7

1 file changed

Lines changed: 3 additions & 1 deletion

readability/htmls.py

@@ -143,7 +143,9 @@ def get_body(doc):
         elem.drop_tree()
     # tostring() always return utf-8 encoded string
     # FIXME: isn't better to use tounicode?
-    raw_html = str_(tostring(doc.body or doc))
+    raw_html = tostring(doc.body or doc)
+    if isinstance(raw_html, bytes):
+        raw_html = raw_html.decode()
     cleaned = clean_attributes(raw_html)
     try:
         # BeautifulSoup(cleaned) #FIXME do we really need to try loading it?
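
The change above can be illustrated in isolation. The following is a minimal sketch, not code from the repository: `ensure_text` is a hypothetical helper name, and the sample markup is made up. It also shows why wrapping bytes in `str()` is a problem on Python 3, which would apply to the removed line if `str_` is an alias for the built-in `str` (an assumption; the diff does not show its definition).

```python
def ensure_text(raw_html, encoding="utf-8"):
    """Return raw_html as str, decoding it first if it is bytes.

    Hypothetical helper mirroring the isinstance check added in the diff.
    bytes.decode() defaults to UTF-8 on Python 3; the encoding is spelled
    out here only for clarity.
    """
    if isinstance(raw_html, bytes):
        return raw_html.decode(encoding)
    return raw_html


# Made-up sample input in both forms a serializer might return.
as_text = "<body><p>caf\u00e9</p></body>"
as_bytes = as_text.encode("utf-8")

# Bytes are decoded; text passes through unchanged.
decoded = ensure_text(as_bytes)
passthrough = ensure_text(as_text)

# On Python 3, str() on bytes yields the repr, not the decoded text.
repr_text = str(b"<p>x</p>")
```

Here `decoded` equals `as_text`, `passthrough` is the same object that was passed in, and `repr_text` starts with `b'`, which is the kind of stray prefix that would otherwise leak into the cleaned HTML.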
