Commit 15f3692

fix: Decodes bytes if needed in get_body
1 parent 5800210 commit 15f3692

1 file changed

Lines changed: 3 additions & 1 deletion

readability/htmls.py

@@ -134,7 +134,9 @@ def get_body(doc):
         elem.drop_tree()
     # tostring() always return utf-8 encoded string
     # FIXME: isn't better to use tounicode?
-    raw_html = str_(tostring(doc.body or doc))
+    raw_html = tostring(doc.body or doc)
+    if isinstance(raw_html, bytes):
+        raw_html = raw_html.decode()
     cleaned = clean_attributes(raw_html)
     try:
         # BeautifulSoup(cleaned) #FIXME do we really need to try loading it?
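
The decode-if-needed pattern introduced by this change can be sketched in isolation. The helper name `ensure_text` and its `encoding` parameter are hypothetical and are not part of `readability/htmls.py`; the sample markup stands in for whatever `tostring()` returns. The commit itself calls `raw_html.decode()` with no argument, which on Python 3 defaults to UTF-8.

```python
def ensure_text(raw_html, encoding="utf-8"):
    """Return raw_html as str, decoding only when it arrives as bytes.

    Hypothetical helper mirroring the isinstance() check in the diff above.
    """
    if isinstance(raw_html, bytes):
        return raw_html.decode(encoding)
    return raw_html


# Sample inputs (assumed for illustration): the same markup as bytes and as str.
as_text = "<body><p>caf\u00e9</p></body>"
as_bytes = as_text.encode("utf-8")

# Bytes are decoded to the equivalent text.
print(ensure_text(as_bytes) == as_text)

# Text passes through untouched, so the helper is safe to call on either type.
print(ensure_text(as_text) is as_text)
```

Checking the type first means the downstream `clean_attributes(raw_html)` call receives text whether the serializer returned bytes or a string.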
