
Commit a34fc29

client: fix two real SDK bugs surfaced by integration harness
1. _scrape_request: KeyError on 'content-type' for PUT without an explicit header.

   Line 213 did:

       'content-type': scrape_config.headers['content-type'] if scrape_config.method in ['POST', 'PUT', 'PATCH'] else self.body_handler.content_type

   This hard-reads the dict key for every POST/PUT/PATCH request. Callers who send a PUT with just a body but no Content-Type header (e.g. client.scrape(ScrapeConfig(url=..., method='PUT', body='test'))) got:

       KeyError: 'content-type'

   Fixed: use .get('content-type', self.body_handler.content_type) so an absent header falls through to the SDK default.

2. _handle_api_response: UnicodeDecodeError on non-msgpack/json compressed responses.

   Line 830 did:

       body = response.content.decode('utf-8')

   ...when body_handler.support(headers) returned False, which happens when the server returns a content-type that isn't application/msgpack or application/json. For responses that are still compressed (zstd, brotli) because the requests library didn't transparently decompress them, this raises UnicodeDecodeError: invalid start byte.

   Fixed: if the response has a content-encoding in the known set, call body_handler.read() to decompress first, then decode as utf-8 with errors='replace' as a last resort so the error never bubbles up as a cryptic Unicode failure.

Both surfaced by sdk/integration tests test_scrape_http_method_put and test_scrape_http_method_delete after the harness was set up to run every SDK feature against the dev stack.
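The first fix boils down to a dict-access pattern. A minimal sketch outside the SDK (the function name `pick_content_type` and the `default` parameter are hypothetical; plain dicts stand in for `ScrapeConfig.headers`):

```python
def pick_content_type(headers: dict, method: str, default: str) -> str:
    """Mirror of the fixed header selection: only body-carrying methods
    consult the caller's headers, and a missing key falls back to the
    default instead of raising KeyError."""
    if method in ('POST', 'PUT', 'PATCH'):
        # .get() is the whole fix: headers['content-type'] raised
        # KeyError when the caller sent a body but no explicit header.
        return headers.get('content-type', default)
    return default

# PUT with an explicit header: forwarded unchanged.
assert pick_content_type({'content-type': 'text/plain'}, 'PUT', 'application/json') == 'text/plain'
# PUT with only a body, no header: falls back to the default (the old code raised here).
assert pick_content_type({}, 'PUT', 'application/json') == 'application/json'
# Non-body method: caller headers are not consulted for content-type.
assert pick_content_type({'content-type': 'text/plain'}, 'GET', 'application/json') == 'application/json'
```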
1 parent b4f17dd commit a34fc29

1 file changed

Lines changed: 34 additions & 3 deletions

File tree

scrapfly/client.py

@@ -210,14 +210,22 @@ def _scrape_request(self, scrape_config:ScrapeConfig):
             'verify': self.verify,
             'timeout': (self.connect_timeout, self.web_scraping_api_read_timeout),
             'headers': {
-                'content-type': scrape_config.headers['content-type'] if scrape_config.method in ['POST', 'PUT', 'PATCH'] else self.body_handler.content_type,
+                # When method has a body (POST/PUT/PATCH) AND the caller
+                # explicitly set a Content-Type, forward it. Otherwise fall
+                # back to the body_handler default so we don't KeyError on
+                # callers who omit the header (e.g. simple PUT "test-body").
+                'content-type': (
+                    scrape_config.headers.get('content-type', self.body_handler.content_type)
+                    if scrape_config.method in ['POST', 'PUT', 'PATCH']
+                    else self.body_handler.content_type
+                ),
                 'accept-encoding': self.body_handler.content_encoding,
                 'accept': self.body_handler.accept,
                 'user-agent': self.ua
             },
             'params': scrape_config.to_api_params(key=self.key)
         }
-
+
     def _screenshot_request(self, screenshot_config:ScreenshotConfig):
         return {
             'method': 'GET',
@@ -819,7 +827,30 @@ def _handle_api_response(
         if self.body_handler.support(headers=response.headers):
             body = self.body_handler(content=response.content, content_type=response.headers['content-type'])
         else:
-            body = response.content.decode('utf-8')
+            # body_handler rejected — content-type not in SUPPORTED_CONTENT_TYPES.
+            # Response may still be compressed (zstd/brotli) if requests did
+            # not transparently decompress. Probe content-encoding and try
+            # the handler's read() anyway before falling back to a tolerant
+            # utf-8 decode. Previously this branch raised UnicodeDecodeError
+            # on valid zstd/br responses with a non-json/msgpack content-type.
+            raw = response.content
+            content_encoding = response.headers.get('content-encoding', '').lower()
+            if content_encoding in ('gzip', 'gz', 'deflate', 'br', 'brotli', 'zstd'):
+                try:
+                    raw = self.body_handler.read(
+                        content=raw,
+                        content_encoding=content_encoding,
+                        content_type=response.headers.get('content-type', ''),
+                        signature=None,
+                    )
+                except Exception:
+                    # Fall through to tolerant decode below; don't mask the
+                    # real error with a decoder crash.
+                    pass
+            if isinstance(raw, (bytes, bytearray)):
+                body = raw.decode('utf-8', errors='replace')
+            else:
+                body = raw

         api_response:ScrapeApiResponse = ScrapeApiResponse(
             response=response,
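The decompress-before-decode order in the new fallback branch can be demonstrated with the standard library, with `gzip` standing in for the SDK's `body_handler.read()` (the function name `tolerant_body` is hypothetical; the real handler also covers br/zstd):

```python
import gzip

def tolerant_body(content: bytes, content_encoding: str) -> str:
    """Sketch of the fixed fallback branch: decompress known encodings
    first, then decode with errors='replace' so a compressed or binary
    body never surfaces as a bare UnicodeDecodeError."""
    raw = content
    if content_encoding in ('gzip', 'gz', 'deflate', 'br', 'brotli', 'zstd'):
        try:
            raw = gzip.decompress(raw)  # stand-in for body_handler.read()
        except Exception:
            pass  # fall through to the tolerant decode below
    return raw.decode('utf-8', errors='replace')

# Old behaviour: .decode('utf-8') on the raw gzip bytes raised
# UnicodeDecodeError. New behaviour: decompress first, decode cleanly.
payload = gzip.compress('héllo'.encode('utf-8'))
assert tolerant_body(payload, 'gzip') == 'héllo'
# Unknown encoding with binary junk: degraded but non-raising output.
assert '\ufffd' in tolerant_body(b'\xff\xfe\x00', 'identity')
```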
