
Commit c889b1b

scrape: proxified_response error restoration from X-Scrapfly-Reject-* headers
When proxified_response=true and the upstream fails, the Go API sets X-Scrapfly-Reject-Code/Description/Retryable + Retry-After headers on the response. The SDK now reads these and raises HttpError with the correct code, http_status_code, is_retryable, and retry_delay instead of returning a raw Response that the caller would have to interpret manually.
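A minimal sketch of the header-to-error mapping described above, pulled out of the client so it can be exercised without a network call. `RejectInfo` and `parse_reject_headers` are hypothetical names and are not part of the Scrapfly SDK. The header names, the `'false'` default for Retry-Retryable, and the integer parsing of `Retry-After` follow this commit. The sketch assumes headers arrive as a plain mapping, which matches keys case-sensitively, unlike the case-insensitive headers object on a `requests` response.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RejectInfo:
    """Hypothetical container for the fields the SDK passes to HttpError."""
    code: str
    http_status_code: int
    message: str
    is_retryable: bool
    retry_delay: Optional[int]


def parse_reject_headers(status_code: int, headers: Mapping[str, str]) -> Optional[RejectInfo]:
    """Return RejectInfo if the proxified response carries a reject code, else None."""
    reject_code = headers.get("X-Scrapfly-Reject-Code")
    if not reject_code:
        # No reject code: the upstream response is passed through untouched.
        return None

    # Any value other than a case-insensitive "true" counts as not retryable.
    is_retryable = headers.get("X-Scrapfly-Reject-Retryable", "false").lower() == "true"

    retry_delay: Optional[int] = None
    if is_retryable:
        try:
            # Retry-After is read as integer seconds; an HTTP-date value
            # fails int() and falls back to None, as in the commit.
            retry_delay = int(headers.get("Retry-After", "0"))
        except (ValueError, TypeError):
            retry_delay = None

    return RejectInfo(
        code=reject_code,
        http_status_code=status_code,
        message=headers.get("X-Scrapfly-Reject-Description", ""),
        is_retryable=is_retryable,
        retry_delay=retry_delay,
    )
```

Keeping the parsing free of any HTTP client makes the retryable and `Retry-After` edge cases easy to check in isolation.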
1 parent bb8ca09 commit c889b1b

1 file changed

Lines changed: 23 additions & 5 deletions


scrapfly/client.py

@@ -495,11 +495,29 @@ def scrape(self, scrape_config:ScrapeConfig, no_raise:bool=False) -> ScrapeApiRe
 if scrape_config.proxified_response is True:
     # Proxified mode: the API returns the raw upstream response
     # (target's status, headers, body) instead of the JSON
-    # envelope. Skip ScrapeApiResponse parsing entirely and
-    # return the raw requests.Response so callers can drive
-    # it like any HTTP response. Scrapfly metadata is on the
-    # X-Scrapfly-* headers (Content-Format, Log, Api-Cost).
-    response.raise_for_status()
+    # envelope. Error restoration: if X-Scrapfly-Reject-Code is
+    # present, the scrape failed and the SDK must raise a typed
+    # error with the code/message/retryable from the headers.
+    reject_code = response.headers.get('X-Scrapfly-Reject-Code')
+    if reject_code:
+        from scrapfly.errors import HttpError
+        reject_desc = response.headers.get('X-Scrapfly-Reject-Description', '')
+        reject_retryable = response.headers.get('X-Scrapfly-Reject-Retryable', 'false').lower() == 'true'
+        retry_after = None
+        if reject_retryable:
+            try:
+                retry_after = int(response.headers.get('Retry-After', '0'))
+            except (ValueError, TypeError):
+                retry_after = None
+        raise HttpError(
+            request=response.request,
+            response=response,
+            code=reject_code,
+            http_status_code=response.status_code,
+            message=reject_desc,
+            is_retryable=reject_retryable,
+            retry_delay=retry_after,
+        )
     self.reporter.report(scrape_api_response=None)
     return response
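On the caller side, the restored `is_retryable` and `retry_delay` values are enough to drive a simple retry loop. The sketch below uses a hypothetical stand-in exception, `RejectedScrape`, instead of importing `scrapfly.errors.HttpError`, so it runs on its own. It assumes the real error exposes `is_retryable` and `retry_delay` as attributes, which this commit does not show. Because the diff parses a missing `Retry-After` as `0`, the loop treats a zero or missing delay as "no hint" and falls back to a default.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RejectedScrape(Exception):
    """Hypothetical stand-in for HttpError, carrying only the fields used here."""

    def __init__(self, code: str, is_retryable: bool, retry_delay: Optional[int]) -> None:
        super().__init__(code)
        self.code = code
        self.is_retryable = is_retryable
        self.retry_delay = retry_delay


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 3,
    default_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying only rejects flagged as retryable."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RejectedScrape as err:
            # Non-retryable rejects, and the last attempt, propagate unchanged.
            if not err.is_retryable or attempt == max_attempts:
                raise
            # Prefer the server-supplied delay; fall back when it is None or 0.
            sleep(err.retry_delay if err.retry_delay else default_delay)
    raise AssertionError("unreachable: the loop always returns or raises")
```

The `sleep` parameter is injectable so the loop can be tested without waiting; in normal use the default `time.sleep` applies.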
