Form Recognizer API with Python for PDF throwing "BadArgument" error #323

@scoopmans

The issue

The Quickstart for the Form Recognizer REST API with Python no longer works. It fails with the following error:
POST analyze failed: {"error":{"code":"BadArgument","message":"Bad or unrecognizable request JSON or binary file."}}

I am not sure whether this is related to the update to Form Recognizer REST API v3.0, but the request no longer works for PDF files. It worked a few weeks ago.

My code

from requests import post

endpoint = r"https://westeurope.api.cognitive.microsoft.com"
key = "<my_personal_key>"
source = r"<path to pdf>"

def get_url(filename, endpoint, apim_key):

    # v2.1
    post_url = endpoint + "/formrecognizer/v2.1/layout/analyze"

    headers = {
        # Request headers
        'Content-Type': 'application/pdf',
        'Ocp-Apim-Subscription-Key': apim_key,
    }

    with open(filename, "rb") as f:
        data_bytes = f.read()

    try:
        resp = post(post_url, data=data_bytes, headers=headers)
        if resp.status_code != 202:
            print("POST analyze failed:\n%s" % resp.text)
            return None
        print("POST analyze succeeded: %s" % resp.headers["operation-location"])
        # Return the polling URL directly; avoids shadowing the function name.
        return resp.headers["operation-location"]
    except Exception as e:
        print("POST request failed:\n%s" % str(e))
        return None

resp_url = get_url(source, endpoint, key)
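
Since the error message mentions an "unrecognizable ... binary file", one thing worth ruling out before the POST is whether the file on disk is actually a PDF. The following is a minimal sketch, not part of the Quickstart: `looks_like_pdf` and its `probe_bytes` parameter are hypothetical names introduced here, and the check only looks for the `%PDF-` marker near the start of the file. It says nothing about whether the service will accept the document.

```python
def looks_like_pdf(path, probe_bytes=1024):
    """Return True if the '%PDF-' marker appears in the first bytes of the file.

    Hypothetical helper for local diagnosis only. PDF readers commonly
    tolerate a few leading bytes before the header, so the marker is
    searched for within a small window rather than only at offset 0.
    """
    with open(path, "rb") as f:
        head = f.read(probe_bytes)
    return b"%PDF-" in head
```

It could be called as `looks_like_pdf(source)` before `get_url(source, endpoint, key)`. A `False` result would point at the input file rather than the API.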

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [x] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Follow the Quickstart: Extract text and layout information using the Form Recognizer REST API with Python

Any log messages given by the failure

POST analyze failed:
{"error":{"code":"BadArgument","message":"Bad or unrecognizable request JSON or binary file."}}
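
For logging, the error body above can be split into its `code` and `message` fields. This is a small sketch that assumes the body has the shape shown in the log; `describe_error` is a hypothetical helper, and it falls back to the raw text when the body is not JSON of that shape.

```python
import json


def describe_error(body):
    """Return (code, message) from an error body shaped like the log above.

    Hypothetical helper: if the body is not JSON, or does not contain an
    "error" object, (None, body) is returned so that nothing is lost.
    """
    try:
        err = json.loads(body)["error"]
        return err.get("code"), err.get("message")
    except (ValueError, KeyError, TypeError, AttributeError):
        return None, body
```

Inside `get_url`, it could be applied to `resp.text` when `resp.status_code != 202`.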

Expected/desired behavior

I expect the POST to succeed and the URL from the operation-location response header to be returned.

OS and Version?

Working on Windows 10, using Visual Studio Code

Versions

requests == 2.28.1
Python == 3.9.13
