PDF Autofiller

Backend service for filling AcroForm PDFs from structured user data.

The pipeline is deterministic by default: it normalizes keys, applies stable aliases, and coerces values. Optional semantic inference and controlled fallback mapping run only when explicitly enabled. The result is a small, testable pipeline that is easier to audit than heuristic-only form filling.

What It Does

Reads PDF metadata, form fields, and visible page text
Infers semantic meaning for fields when optional semantic inference is enabled
Maps user data to fields using deterministic rules first
Rejects outputs with unresolved required fields
Returns a new filled PDF through a small FastAPI service
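
The deterministic path described above (normalize keys, apply aliases, coerce values, reject unresolved required fields) can be sketched roughly as follows. This is an illustration only, not the code in mapping.py or pdf_writer.py: the alias table, the function names, and the coercion rules are all assumptions made for the example.

```python
import re
from datetime import date

# Hypothetical alias table: normalized user key -> canonical field name.
ALIASES = {
    "firstname": "first_name",
    "lastname": "last_name",
    "dob": "date_of_birth",
}


def normalize_key(key: str) -> str:
    """Lowercase a key and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", key.lower())


def coerce_value(value: object) -> str:
    """Coerce a value to the string form written into a text field (assumed rules)."""
    if isinstance(value, bool):
        return "Yes" if value else "No"
    if isinstance(value, date):
        return value.isoformat()
    return str(value).strip()


def map_user_data(user_data: dict, required: set[str]) -> dict[str, str]:
    """Map user data to canonical field names; fail on unresolved required fields."""
    mapped: dict[str, str] = {}
    for raw_key, value in user_data.items():
        field = ALIASES.get(normalize_key(raw_key))
        if field is not None:  # keys without a known alias are ignored here
            mapped[field] = coerce_value(value)
    missing = sorted(required - set(mapped))
    if missing:
        raise ValueError(f"unresolved required fields: {missing}")
    return mapped
```

Because every step is a pure function over plain dictionaries, the same input always produces the same mapping, which is what makes this style of pipeline straightforward to test and audit.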

Quick Start

Install development dependencies:

poetry install

or, with pip:

pip install -r requirements-dev.txt

Run the API locally:

make run-api

Run the local smoke check:

PYTHONPATH=src python -m scripts.smoke_check

Run the demo workflow against the bundled sample:

PYTHONPATH=src python -m scripts.demo_workflow samples/sample_form.pdf

API Example

curl -s -X POST http://localhost:8000/fill \
  -F "pdf_file=@samples/sample_form.pdf;type=application/pdf" \
  -F 'user_data={"firstname":"Jane","lastname":"Doe","dob":"1990-01-01"}' \
  -F "strict=true" \
  -o filled.pdf
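
For callers that prefer Python over curl, the same request can be assembled with the standard library alone. This is a minimal sketch: the form field names (pdf_file, user_data, strict) and the URL come from the curl example above, while the helper names build_fill_request and send and the filename "form.pdf" are made up for illustration.

```python
import json
import urllib.request
import uuid


def build_fill_request(
    pdf_bytes: bytes,
    user_data: dict,
    strict: bool = True,
    url: str = "http://localhost:8000/fill",
) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying the same fields as the curl call."""
    boundary = uuid.uuid4().hex
    parts = []

    def add_field(name: str, value: str) -> None:
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )

    # The file part carries a filename and an explicit PDF content type.
    parts.append(
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="pdf_file"; filename="form.pdf"\r\n'
            "Content-Type: application/pdf\r\n\r\n"
        ).encode()
        + pdf_bytes
        + b"\r\n"
    )
    add_field("user_data", json.dumps(user_data))
    add_field("strict", "true" if strict else "false")
    parts.append(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        url,
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def send(request: urllib.request.Request, out_path: str) -> None:
    """Send the request and write the response body (the filled PDF) to out_path."""
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

With the API running locally, read samples/sample_form.pdf as bytes, pass it to build_fill_request together with the user data, and call send(request, "filled.pdf").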

Configuration

MODEL_PROVIDER_API_KEY: enables semantic inference and fallback mapping
API_AUTH_ENABLED: enables API key validation on POST /fill
API_AUTH_TOKEN: expected token value when auth is enabled
API_KEY_HEADER: header name used for the incoming token
MAX_UPLOAD_BYTES: maximum accepted PDF size in bytes
LOG_LEVEL: process log level for the API service
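
One way to read these variables into a typed settings object is sketched below. The variable names are the ones listed above; the Settings class, the load_settings function, and every default value (header name, upload limit, log level, accepted truthy strings) are assumptions for the example and may differ from what the service actually does.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    model_provider_api_key: Optional[str]
    api_auth_enabled: bool
    api_auth_token: Optional[str]
    api_key_header: str
    max_upload_bytes: int
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables; every default below is an assumption."""
    auth_enabled = env.get("API_AUTH_ENABLED", "false").strip().lower() in {"1", "true", "yes"}
    token = env.get("API_AUTH_TOKEN") or None
    # Fail fast rather than start an API that can never authenticate a caller.
    if auth_enabled and token is None:
        raise ValueError("API_AUTH_TOKEN must be set when API_AUTH_ENABLED is true")
    return Settings(
        model_provider_api_key=env.get("MODEL_PROVIDER_API_KEY") or None,
        api_auth_enabled=auth_enabled,
        api_auth_token=token,
        api_key_header=env.get("API_KEY_HEADER", "X-API-Key"),
        max_upload_bytes=int(env.get("MAX_UPLOAD_BYTES", str(10 * 1024 * 1024))),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

Passing the environment in as a mapping keeps the loader easy to test without mutating os.environ.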

Architecture

Core code lives in src/pdf_autofiller/ and is intentionally split by responsibility:

pdf_reader.py: extraction only
field_semantics.py: provider client wrapper and response normalization
mapping.py: deterministic matching and controlled fallback mapping
pdf_writer.py: output writing and required-field enforcement
api_service.py: HTTP boundary, auth, request validation, and temp-file lifecycle

The detailed system breakdown is in docs/ARCHITECTURE.md.

Quality

ruff, mypy, pip-audit, and pytest are enforced in CI
Coverage floor is 85%
API error responses use stable machine-readable error codes
Smoke-check and demo scripts are kept separate from the test suite
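
To illustrate what "stable machine-readable error codes" can look like in practice, here is a small sketch. The specific codes, the ApiError class, and the payload shape are hypothetical; the actual contract is whatever docs/API.md specifies.

```python
from dataclasses import dataclass
from enum import Enum


class ErrorCode(str, Enum):
    # Hypothetical codes, chosen only for this example.
    INVALID_PDF = "invalid_pdf"
    UPLOAD_TOO_LARGE = "upload_too_large"
    UNRESOLVED_REQUIRED_FIELDS = "unresolved_required_fields"


@dataclass(frozen=True)
class ApiError:
    code: ErrorCode
    message: str

    def to_payload(self) -> dict:
        """Serialize to a JSON-ready dict; clients branch on code, not message."""
        return {"error": {"code": self.code.value, "message": self.message}}
```

Keeping the code separate from the human-readable message lets the wording change without breaking clients that match on the code.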

Scope

The current pipeline targets fillable AcroForm PDFs
OCR and scanned-document workflows are intentionally out of scope
Frontend, persistence, and deployment infrastructure are not part of this repository
If optional provider-backed features are enabled, field metadata and nearby page text may be sent to an external service

Documentation

docs/API.md: endpoint contracts and example requests
docs/ARCHITECTURE.md: module boundaries and data flow
docs/OPERATIONS.md: runtime configuration and deployment assumptions
docs/TESTING.md: local validation workflow
docs/PURPOSE.md: problem statement and intended usage
CONTRIBUTING.md: contributor expectations
SECURITY.md: vulnerability reporting and data-handling notes

License

MIT. See LICENSE.
