Backend service for filling AcroForm PDFs from structured user data.
The project favors deterministic behavior first: it normalizes keys, applies stable aliases, coerces values, and only uses optional semantic inference or controlled fallback mapping when explicitly enabled. The result is a small, testable pipeline that is easier to audit than heuristic-only form filling.
- Reads PDF metadata, form fields, and visible page text
- Infers semantic meaning for fields when optional semantic inference is enabled
- Maps user data to fields using deterministic rules first
- Rejects outputs with unresolved required fields
- Returns a new filled PDF through a small FastAPI service
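
The deterministic-first ordering listed above (normalize keys, apply stable aliases, coerce values) can be sketched roughly as follows. This is an illustration only, not the contents of mapping.py: `ALIASES`, `normalize_key`, `coerce_value`, and `map_user_data` are hypothetical names, and the alias entries and coercion rules are assumptions.

```python
from __future__ import annotations

import re
from datetime import date

# Hypothetical alias table: normalized user key -> normalized PDF field name.
ALIASES = {
    "firstname": "givenname",
    "lastname": "familyname",
    "dob": "dateofbirth",
}


def normalize_key(key: str) -> str:
    """Lowercase the key and drop every non-alphanumeric character."""
    return re.sub(r"[^a-z0-9]", "", key.lower())


def coerce_value(value: object) -> str:
    """Coerce a Python value to the string a text field would hold (assumed rules)."""
    if isinstance(value, bool):
        return "Yes" if value else "No"
    if isinstance(value, date):
        return value.isoformat()
    return str(value).strip()


def map_user_data(user_data: dict, field_names: list[str]) -> tuple[dict, list[str]]:
    """Return (field name -> coerced value, user keys that matched nothing)."""
    by_normalized = {normalize_key(name): name for name in field_names}
    mapped: dict = {}
    unmatched: list[str] = []
    for key in sorted(user_data):  # sorted so results do not depend on input order
        norm = normalize_key(key)
        target = by_normalized.get(norm)          # 1. exact normalized match
        if target is None and norm in ALIASES:    # 2. stable alias
            target = by_normalized.get(ALIASES[norm])
        if target is None:
            unmatched.append(key)                 # left for optional fallback mapping
        else:
            mapped[target] = coerce_value(user_data[key])
    return mapped, unmatched
```

Keys that end up in the unmatched list are the only ones an optional inference or fallback step would need to look at, which keeps the deterministic part easy to test in isolation.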
Install development dependencies:

    poetry install

or

    pip install -r requirements-dev.txt

Run the API locally:

    make run-api

Run the local smoke check:

    PYTHONPATH=src python -m scripts.smoke_check

Run the demo workflow against the bundled sample:

    PYTHONPATH=src python -m scripts.demo_workflow samples/sample_form.pdf

Example request against the local API:

    curl -s -X POST http://localhost:8000/fill \
      -F "pdf_file=@samples/sample_form.pdf;type=application/pdf" \
      -F 'user_data={"firstname":"Jane","lastname":"Doe","dob":"1990-01-01"}' \
      -F "strict=true" \
      -o filled.pdf

Environment variables:

- MODEL_PROVIDER_API_KEY: enables semantic inference and fallback mapping
- API_AUTH_ENABLED: enables API key validation on POST /fill
- API_AUTH_TOKEN: expected token value when auth is enabled
- API_KEY_HEADER: header name used for the incoming token
- MAX_UPLOAD_BYTES: maximum accepted PDF size in bytes
- LOG_LEVEL: process log level for the API service

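
As a rough sketch of how these variables might be read at startup, the example below loads them into a frozen settings object and compares an incoming token in constant time. The variable names come from the list above. Everything else is an assumption made for illustration: the `Settings` class, `load_settings`, `token_matches`, the accepted boolean spellings, and every default value (auth off, `X-API-Key`, 10 MiB, `INFO`).

```python
import hmac
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    provider_api_key: Optional[str]
    auth_enabled: bool
    auth_token: str
    api_key_header: str
    max_upload_bytes: int
    log_level: str


def _as_bool(raw: str) -> bool:
    # Assumed convention for truthy values; the service may accept others.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str]) -> Settings:
    """Build Settings from an environment mapping such as os.environ."""
    settings = Settings(
        provider_api_key=env.get("MODEL_PROVIDER_API_KEY") or None,
        auth_enabled=_as_bool(env.get("API_AUTH_ENABLED", "false")),
        auth_token=env.get("API_AUTH_TOKEN", ""),
        api_key_header=env.get("API_KEY_HEADER", "X-API-Key"),    # assumed default
        max_upload_bytes=int(env.get("MAX_UPLOAD_BYTES", str(10 * 1024 * 1024))),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),           # assumed default
    )
    if settings.auth_enabled and not settings.auth_token:
        raise ValueError("API_AUTH_TOKEN must be set when API_AUTH_ENABLED is true")
    return settings


def token_matches(settings: Settings, presented: str) -> bool:
    """Constant-time check of the presented token; always True when auth is off."""
    if not settings.auth_enabled:
        return True
    return hmac.compare_digest(presented.encode(), settings.auth_token.encode())
```

Calling `load_settings(os.environ)` once at process start and failing fast on a missing token keeps misconfiguration out of the request path.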
Core code lives in src/pdf_autofiller/ and is intentionally split by responsibility:
- pdf_reader.py: extraction only
- field_semantics.py: provider client wrapper and response normalization
- mapping.py: deterministic matching and controlled fallback mapping
- pdf_writer.py: output writing and required-field enforcement
- api_service.py: HTTP boundary, auth, request validation, and temp-file lifecycle
The detailed system breakdown is in docs/ARCHITECTURE.md.
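
To make the required-field enforcement responsibility more concrete, here is a minimal sketch of such a check. It is not the code in pdf_writer.py: `UnresolvedRequiredFieldsError`, `is_required`, and `enforce_required` are hypothetical names. The one fact taken from the PDF specification is that the Required flag is bit position 2 of a field's `/Ff` flags, which corresponds to the mask `0x2`.

```python
from typing import Iterable, Mapping, Optional

REQUIRED_FLAG = 0x2  # bit position 2 of the /Ff field flags in the PDF spec


class UnresolvedRequiredFieldsError(Exception):
    """Raised when required fields would be left empty in the output."""

    def __init__(self, missing: Iterable[str]) -> None:
        self.missing = sorted(missing)
        super().__init__("unresolved required fields: " + ", ".join(self.missing))


def is_required(field_flags: int) -> bool:
    """Return True when the Required bit is set in an /Ff value."""
    return bool(field_flags & REQUIRED_FLAG)


def enforce_required(
    field_flags: Mapping[str, int],
    values: Mapping[str, Optional[str]],
) -> None:
    """Raise if any required field has no non-blank value (hypothetical policy)."""
    missing = [
        name
        for name, flags in field_flags.items()
        if is_required(flags) and not str(values.get(name) or "").strip()
    ]
    if missing:
        raise UnresolvedRequiredFieldsError(missing)
```

Running a check like this before any bytes are written means a rejected request never produces a partially filled file.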
- ruff, mypy, pip-audit, and pytest are enforced in CI
- Coverage floor is 85%
- API error responses use stable machine-readable error codes
- Smoke-check and demo scripts are kept separate from the test suite
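
One way stable machine-readable error codes could be modeled is sketched below. The specific code strings, the HTTP status pairing, and the response shape are all assumptions made for illustration; docs/API.md is the place to look for the real contract.

```python
from __future__ import annotations

from enum import Enum


class ErrorCode(str, Enum):
    # Hypothetical codes; the service's real identifiers may differ.
    INVALID_PDF = "invalid_pdf"
    UPLOAD_TOO_LARGE = "upload_too_large"
    UNRESOLVED_REQUIRED_FIELDS = "unresolved_required_fields"
    UNAUTHORIZED = "unauthorized"


# Assumed pairing of each code with an HTTP status.
HTTP_STATUS = {
    ErrorCode.INVALID_PDF: 400,
    ErrorCode.UPLOAD_TOO_LARGE: 413,
    ErrorCode.UNRESOLVED_REQUIRED_FIELDS: 422,
    ErrorCode.UNAUTHORIZED: 401,
}


def error_response(code: ErrorCode, message: str, **details: object) -> tuple[int, dict]:
    """Return (HTTP status, JSON-serializable body) for an error code."""
    body: dict = {"error": {"code": code.value, "message": message}}
    if details:
        body["error"]["details"] = details
    return HTTP_STATUS[code], body
```

Clients can then branch on the `code` string, while the human-readable `message` stays free to change between releases.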
- The current pipeline targets fillable AcroForm PDFs
- OCR and scanned-document workflows are intentionally out of scope
- Frontend, persistence, and deployment infrastructure are not part of this repository
- If optional provider-backed features are enabled, field metadata and nearby page text may be sent to an external service
- docs/API.md: endpoint contracts and example requests
- docs/ARCHITECTURE.md: module boundaries and data flow
- docs/OPERATIONS.md: runtime configuration and deployment assumptions
- docs/TESTING.md: local validation workflow
- docs/PURPOSE.md: problem statement and intended usage
- CONTRIBUTING.md: contributor expectations
- SECURITY.md: vulnerability reporting and data-handling notes
MIT. See LICENSE.