Parser library for reading and extracting PDF document structures.
If this library helps your analysis pipeline, please consider supporting development via PayPal.
tc-lib-pdf-parser parses raw PDF data into structured PHP arrays suitable for extraction, analysis, and downstream processing.

| | |
|---|---|
| Namespace | `\Com\Tecnick\Pdf\Parser` |
| Author | Nicola Asuni <info@tecnick.com> |
| License | GNU LGPL v3 - see LICENSE |
| API docs | https://tcpdf.org/docs/srcdoc/tc-lib-pdf-parser |
| Packagist | https://packagist.org/packages/tecnickcom/tc-lib-pdf-parser |

- Cross-reference and object stream parsing
- Filter-aware stream decoding integration
- Structured output suitable for custom extractors
- Configuration options for tolerant parsing modes
- Pure-PHP parser with no external service dependency
- Typed exceptions for error handling
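
The tolerant parsing option and exception handling listed above can be combined with a basic input check, roughly as sketched below. This is a minimal sketch, not the library's recommended usage: `readPdfBytes()` and `parsePdfFile()` are hypothetical helpers, the exception class name `\Com\Tecnick\Pdf\Parser\Exception` is an assumption, and the `%PDF-` check is a strict heuristic that would reject files with leading bytes before the header. The sketch also assumes the Composer autoloader has already been loaded.

```php
<?php
// Hypothetical helper (not part of tc-lib-pdf-parser): return the file's
// bytes only if the file is readable and starts with the PDF header marker.
function readPdfBytes(string $path): ?string
{
    if (!is_file($path) || !is_readable($path)) {
        return null;
    }
    $raw = file_get_contents($path);
    if ($raw === false || !str_starts_with($raw, '%PDF-')) {
        return null;
    }
    return $raw;
}

// Hypothetical wrapper: parse in tolerant mode and return null on any failure.
function parsePdfFile(string $path): ?array
{
    $raw = readPdfBytes($path);
    if ($raw === null) {
        return null;
    }
    try {
        $parser = new \Com\Tecnick\Pdf\Parser\Parser(['ignore_filter_errors' => true]);
        return $parser->parse($raw);
    } catch (\Com\Tecnick\Pdf\Parser\Exception $e) {
        // Assumed class name for the library's typed exception.
        error_log('PDF parse error: ' . $e->getMessage());
    } catch (\Throwable $e) {
        // Anything else, including a missing autoloader (class not found).
        error_log('Unexpected failure: ' . $e->getMessage());
    }
    return null;
}
```

Checking the result of `file_get_contents()` explicitly avoids handing an empty string to the parser when the file cannot be read, and the final `\Throwable` branch keeps the wrapper safe even if the assumed exception class name is wrong.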

Requirements:

- PHP 8.1 or later
- Extension: `pcre`
- Composer

Install with Composer:

    composer require tecnickcom/tc-lib-pdf-parser

Basic usage:

    <?php
    require_once __DIR__ . '/vendor/autoload.php';

    $raw = file_get_contents('/path/to/document.pdf');
    $parser = new \Com\Tecnick\Pdf\Parser\Parser(['ignore_filter_errors' => true]);
    $data = $parser->parse((string) $raw);
    var_dump($data);

Development commands:

    make deps
    make help
    make qa

Packaging:

    make rpm
    make deb

For system packages, bootstrap with:

    require_once '/usr/share/php/Com/Tecnick/Pdf/Parser/autoload.php';

Contributions are welcome. Please review CONTRIBUTING.md, CODE_OF_CONDUCT.md, and SECURITY.md.

Nicola Asuni - info@tecnick.com