tc-lib-pdf-parser

Parser library for reading and extracting PDF document structures.

Donate via PayPal

If this library helps your analysis pipeline, please consider supporting development via PayPal.


Overview

tc-lib-pdf-parser parses raw PDF data into structured PHP arrays suitable for extraction, analysis, and downstream processing.

  • Namespace: \Com\Tecnick\Pdf\Parser
  • Author: Nicola Asuni <info@tecnick.com>
  • License: GNU LGPL v3 (see LICENSE)
  • API docs: https://tcpdf.org/docs/srcdoc/tc-lib-pdf-parser
  • Packagist: https://packagist.org/packages/tecnickcom/tc-lib-pdf-parser

Features

Parsing Capabilities

  • Cross-reference and object stream parsing
  • Filter-aware stream decoding integration
  • Structured output suitable for custom extractors

Runtime Design

  • Configuration options for tolerant parsing modes
  • Pure-PHP parser with no external service dependency
  • Typed exceptions for error handling
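
Because the parser reports failures through exceptions, calling code will usually wrap the parse step in a try/catch. The following is a minimal sketch under stated assumptions: the helper `tryParse` and its result shape are hypothetical and not part of the library, and the block catches `\Throwable` rather than guessing the library's concrete exception class names, which should be confirmed in the API docs.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of tc-lib-pdf-parser): runs a parse callable
 * and converts any thrown exception into a plain result array.
 *
 * @param callable(string): array $parse
 * @return array{ok: bool, data: array, error: ?string}
 */
function tryParse(callable $parse, string $raw): array
{
    try {
        return ['ok' => true, 'data' => $parse($raw), 'error' => null];
    } catch (\Throwable $e) {
        // Catches the base type; narrow this to the library's exception
        // classes once their names are confirmed in the API docs.
        return [
            'ok' => false,
            'data' => [],
            'error' => get_class($e) . ': ' . $e->getMessage(),
        ];
    }
}

// With the library installed, the callable would wrap the real parser:
// $parser = new \Com\Tecnick\Pdf\Parser\Parser(['ignore_filter_errors' => false]);
// $result = tryParse(fn (string $raw): array => $parser->parse($raw), $raw);
```

Passing the parser as a callable keeps the helper independent of the library, so the error path can be exercised with a stub before wiring in real PDF data.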

Requirements

  • PHP 8.1 or later
  • Extension: pcre
  • Composer

Installation

composer require tecnickcom/tc-lib-pdf-parser

Quick Start

<?php

require_once __DIR__ . '/vendor/autoload.php';

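// Read the raw PDF bytes; file_get_contents() returns false on failure.
$raw = file_get_contents('/path/to/document.pdf');
if ($raw === false) {
    throw new \RuntimeException('Unable to read the PDF file.');
}

// Tolerant mode: as the option name suggests, filter errors are ignored.
$parser = new \Com\Tecnick\Pdf\Parser\Parser(['ignore_filter_errors' => true]);
$data = $parser->parse($raw);

var_dump($data);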
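
For a real document, `var_dump` output can run to thousands of lines. As a rough sketch, a small helper can print only the shape of the parsed array instead. `describeStructure` is a hypothetical name, not a library function, and it assumes nothing about the keys the parser returns; the sample data below is a stand-in, not actual parser output.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: returns one line per key describing the value's type
 * (and element count for arrays), descending at most $maxDepth levels.
 *
 * @return list<string>
 */
function describeStructure(array $data, int $maxDepth = 2, int $depth = 0): array
{
    $lines = [];
    $indent = str_repeat('  ', $depth);
    foreach ($data as $key => $value) {
        if (is_array($value)) {
            $lines[] = sprintf('%s%s: array(%d)', $indent, $key, count($value));
            // Recurse only while the next level is still within the depth limit.
            if ($depth + 1 < $maxDepth) {
                array_push($lines, ...describeStructure($value, $maxDepth, $depth + 1));
            }
        } else {
            $lines[] = sprintf('%s%s: %s', $indent, $key, get_debug_type($value));
        }
    }
    return $lines;
}

// Stand-in data; with the library installed, pass the result of $parser->parse().
$sample = ['a' => ['x' => 1, 'y' => 'text'], 'b' => 2.5];
echo implode(PHP_EOL, describeStructure($sample)), PHP_EOL;
```

Raising `$maxDepth` reveals more of the nesting once the top-level layout is understood.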

Development

make deps
make help
make qa

Packaging

make rpm
make deb

For system packages, bootstrap with:

require_once '/usr/share/php/Com/Tecnick/Pdf/Parser/autoload.php';

Contributing

Contributions are welcome. Please review CONTRIBUTING.md, CODE_OF_CONDUCT.md, and SECURITY.md.


Contact

Nicola Asuni - info@tecnick.com
