This tool automatically detects and fixes issues related to ISO 14289-1:2014 (PDF/UA-1), Clause 7.21.7 (Unicode character mapping requirements). It checks that all text content has a valid and complete Unicode mapping. Where mappings are missing, it repairs or adds ToUnicode CMaps. Where no Unicode information is available at all, it reconstructs the text with OCR. The goal is text that can be reliably extracted and that meets accessibility requirements.
To use this Docker application, you'll need to have Docker installed on your system. If Docker is not installed, please follow the instructions on the official Docker website to install it.
The first run pulls the Docker image, which may take some time. For more advanced use, you can build your own image.
To run the Docker container as a CLI tool, share the folder that contains the PDF to process using the -v parameter. In this example, the current folder is shared:
```bash
docker run -v $(pwd):/data -w /data --rm pdfix/font-fix-pdfix:latest fix-missing-unicode -i /data/input.pdf -o /data/output.pdf
```

To use an OCR engine other than the default Tesseract OCR, pass the --engine parameter with one of these values: Easy for EasyOCR or Rapid for RapidOCR.
When OCR fails to recognize a character, a space is inserted in its place. To insert a different character instead, pass the --default_char parameter followed by the desired character.
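To process a whole folder at once, the single-file command can be wrapped in a loop. The Bash function below is a minimal sketch, not part of the PDFix image. The function name `fix_folder`, the `fixed/` output subfolder, and the `ENGINE`, `DEFAULT_CHAR` and `DRY_RUN` environment variables are all hypothetical choices made for this example. It also assumes that --engine and --default_char can be combined in a single call, which the options above do not state explicitly.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the PDFix image): runs fix-missing-unicode
# on every *.pdf in a folder and writes the results to a "fixed" subfolder.
# Optional settings are read from environment variables (names are assumptions):
#   ENGINE        - Easy or Rapid; empty keeps the default Tesseract OCR
#   DEFAULT_CHAR  - character to insert when OCR cannot recognize one
#   DRY_RUN=1     - print the docker commands instead of running them
fix_folder() {
  local dir image="pdfix/font-fix-pdfix:latest"
  dir="$(cd "${1:-.}" && pwd)" || return 1   # docker -v needs an absolute path
  if [ "${DRY_RUN:-0}" != "1" ]; then
    mkdir -p "$dir/fixed"                    # output folder on the host side
  fi
  local pdf name
  for pdf in "$dir"/*.pdf; do
    [ -e "$pdf" ] || continue                # no matches: the glob stays literal, skip it
    name="$(basename "$pdf")"
    local cmd=(docker run -v "$dir":/data -w /data --rm "$image"
      fix-missing-unicode -i "/data/$name" -o "/data/fixed/$name")
    if [ -n "${ENGINE:-}" ]; then
      cmd+=(--engine "$ENGINE")
    fi
    if [ -n "${DEFAULT_CHAR:-}" ]; then
      cmd+=(--default_char "$DEFAULT_CHAR")
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "${cmd[*]}"                       # show the command only
    else
      "${cmd[@]}" || echo "failed: $name" >&2
    fi
  done
}
```

After sourcing the function, `ENGINE=Rapid fix_folder ~/scans` would process every PDF in `~/scans` with RapidOCR. Running it first with `DRY_RUN=1` lets you check the generated commands before any container is started.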
For more detailed information about the available command-line arguments, you can run the following command:
```bash
docker run --rm pdfix/font-fix-pdfix:latest --help
```

To export the configuration JSON file, use the following command:
```bash
docker run -v $(pwd):/data -w /data --rm pdfix/font-fix-pdfix:latest config -o config.json
```

To report an issue, please contact us at support@pdfix.net. For more information, visit https://pdfix.net