Commit 0841f61
PDFBOX-5747: Surrogate pairs with combining diacritics are incorrectly ordered on text extraction
- Changed TextPosition.insertDiacritic() to preserve surrogate pairs
- Added unit test
- Included example test PDF file attached to PDFBOX-5747
1 parent 0b8bc2d commit 0841f61
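The diff body is not reproduced on this page, so the following is only a minimal sketch of the idea named in the commit message: inserting a combining diacritic after a whole code point so that a surrogate pair is not split. It is not the PDFBox implementation; the class `SurrogateSafeInsert` and both helper methods are hypothetical names used for illustration, and only standard `java.lang` calls are assumed.

```java
// Hypothetical illustration, not the actual TextPosition.insertDiacritic() code.
final class SurrogateSafeInsert {

    // Naive variant: treats index i as a single UTF-16 char, so a mark
    // inserted at i + 1 can land between the two halves of a surrogate pair.
    static String insertAfterChar(String text, int i, String mark) {
        return new StringBuilder(text.length() + mark.length())
                .append(text, 0, i + 1)
                .append(mark)
                .append(text, i + 1, text.length())
                .toString();
    }

    // Surrogate-aware variant: finds the end of the code point that starts
    // at index i (1 or 2 chars) and inserts the mark after it.
    static String insertAfterCodePoint(String text, int i, String mark) {
        int codePoint = text.codePointAt(i);
        int end = i + Character.charCount(codePoint);
        return new StringBuilder(text.length() + mark.length())
                .append(text, 0, end)
                .append(mark)
                .append(text, end, text.length())
                .toString();
    }

    public static void main(String[] args) {
        String base = "\uD835\uDC00b"; // U+1D400 (one surrogate pair) followed by 'b'
        String acute = "\u0301";       // COMBINING ACUTE ACCENT

        String naive = insertAfterChar(base, 0, acute);
        String safe = insertAfterCodePoint(base, 0, acute);

        // The naive result leaves the high surrogate separated from its low surrogate.
        System.out.println(Character.isLowSurrogate(naive.charAt(1)));
        System.out.println(Character.isLowSurrogate(safe.charAt(1)));
    }
}
```

For text in the Basic Multilingual Plane both variants return the same string; they differ only when the character at index `i` is the high half of a surrogate pair.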
4 files changed
Lines changed: 18 additions & 3 deletions
File tree
- pdfbox/src
- main/java/org/apache/pdfbox/text
- test/resources/input
Lines changed: 12 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| 759 | 759 | | |
| 760 | 760 | | |
| 761 | 761 | | |
| | 762 | + | |
| | 763 | + | |
| | 764 | + | |
| | 765 | + | |
| | 766 | + | |
| 762 | 767 | | |
| 763 | 768 | | |
| 764 | 769 | | |
| 765 | | - | |
| | 770 | + | |
| | 771 | + | |
| | 772 | + | |
| | 773 | + | |
| | 774 | + | |
| | 775 | + | |
| | 776 | + | |
| 766 | 777 | | |
| 767 | | - | |
| 768 | 778 | | |
| 769 | 779 | | |
| 770 | 780 | | |
| 771 | | - | |
| 772 | 781 | | |
| 773 | 782 | | |
| 774 | 783 | | |
Binary file not shown.
Lines changed: 3 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| | 1 | + | |
| | 2 | + | |
| | 3 | + | |
Lines changed: 3 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| | 1 | + | |
| | 2 | + | |
| | 3 | + | |