Open
110 commits
1ae0081
fix: guard cyclic page tree traversal
vitormattos Apr 24, 2026
e24b1c2
fix: recover repeated page refs in cyclic page trees
vitormattos Apr 25, 2026
0692f90
test(pages): keep cyclic pages regression in PagesTest
vitormattos Apr 25, 2026
e1e08e9
style(tests): fix import order in PagesTest
vitormattos Apr 25, 2026
6815ca8
fix(memory): guard flate decoding and add memory limit helper
vitormattos Apr 26, 2026
b0471aa
test(pages): align cyclic pages expectation with dedup behavior
vitormattos Apr 26, 2026
d22ae73
test(pages): fix PR806 standalone cyclic pages expectation
vitormattos Apr 26, 2026
181268f
test(pages): make cyclic pages assertion merge-safe
vitormattos Apr 26, 2026
1abea5a
test(page): drop PR806 fixture regression from PR814 scope
vitormattos Apr 26, 2026
121b545
test(pages): add fixture source @see for PR806
vitormattos Apr 26, 2026
41c91d8
fix(rawdata): recover malformed xref/startxref scenarios from PR809 s…
vitormattos Apr 26, 2026
6460f6f
fix(rawdata): remove MemoryLimit dependency from PR813 base
vitormattos Apr 26, 2026
5a96c1f
test(rawdata): annotate fixture origins with @see links in PR813
vitormattos Apr 26, 2026
f57f179
style(test): fix @see indentation in rawdata fixture docs
vitormattos Apr 26, 2026
309841d
fix(rawdata): recover malformed xref trailers and page trees
vitormattos Apr 27, 2026
ccff792
fix(rawdata): return early for visited xref offsets
vitormattos Apr 27, 2026
2c69ed9
fix(rawdata): consolidate recovery fixtures and parser tolerance
vitormattos Apr 27, 2026
a7cca15
test(rawdata): move recoverable catalog fixtures out of issue-focus s…
vitormattos Apr 28, 2026
45994db
test(rawdata): add per-fixture @see links in data providers
vitormattos Apr 28, 2026
3da5a3c
test(rawdata): add @see per entry in regression dataprovider
vitormattos Apr 28, 2026
9bd1ec6
test(rawdata): add missing @see to PR regression tests
vitormattos Apr 28, 2026
2412eb7
test(rawdata): use external source PDF links in @see
vitormattos Apr 28, 2026
213c627
test(memory): move large flate regression out of DocumentIssueFocusTest
vitormattos Apr 28, 2026
f22f043
test(memory): restore DocumentIssueFocusTest to master baseline
vitormattos Apr 28, 2026
a27049d
fix(pages): normalize Kids in collectPages traversal
vitormattos Apr 28, 2026
da300de
fix(rawdata): tolerate malformed prev xref chain and add REDHAT regre…
vitormattos Apr 28, 2026
0720cda
chore: drop internal diagnose-parser tool from public PR
vitormattos Apr 28, 2026
a51f137
fix(rawdata-next): align conflict hotspots with integration-resolved …
vitormattos Apr 29, 2026
7b96814
refactor(rawdata-next): delegate shared parser/rawdata test ownership…
vitormattos Apr 29, 2026
ab9a4ef
refactor(rawdata-next): minimize overlap with pages-tree and memory-g…
vitormattos Apr 29, 2026
0600fd2
fix(tests): align rawdata fixture paths in PR816
vitormattos Apr 29, 2026
27e1186
fix(rawdata): restore xref and objref recovery logic for PR816
vitormattos Apr 29, 2026
d0017ca
docs(tests): keep only external PDF @see links in PR816 rawdata tests
vitormattos Apr 29, 2026
e5c71b7
refactor(pr817): isolate non-overlapping fixture scope
vitormattos Apr 29, 2026
77d435b
fix(pages): recover malformed page-like kids
vitormattos Apr 29, 2026
d353915
Assert exact page count in cyclic pages-tree regression
vitormattos Apr 29, 2026
55beec4
Stabilize cyclic pages-tree assertion across runners
vitormattos Apr 29, 2026
8b98061
Fix page-tree traversal to preserve repeated page refs
vitormattos Apr 29, 2026
70e25ba
fix: preserve absolute xref offsets with pre-header bytes
vitormattos Apr 24, 2026
ce9f372
test: use assertCount for page count assertion
vitormattos Apr 24, 2026
c905b7d
fix: recover pages when xref entries are partially missing
vitormattos Apr 25, 2026
d32fa47
fix: recover root object when xref points to invalid offset
vitormattos Apr 25, 2026
12356d2
test: move PR796 regression to RawDataParserTest
vitormattos Apr 25, 2026
213a1ce
fix: allow startxref offset to include leading whitespace
vitormattos Apr 24, 2026
e9d1e8a
test: add pdf.js compressed xref regression
vitormattos Apr 24, 2026
fa5a5ab
test: clarify pull request fixture provenance
vitormattos Apr 24, 2026
f88b639
test(rawdata): keep PR796/797 regressions in RawDataParserTest only
vitormattos Apr 25, 2026
cc54f77
fix(rawdata): recover xref_command_missing in PR796 stack
vitormattos Apr 27, 2026
059c5e9
Harden xref recovery for malformed offsets
vitormattos Apr 29, 2026
19204c5
fix(rawdata): tolerate recoverable headerless inputs
vitormattos Apr 29, 2026
252d0a5
Add ObjStm malformed preamble regression
vitormattos Apr 29, 2026
0e14a26
Normalize bug1978317 fixture permissions
vitormattos Apr 29, 2026
7f4ca6c
Update Parser and ParserTest with ObjStm hardening
vitormattos Apr 29, 2026
8f35f5c
tests: fix duplicate RawDataParser test method names
vitormattos Apr 29, 2026
9c9d2e1
tests: stabilize malformed objstm preamble assertion
vitormattos Apr 29, 2026
b035cfa
fix(pages): deduplicate Kids refs and guard cyclic page tree
vitormattos Apr 29, 2026
a90b705
feat(document): integrate PR806 page-recovery fallback methods
vitormattos Apr 29, 2026
7a85161
fix(font): guard uchr() against out-of-range numeric codes (PHP 8.5)
vitormattos Apr 29, 2026
ffd7d46
fix(parser): add xref and object stream parsing hardening with recovery
vitormattos Apr 30, 2026
918fa2a
fix(document,pages): align page tree traversal with cycle-aware dedup…
vitormattos Apr 30, 2026
f443f9d
Merge pull request #42 from vitormattos/fix/rawdata-memory-guard
vitormattos Apr 30, 2026
370b4f3
Merge remote-tracking branch 'origin/fix/rawdata-next-xref-trailer-re…
vitormattos Apr 30, 2026
1c38ec7
merge(#806): absorb branch history into PR795 keeping consolidated im…
vitormattos Apr 30, 2026
c83e535
merge(#817): non-encryption divergence fixes into PR795
vitormattos Apr 30, 2026
94503c1
fix(pages): preserve declared Count fallback under document-level dedup
vitormattos Apr 30, 2026
3adadd2
test(regression): cover all added PDF fixtures page counts
vitormattos Apr 30, 2026
6bc1286
test: fix docblock indentation for @see annotations
vitormattos Apr 30, 2026
a506d46
test: fix docblock and syntax errors in RawDataParserTest
vitormattos Apr 30, 2026
ab36499
test: remove artificial fallback count test
vitormattos Apr 30, 2026
9882af0
test: remove generic AddedPdfRegressionTest in favor of provenance-ba…
vitormattos Apr 30, 2026
b61d053
feat: add inline Kids fallback and generation-number normalisation
vitormattos Apr 30, 2026
b9fffed
fix: inherit MediaBox from parent Pages nodes; fall back to US Letter
vitormattos Apr 30, 2026
d57e07a
test: tighten MediaBox assertions; promote fixtures out of pdfjs-corr…
vitormattos Apr 30, 2026
5929761
Fix page box resolution across object stream revisions
vitormattos May 3, 2026
d2a50aa
Use versioned rawdata fixtures for PageTest pdf.js cases
vitormattos May 3, 2026
06fb8ee
Fix php-cs-fixer violations in Page
vitormattos May 3, 2026
30303a1
Handle readable encrypted PDFs and large stream decode limits
vitormattos May 3, 2026
922f62f
test: add regression test coverage for readable encrypted and large s…
vitormattos May 3, 2026
3d133e7
refactor: use @dataProvider for pdf.js regression tests and add page …
vitormattos May 3, 2026
dd28807
Fix page size fallback for fuzzed box coordinates
vitormattos May 3, 2026
225faa3
Reclassify poppler-85140 regression as regular fixture
vitormattos May 3, 2026
866070a
feat(tests): add regression tests for 4 pdf.js corpus files with xref…
vitormattos May 3, 2026
8d2951b
docs(tests): fix misleading comment on bug1980958 — parser extracts r…
vitormattos May 3, 2026
7a3a44a
test(fixtures): add poppler-91414-0-54 regression test from pdf.js co…
vitormattos May 3, 2026
d87cefa
test(fixtures): add 2 more pdf.js regression tests (PDFBOX-4352-0, po…
vitormattos May 3, 2026
a873047
feat(page): add native page dimensions API
vitormattos May 3, 2026
42f34b6
docs(usage): simplify page dimensions example
vitormattos May 3, 2026
c8c25fa
fix(page): normalize inverted box coordinates and cache dimensions
vitormattos May 3, 2026
07e3e91
refactor(document): harden unresolvable pages fallback
vitormattos May 3, 2026
c3635b0
fix(parser): tighten encrypted/xref recovery heuristics
vitormattos May 3, 2026
c18ad2c
refactor(parser): avoid broad Throwable catches in readability checks
vitormattos May 3, 2026
a123dea
refactor: replace broad Throwable catches in PR paths
vitormattos May 3, 2026
4c10848
refactor(tests): use native page dimensions API in helper
vitormattos May 3, 2026
602d790
refactor(tests): inline native page dimension retrieval
vitormattos May 3, 2026
aad87f3
docs: clarify encrypted-but-readable PDF wording
vitormattos May 3, 2026
60ad627
refactor(tests): simplify native page dimensions assertions
vitormattos May 3, 2026
efaf0d9
refactor(tests): rely on native getDimensions default fallback
vitormattos May 3, 2026
84b6277
chore: remove unecessary line break
vitormattos May 3, 2026
685474b
fix(pdfobject): enforce non-nullable config property
vitormattos May 3, 2026
cc1c936
fix(pdfobject): clean recursion stack after text extraction
vitormattos May 4, 2026
eda10e9
ci: cancel older in-progress runs on same PR
vitormattos May 4, 2026
445017b
fix(pdfobject): restore stable recursion and config behavior
vitormattos May 4, 2026
8021276
refactor(page): deduplicate box coordinate extraction
vitormattos May 4, 2026
d6bede9
refactor(document): extract ordered page resolvers in getPages
vitormattos May 4, 2026
875051e
refactor(document): restore explicit getPages flow
vitormattos May 4, 2026
bd27e75
refactor(document): collapse repetitive fallback dispatch in getPages
vitormattos May 4, 2026
275994e
refactor(document): use callable array for lazy fallback dispatch in …
vitormattos May 5, 2026
a397794
refactor(document): use lazy closures for page fallback resolution
vitormattos May 5, 2026
55dfe59
refactor(document): consolidate duplicate fallback guards into inline…
vitormattos May 5, 2026
a4b5992
Revert "refactor(document): consolidate duplicate fallback guards int…
vitormattos May 5, 2026
3 changes: 3 additions & 0 deletions .gitattributes
@@ -1,6 +1,9 @@
 # Auto detect text files and perform LF normalization
 * text=auto

+# Treat PDF files as binary to prevent CRLF conversion on Windows
+*.pdf binary
+
 /.editorconfig export-ignore
 /.gitattributes export-ignore
 /.gitignore export-ignore
4 changes: 4 additions & 0 deletions .github/workflows/coding-standards.yml
@@ -6,6 +6,10 @@ on:
         branches:
             - master

+concurrency:
+    group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
+    cancel-in-progress: true
+
 jobs:
     coding-standards:
         name: "CS Fixer & PHPStan"
4 changes: 4 additions & 0 deletions .github/workflows/continuous-integration.yml
@@ -2,6 +2,10 @@ name: "CI"

 on: [push, pull_request]

+concurrency:
+    group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
+    cancel-in-progress: true
+
 jobs:
     phpunit:
         name: "PHPUnit (PHP ${{ matrix.php }})"
4 changes: 4 additions & 0 deletions .github/workflows/performance.yml
@@ -6,6 +6,10 @@ on:
         branches:
             - "master"

+concurrency:
+    group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
+    cancel-in-progress: true
+
 env:
     fail-fast: true

26 changes: 9 additions & 17 deletions doc/Usage.md
@@ -219,30 +219,22 @@ Ref: [#472](https://github.com/smalot/pdfparser/issues/427#issuecomment-97341678
 ```php
 $parser = new \Smalot\PdfParser\Parser();
 $pdf = $parser->parseFile('document.pdf');
-$pages = $pdf->getPages();
-// this variable will contain the height and width of each page of the given PDF
-$mediaBox = [];
-foreach ($pages as $page) {
-    $details = $page->getDetails();
-    // If Mediabox is not set in details of current $page instance, get details from the header instead
-    if (!isset($details['MediaBox'])) {
-        $pages = $pdf->getObjectsByType('Pages');
-        $details = reset($pages)->getHeader()->getDetails();
-    }
-    $mediaBox[] = [
-        'width' => $details['MediaBox'][2],
-        'height' => $details['MediaBox'][3]
-    ];
-}
+// Width/height per page (points), using CropBox with MediaBox fallback.
+$dimensions = $pdf->getPagesDimensions();
+
+// To force MediaBox explicitly:
+$mediaBoxDimensions = $pdf->getPagesDimensions('MediaBox');
 ```
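
The "CropBox with MediaBox fallback" mentioned in the code comment above can be sketched without the library. This is a hypothetical helper, not pdfparser's implementation: the name `boxDimensions`, the shape of the `$details` array, and the `width`/`height` keys of the result are assumptions made for illustration. It tries the requested box first, falls back to `MediaBox`, and uses absolute differences so that inverted corner coordinates still produce positive sizes.

```php
<?php
// Hypothetical helper for illustration only; NOT part of smalot/pdfparser.
// $details is assumed to map box names to [x1, y1, x2, y2] arrays in points.
function boxDimensions(array $details, string $preferred = 'CropBox'): ?array
{
    // Try the preferred box first, then fall back to MediaBox.
    foreach ([$preferred, 'MediaBox'] as $name) {
        $box = $details[$name] ?? null;
        if (is_array($box) && count($box) === 4) {
            // abs() keeps the result positive when the corners are inverted.
            return [
                'width' => abs((float) $box[2] - (float) $box[0]),
                'height' => abs((float) $box[3] - (float) $box[1]),
            ];
        }
    }

    return null; // neither box is usable
}

// Usage: a page that only declares a MediaBox (US Letter, in points).
var_export(boxDimensions(['MediaBox' => [0, 0, 612, 792]]));
```

Unlike the removed snippet, which read `MediaBox[2]` and `MediaBox[3]` directly, this sketch subtracts the lower-left corner, so a box that does not start at the origin still yields its real width and height.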

 ## PDF encryption

-This library cannot currently read encrypted PDF files, i.e. those with
-a read password. Attempting to do so produces this error:
+This library does not currently support decrypting PDFs that require an explicit
+user password. Attempting to read such files may produce this error:
 ```
 Exception: Secured pdf file are currently not supported.
 ```

+Some PDFs are flagged as encrypted but remain readable without an explicit user password.
+
 See `setIgnoreEncryption` option in [CustomConfig.md](CustomConfig.md)
 for how to override the check in specific cases.
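
A minimal sketch of the override referred to above, assuming a Composer install of smalot/pdfparser and the `Config::setIgnoreEncryption()` setter described in CustomConfig.md. The file name `document.pdf` is a placeholder.

```php
<?php
require 'vendor/autoload.php'; // assumes the library was installed via Composer

// Skip the encryption check for PDFs that are flagged as encrypted
// but are readable without an explicit user password.
$config = new \Smalot\PdfParser\Config();
$config->setIgnoreEncryption(true);

$parser = new \Smalot\PdfParser\Parser([], $config);
$pdf = $parser->parseFile('document.pdf'); // placeholder path

echo $pdf->getText();
```

The option only bypasses the check. It is not expected to decrypt anything, so a file that really needs a password will likely still produce unreadable text.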
Binary file added samples/bugs/Brotli-Prototype-FileA.pdf
Binary file not shown.
Binary file added samples/bugs/PDFBOX-4352-0.pdf
Binary file not shown.
Binary file added samples/bugs/PullRequest797-pdf.js.pdf
Binary file not shown.
Binary file added samples/bugs/PullRequest797-vera.pdf
Binary file not shown.
Binary file added samples/bugs/PullRequest806-pdf.js.pdf
Binary file not shown.
Binary file added samples/bugs/PullRequest812-issue7229.pdf
Binary file not shown.
Binary file added samples/bugs/PullRequest813-pdf.js.pdf
Binary file not shown.
Binary file added samples/bugs/PullRequest814-pdf.js.pdf
Binary file not shown.
Binary file added samples/bugs/PullRequest815-xref-command-missing.pdf
Binary file not shown.
Binary file added samples/bugs/PullRequestDuplicateKids.pdf
Binary file not shown.
Binary file not shown.
Binary file added samples/bugs/REDHAT-1531897-0.pdf
Binary file not shown.
Binary file added samples/bugs/bug1978317.pdf
Binary file not shown.
Binary file added samples/bugs/bug1980958.pdf
Binary file not shown.
Binary file added samples/bugs/issue15590.pdf
Binary file not shown.
Binary file added samples/bugs/issue18986.pdf
Binary file not shown.
Binary file added samples/bugs/issue9105_other.pdf
Binary file not shown.
Binary file added samples/bugs/poppler-395-0-fuzzed.pdf
Binary file not shown.
Binary file added samples/bugs/poppler-67295-0.pdf
Binary file not shown.
Binary file added samples/bugs/poppler-85140-0.pdf
Binary file not shown.
Binary file added samples/bugs/poppler-91414-0-53.pdf
Binary file not shown.
Binary file added samples/bugs/poppler-91414-0-54.pdf
Binary file not shown.
Binary file added samples/bugs/rawdata/Pages-tree-refs.pdf
Binary file not shown.
Binary file added samples/bugs/rawdata/PullRequest794.pdf
Binary file not shown.
Binary file added samples/bugs/rawdata/PullRequest797-pdf.js.pdf
Binary file not shown.
Binary file added samples/bugs/rawdata/PullRequest797-vera.pdf
Binary file not shown.
Binary file added samples/bugs/rawdata/PullRequest804-pdf.js.pdf
Binary file not shown.
Binary file added samples/bugs/rawdata/PullRequest805-pdf.js.pdf
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added samples/bugs/rawdata/PullRequest809-pdf.js.pdf
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added samples/bugs/rawdata/PullRequest812-pdf.js.pdf
Binary file not shown.
Binary file added samples/bugs/rawdata/PullRequest813-pdf.js.pdf
Binary file not shown.
Binary file added samples/bugs/rawdata/PullRequest814-pdf.js.pdf
Binary file not shown.
Binary file not shown.
Binary file added samples/bugs/rawdata/PullRequest818-pdf.js.pdf
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added samples/bugs/rawdata/boundingBox_invalid.pdf
Binary file not shown.
Binary file added samples/bugs/rawdata/bug1250079.pdf
Binary file not shown.
Binary file added samples/bugs/rawdata/bug1539074.1.pdf
Binary file not shown.
Binary file added samples/bugs/rawdata/bug1539074.pdf
Binary file not shown.
Binary file added samples/bugs/rawdata/bug1606566.pdf
Binary file not shown.
Binary file added samples/bugs/rawdata/bug1795263.pdf
Binary file not shown.
Binary file added samples/bugs/rawdata/copy_paste_ligatures.pdf
Binary file not shown.
Binary file added samples/bugs/rawdata/issue16091.pdf
Binary file not shown.
Binary file added samples/bugs/rawdata/issue19484_1.pdf
Binary file not shown.
Binary file added samples/bugs/rawdata/issue19484_2.pdf
Binary file not shown.
Binary file added samples/bugs/rawdata/issue7872.pdf
Binary file not shown.
Binary file not shown.
Binary file added samples/bugs/rawdata/pdfjs-issue19517.pdf
Binary file not shown.
Binary file added samples/bugs/rawdata/poppler-742-0-fuzzed.pdf
Binary file not shown.