Fix this off-by-one error #1

speedplane · 2017-03-07T08:03:26Z

An ASCII85 group can represent values up to and including 2^32-1. The off-by-one bound was causing validations to break.
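The bound can be illustrated with a minimal sketch of decoding a single five-character ASCII85 group. The helper names `group_value` and `decode_group` are hypothetical and are not PyPDF2's API; the `z` shortcut and partial trailing groups are left out for brevity.

```python
import struct

MAX_GROUP_VALUE = 2**32 - 1  # largest value that fits in four bytes


def group_value(chars):
    """Interpret five ASCII85 characters as one base-85 number."""
    if len(chars) != 5:
        raise ValueError("expected exactly five characters")
    value = 0
    for ch in chars:
        digit = ord(ch) - 33  # '!' is digit 0, 'u' is digit 84
        if not 0 <= digit < 85:
            raise ValueError("character outside the ASCII85 alphabet: %r" % ch)
        value = value * 85 + digit
    return value


def decode_group(chars):
    """Decode one full group into four bytes (hypothetical helper)."""
    value = group_value(chars)
    # The bound is inclusive: 2**32 - 1 itself is legal (four 0xFF bytes).
    # Rejecting it with a strict "<" comparison would be an off-by-one error.
    if value > MAX_GROUP_VALUE:
        raise ValueError("group value exceeds 2**32 - 1")
    return struct.pack(">L", value)
```

Five base-85 digits can express values up to 85^5 - 1, which is larger than 2^32 - 1, so an upper-bound check is needed; it just has to accept the maximum itself.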

@sylvainpelissier

Adapted from work by Sylvain Pelissier (@sylvainpelissier): http://stackoverflow.com/questions/2693820/extract-images-from-pdf-without-resampling-in-python

The script works, but only for a limited range of image types. Future commits will add sample PDFs and notes about what works and what fails.

```
> python pdf-image-extractor.py ..\PDF_Samples\GeoBase_NHNC1_Data_Model_UML_EN.pdf
Traceback (most recent call last):
  File "pdf-image-extractor.py", line 33, in <module>
    img = Image.frombytes(mode, size, data)
  File "C:\Python27\ArcGIS10.3\lib\site-packages\PIL\Image.py", line 2047, in frombytes
    im.frombytes(data, decoder_name, args)
  File "C:\Python27\ArcGIS10.3\lib\site-packages\PIL\Image.py", line 731, in frombytes
    raise ValueError("not enough image data")
ValueError: not enough image data
```

Source: http://ftp2.cits.rncan.gc.ca/pub/geobase/official/nhn_rhn/doc/

"""
All distributed data are subject to the Open Government Licence – Canada. Canada grants to the licensee a non-exclusive, fully paid, royalty-free right and licence to exercise all intellectual property rights in the data. This includes the right to use, incorporate, sublicense (with further right of sublicensing), modify, improve, further develop, and distribute the Data; and to manufacture or distribute derivative products.
-- http://www.nrcan.gc.ca/earth-sciences/geography/topographic-information/free-data-geogratis/licence/17285
"""

Adding unit tests:

* xmp
* ConvertFunctionsToVirtualList
* PyPDF2.utils.hexStr
* Page operations with encoded file
* merging encrypted
* images

DOC: Comments to docstrings

STY: Remove vim comments

BUG: CCITTFaxDecode decodeParms can be an ArrayObject. I don't know what a good solution would look like. Now it doesn't throw an error, but the result might be wrong.

BUG: struct was not imported for Python 2.X

Security (SEC):
- ContentStream_readInlineImage had potential infinite loop (py-pdf#740)

Bug fixes (BUG):
- Fix merging encrypted files (py-pdf#757)
- CCITTFaxDecode decodeParms can be an ArrayObject (py-pdf#756)

Robustness improvements (ROBUST):
- title sometimes None (py-pdf#744)

Documentation (DOC):
- Adjust short description of the package

Tests and Test setup (TST):
- Rewrite JS tests from unittest to pytest (py-pdf#746)
- Increase Test coverage, mainly with filters (py-pdf#756)
- Add test for inline images (py-pdf#758)

Developer Experience Improvements (DEV):
- Remove unused Travis-CI configuration (py-pdf#747)
- Show code coverage (py-pdf#754, py-pdf#755)
- Add mutmut (py-pdf#760)

Miscellaneous:
- STY: Closing file handles, explicit exports, ... (py-pdf#743)

All changes: py-pdf/pypdf@1.27.4...1.27.5

ISSUE: The problem appears because the _flatten() method sets self.flattenedPages before it tries to get the pages, and does not set it back to None if an error occurs. This PR makes _flatten() set self.flattenedPages to an empty array only after it has successfully got the pages.

FIX: Set `self.flattenedPages` after calling `catalog["/Pages"].getObject()`.

Closes py-pdf#327
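The ordering issue described here can be sketched with a toy class. `Reader`, `flattened_pages`, and `num_pages` are hypothetical stand-ins and not PyPDF2's actual code; a plain dict plays the role of the catalog.

```python
class Reader:
    """Toy stand-in for a PDF reader (hypothetical, for illustration only)."""

    def __init__(self, catalog):
        self.catalog = catalog
        self.flattened_pages = None  # None means "not flattened yet"

    def _flatten(self):
        # Perform the lookup that can fail first...
        pages = self.catalog["/Pages"]
        # ...and only then mark the cache as initialised, so a failure
        # above leaves flattened_pages as None and a retry is possible.
        self.flattened_pages = []
        self.flattened_pages.extend(pages)

    def num_pages(self):
        if self.flattened_pages is None:
            self._flatten()
        return len(self.flattened_pages)
```

If the cache were set to an empty list before the lookup, a failed first call would leave the reader reporting zero pages on every later call instead of raising again.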

The header being read has the format:

<idnum> <generation> obj

where `<idnum>` and `<generation>` are integers. Previously an arbitrary number of spaces was being allowed between `<idnum>` and `<generation>`, but not between `<generation>` and `obj`. We now allow arbitrary spaces between `<generation>` and `obj`.
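As a rough illustration of the relaxed header format, the sketch below parses it with a regular expression. This is not how PyPDF2 reads the header; `HEADER_RE` and `parse_object_header` are hypothetical names chosen for this example.

```python
import re

# One or more whitespace characters are accepted in both gaps:
# between <idnum> and <generation>, and between <generation> and "obj".
HEADER_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj")


def parse_object_header(data):
    """Return (idnum, generation) from the start of `data` (bytes)."""
    match = HEADER_RE.match(data)
    if match is None:
        raise ValueError("not an object header: %r" % data[:20])
    return int(match.group(1)), int(match.group(2))
```

For example, `parse_object_header(b"12 0   obj")` yields the same pair as `parse_object_header(b"12 0 obj")`.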

This allows us to leverage the IDE.

* Documentation: We can now document what the constants are good for and give background information around them
* Homographs: We can distinguish literals which have the same name, but different contexts
* Typos: We can hopefully avoid typos like decodeParams -> decodeParms.

For users of PyPDF2, this doesn't change anything. We still use string literals. For documentation we should also keep doing that.
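A minimal sketch of the idea, assuming a class-per-context layout: the class name `StreamAttributes` and its members are hypothetical here and may not match the constants actually introduced. The values are still plain strings, so user code that passes literals keeps working.

```python
class StreamAttributes:
    """Keys found in a stream dictionary (hypothetical grouping)."""

    FILTER = "/Filter"
    # Spelled "Parms", not "Params"; a typo in the attribute name
    # fails loudly with AttributeError instead of silently missing a key.
    DECODE_PARMS = "/DecodeParms"


# A plain dict stands in for a stream dictionary.
stream = {"/Filter": "/FlateDecode", "/DecodeParms": {"/Predictor": 12}}

predictor = stream[StreamAttributes.DECODE_PARMS]["/Predictor"]
```

Because each constant is an ordinary `str`, `stream["/DecodeParms"]` and `stream[StreamAttributes.DECODE_PARMS]` are interchangeable.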

Henri Salo and others added 30 commits August 18, 2015 13:42

Prevent infinite loop in readObject() function. Patch by dhudson1. Cl…

4819397

…oses py-pdf#184

/DCTDecode stream data

098394a

JPEG sample

7b591a2

JPEG 2000 filter added

39de327

Merge pull request #1 from maphew/master

c83cbd8

Image extractor script with sample failing pdf

PDF extraction error handling

7bc62cd

Testing

19a8872

Update README.md

efae6bc

Travis CI picture.

Add CCITTFax Decode and JPEG test

1273824

Correct test for python3

b0ace62

Merge pull request py-pdf#261 from speedplane/feature/readInlineImage…

78fd8c6

…EIQFeature-1 Fix a bug in _readInlineImage

Version 1.26.0 update

5735cb7

Read Indirect Objects with a sign, fixes py-pdf#248

b030b7f

Appropriate error message for closed file, warn when returning null o…

26e5077

…bject, resolves py-pdf#263

Python 3 type fixes in LZWDecode

5bbd5af

Added URI linking

ce5f7ec

Uses same structure as addLink addURI

Write binary data comment

036789a

Add support for PNG filters average and paeth

60dff8d

Fix filter type 3 and 4 byte range

60abb83

Merge pull request py-pdf#283 from manuelzs/support_for_png_average_a…

6f284de

…nd_paeth_filter Add support for PNG filters average and paeth

Merge branch 'URI-linking' of https://github.com/JohnMulligan/PyPDF2 …

0208955

…into JohnMulligan-URI-linking

Fixed TabError in Py3

ad90b69

Merge branch 'JohnMulligan-URI-linking'

fe934cc

Merge pull request py-pdf#223 from fgeek/fix-dos-issue

4fc7f9d

Prevent infinite loop in readObject() function

speed up escape sequences

d7f5eaf

Changes readStringFromStream to use a dict of escapes rather than a long if/else chain. This should be faster and looks cleaner.
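The dict-based approach can be sketched as follows. This is a simplified illustration rather than the actual readStringFromStream code: `ESCAPES` and `unescape` are hypothetical names, and octal escapes and backslash line continuations are deliberately left out.

```python
# Single-character escape sequences of PDF literal strings.
ESCAPES = {
    b"n": b"\n",
    b"r": b"\r",
    b"t": b"\t",
    b"b": b"\b",
    b"f": b"\f",
    b"(": b"(",
    b")": b")",
    b"\\": b"\\",
}


def unescape(data):
    """Resolve single-character escapes in `data` (bytes)."""
    out = []
    i = 0
    while i < len(data):
        ch = data[i:i + 1]
        if ch == b"\\" and i + 1 < len(data):
            nxt = data[i + 1:i + 2]
            # One dict lookup replaces the if/else chain; for an unknown
            # escape the backslash is dropped and the character kept.
            out.append(ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return b"".join(out)
```

Adding a new escape then means adding one dict entry instead of another branch.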

README.md: fix sample code directory name

e9d0b86

Fix PdfFileMerger for file objects on Python 3.

8ba44f2

The previous check always evaluated to False on Python 3, so I replaced it with a duck-typing check that is compatible with both Python versions.
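A duck-typing check of this kind might look like the sketch below. The helper name `is_file_like` and the choice of `read` and `seek` as the probed attributes are assumptions made for illustration, not a quote of the merged code.

```python
import io


def is_file_like(obj):
    """Treat anything with read() and seek() as a file object.

    This looks at behaviour rather than a concrete type, so it gives
    the same answer on Python 2 and Python 3.
    """
    return hasattr(obj, "read") and hasattr(obj, "seek")


# An in-memory stream qualifies; a path string does not.
stream_ok = is_file_like(io.BytesIO(b"%PDF-1.4"))
path_ok = is_file_like("document.pdf")
```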

Correct name error

77629e6

MartinThoma and others added 30 commits April 12, 2022 22:35

STY: Fix various style issues (py-pdf#742)

1f00794

REL: 1.27.4

e45e66b

STY: Fix style issues (py-pdf#743)

38d5ec4

* Explicitly export PdfFileReader, PdfFileWriter * Implicit string concatenation * Don't leave open file handles * Apply hints from flake8-simplify * Only import stuff that is used

DEV: Remove unused Travis-CI configuration (py-pdf#747)

7771fad

Signed-off-by: Matthew Peveler <matt.peveler@gmail.com>

TST: Rewrite JS tests from unittest to pytest (py-pdf#746)

0ea2301

Signed-off-by: Matthew Peveler <matt.peveler@gmail.com>

DOC: Adjust short description

0500c8d

DEV: Show code coverage (py-pdf#754)

01a1242

Combine coverage (py-pdf#755)

fe45d2e

* Replace pytest-cov by coverage * Fix coverage badge

BUG: Fix merging encrypted files (py-pdf#757)

9d53ee8

ROBUST: title sometimes None (py-pdf#744)

29194cd

Closes py-pdf#511

TST: Add test for inline images (py-pdf#758)

0890b06

Credits to Sebastian Krause for creating the PDF: py-pdf#331 (comment) Co-authored-by: Sebastian Krause <sebastian@realpath.org>

SEC/PERF: ContentStream_readInlineImage (py-pdf#740)

d71fb3e

Closes py-pdf#329 - potential infinite loop (SEC) Closes py-pdf#330 - performance issue of ContentStream._readInlineImage (PERF)

TST: Check for metadata

eda50ac

DEV: Add mutmut (py-pdf#760)

8aa440c

TST: Regression test for py-pdf#327

d58a849

Credits to Denis Osipov: py-pdf#359 (comment) Co-authored-by: Denis Osipov <osipov_d@list.ru>

STY: Make variable naming more consistent in tests

a5875c5

DOC: Working with annotations (py-pdf#764)

87aafd6

See py-pdf#107

BUG: Clip by trimBox when merging pages, which would otherwise be ign…

c138f21

…ored (py-pdf#240) Closes py-pdf#163

BUG: Add overwriteWarnings parameter PdfFileMerger (py-pdf#243)

bf3c9c9

This helps users who run into issue py-pdf#67

DOCS: Structure history

4a3af96

DEV: Add issue templates (py-pdf#765)

5e4fdfa

BUG: Handle cases where decodeParms is an ArrayObject (py-pdf#405)

ba7ee5b

Fixes bug where decodeParms.get(...) causes AttributeError: 'ArrayObject' object has no attribute 'get' Closes py-pdf#404
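One way to tolerate both shapes is sketched below, with plain `dict` and `list` standing in for the dictionary and array objects. The helper `get_decode_parm` is hypothetical, and returning the first matching entry is an assumption that may not suit every filter.

```python
def get_decode_parm(decode_parms, key, default=None):
    """Look up `key` whether decode_parms is a dict, a list of dicts, or None."""
    if decode_parms is None:
        return default
    if isinstance(decode_parms, dict):
        return decode_parms.get(key, default)
    # Array case: calling .get() on the array itself is what raised
    # AttributeError, so search the entries instead.
    for entry in decode_parms:
        if isinstance(entry, dict) and key in entry:
            return entry[key]
    return default
```

For instance, `get_decode_parm([{"/K": -1}, {"/Columns": 1728}], "/Columns", 0)` returns the value from the second entry rather than failing.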

BUG: Fix reading more than last1K for EOF (py-pdf#642)

03ea3ec

Added an optional parameter to readNextEndLine() that limits the offset; read() then uses this parameter to limit the reading to last1K. Closes py-pdf#639 Closes py-pdf#439

DOC: Link to pdftoc in Sample_Code (py-pdf#628)

89bc093

Merge branch 'main' into feature/ASCII85-Off-By-One

a6ddce0
