Version 1.26.0, 2016-05-18
- NOTE: Active maintenance on PyPDF2 is resuming after a hiatus
- Fixed a bug where image resources were incorrectly
overwritten when merging pages
- Added dictionary for JavaScript actions to the root (louib)
- Added unit tests for the JS functionality (louib)
- Add more Python 3 compatibility when reading inline images (im2703
and VyacheslavHashov)
- Return NullObject instead of raising error when failing to resolve
object (ctate)
- Don't output warning for non-zeroed xref table when strict=False
- Remove extraneous zeroes from output formatting (speedplane)
- Fix bug where reading an inline image would cut off prematurely
in certain cases (speedplane)
Patch 1.25.1, 2015-07-20
- Fix bug when parsing inline images. Occurred when merging
certain pages with inline images
- Fixed type error when creating outlines by utilizing the
isString() test
Version 1.25, 2015-07-07
- Added Python 3 algorithm for ASCII85Decode. Fixes issue when
reading reportlab-generated files with Py 3 (jerickbixly)
- Recognize more escape sequences which would otherwise throw an
exception (manuelzs, robertsoakes)
- Fixed overflow error that occurred when reading a too-large int
in Python 2 (by Raja Jamwal)
- Allow access to files which were encrypted with an empty
password. Previously threw a "File has not been decrypted"
exception (Elena Williams)
- Do not attempt to decode an empty data stream. Previously
would cause an error in decode algorithms (vladir)
- Fixed some type issues specific to Py 2 or Py 3
- Fix issue when stream data begins with whitespace (soloma83)
- Recognize abbreviated filter names (AlmightyOatmeal and
Matthew Weiss)
- Copy decryption key from PdfFileReader to PdfFileMerger.
Allows usage of PdfFileMerger with encrypted files (twolfson)
- Fixed bug which occurred when a NameObject is present at end
of a file stream. Threw a "Stream has ended unexpectedly"
exception (speedplane)
- Initial work on a test suite; to be expanded in future.
Tests and Resources directory added, README updated (robertsoakes)
- Added document cloning methods to PdfFileWriter:
appendPagesFromReader, cloneReaderDocumentRoot, and
cloneDocumentFromReader. See official documentation (robertsoakes)
- Added method for writing to form fields: updatePageFormFieldValues.
This will be enhanced in the future. See official documentation
- New addAttachment method. See documentation. Support for adding
and extracting embedded files to be enhanced in the future
- Added methods to get page number of given PageObject or
Destination: getPageNumber and getDestinationPageNumber.
See documentation (mozbugbox)
- Enhanced type handling (Brent Amrhein)
- Enhanced exception handling in NameObject (sbywater)
- Enhanced extractText method output (peircej)
- Better exception handling
- Enhanced regex usage in NameObject class (speedplane)
Version 1.24, 2014-12-31
- Bugfixes for reading files in Python 3 (by Anthony Tuininga and
- Appropriate errors are now raised instead of infinite loops (by
naure and Cyrus Vafadari)
- Bugfix for parsing number tokens with leading spaces (by Maxim
- Don't crash on bad /Outlines reference (by eshellman)
- Conform tabs/spaces and blank lines to PEP 8 standards
- Utilize the readUntilRegex method when reading Number Objects
(by Brendan Jurd)
- More bugfixes for Python 3 and clearer exception handling
- Fixed encoding issue in merger (with eshellman)
- Created separate folder for scripts
Version 1.23, 2014-08-11
- Documentation now available at
- Bugfix for when __init__.__doc__ has no value (by
Vladir Cruz)
- Fix typos in OutlinesObject().add() (by shilluc)
- Re-added a missing return statement in a method
- Corrected viewing mode names (by Jason Scheirer)
- New PdfFileWriter method: addJS() (by vfigueiro)
- New bookmark features: color, boldness, italics, and page fit
(by Joshua Arnott)
- New PdfFileReader method: getFields(). Used to extract field
information from PDFs with interactive forms. See documentation
for details
- Converted README file to markdown format (by Stephen Bussard)
- Several improvements to overall performance and efficiency
(by mozbugbox)
- Fixed a bug where geospatial information was not scaling along with
its page
- Fixed a type issue and a Python 3 issue in the decryption algorithms
(with Francisco Vieira and koba-ninkigumi)
- Fixed a bug causing an infinite loop in the ASCII 85 decoding
algorithm (by madmaardigan)
- Annotations (links, comment windows, etc.) are now preserved when
pages are merged together
- Used the Destination class in addLink() and addBookmark() so that
the page fit option could be properly customized
Version 1.22, 2014-05-29
- Added .DS_Store to .gitignore (for Mac users) (by Steve Witham)
- Removed __init__() implementation in NameObject (by Steve Witham)
- Fixed bug (inf. loop) when merging pages in Python 3 (by commx)
- Corrected error when calculating height in scaleTo()
- Removed unnecessary code from DictionaryObject (by Georges Dubus)
- Fixed bug where an exception was thrown upon reading a NULL string
(by speedplane)
- Allow string literals (non-unicode strings in Python 2) to be passed
to PdfFileReader
- Allow ConvertFunctionsToVirtualList to be indexed with slices and
longs (in Python 2) (by Matt Gilson)
- Major improvements and bugfixes to addLink() method (see documentation
in source code) (by Henry Keiter)
- General code clean-up and improvements (with Steve Witham and Henry Keiter)
- Fixed bug that caused crash when comments are present at end of
Version 1.21, 2014-04-21
- Fix for when /Type isn't present in the Pages dictionary (by Rob1080)
- More tolerance for extra whitespace in Indirect Objects
- Improved Exception handling
- Fixed error in getHeight() method (by Simon Kaempflein)
- Implement use of utils.string_type to resolve Py2-3 compatibility issues
- Prevent exception for multiple definitions in a dictionary (with carlosfunk)
(only when strict = False)
- Fixed errors when parsing a slice using pdfcat on command line (by
Steve Witham)
- Tolerance for EOF markers within 1024 bytes of the actual end of the
file (with David Wolever)
- Added overwriteWarnings parameter to PdfFileReader constructor. If False,
PyPDF2 will NOT overwrite methods from Python's module with
a custom implementation.
- Fix NumberObject and NameObject constructors for compatibility with PyPy
(Rüdiger Jungbeck, Xavier Dupré, shezadkhan137, Steven Witham)
- Utilize utils.Str in and to resolve type issues (by
- Improvements in implementing StringIO for Python 2 and BytesIO for
Python 3 (by Xavier Dupré)
- Added \x00 to Whitespaces, defined utils.WHITESPACES to clarify code (by
Maxim Kamenkov)
- Bugfix for merging 3 or more resources with the same name (by lucky-user)
- Improvements to Xref parsing algorithm (by speedplane)
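The EOF-marker tolerance listed for this version can be illustrated with a minimal sketch. This is not the PyPDF2 implementation; the function name find_eof_marker and the window parameter are hypothetical, chosen only to show the idea of searching the last 1024 bytes of a file for the marker instead of requiring it on the final line.

```python
def find_eof_marker(data: bytes, window: int = 1024) -> int:
    """Return the offset of the last %%EOF marker that starts within the
    final `window` bytes of `data`, or -1 if there is none.

    Hypothetical helper for illustration; not part of PyPDF2.
    """
    # Only look at the tail of the file: trailing junk after %%EOF is
    # tolerated as long as the marker is still inside the window.
    start = max(0, len(data) - window)
    return data.rfind(b"%%EOF", start)
```

A caller would treat a result of -1 as "no usable EOF marker" and either raise an error or fall back to a more lenient recovery strategy.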
Version 1.20, 2014-01-27
- Official Python 3+ support (with contributions from TWAC and cgammans)
Support for Python versions 2.6 and 2.7 will be maintained
- Command line concatenation (see pdfcat in sample code) (by Steve Witham)
- New FAQ; link included in README
- Allow more (although unnecessary) escape sequences
- Prevent exception when reading a null object in decoding parameters
- Corrected error in reading destination types (added a slash since they
are name objects)
- Corrected TypeError in scaleTo() method
- addBookmark() method in PdfFileMerger now returns bookmark (so nested
bookmarks can be created)
- Additions to Sample Code and Sample PDFs
- Changes to allow 2up script to work (see sample code) (by Dylan McNamee)
- Changes to metadata encoding (by Chris Hiestand)
- New methods for links: addLink() (by Enrico Lambertini) and removeLinks()
- Bugfix to handle nested bookmarks correctly (by Jamie Lentin)
- New methods removeImages() and removeText() available for PdfFileWriter
(by Tien Haï)
- Exception handling for illegal characters in Name Objects
Version 1.19, 2013-10-08
- Removed pop in sweepIndirectReferences to prevent infinite loop
(provided by ian-su-sirca)
- Fixed bug caused by whitespace when parsing PDFs generated by AutoCad
- Fixed a bug caused by reading a 'null' ASCII value in a dictionary
object (primarily in PDFs generated by AutoCad).
- Added new folders for PyPDF2 sample code and example PDFs; see README
for each folder
- Added a method for debugging purposes to show current location while
- Ability to create custom metadata (by jamma313)
- Ability to access and customize document layout and view mode
(by Joshua Arnott)
- Added and corrected some documentation
- Added some more warnings and exception messages
- Removed old test/debugging code
- More bugfixes (We have received many problematic PDFs via email, we
will work with them)
- Documentation - It's time for PyPDF2 to get its own documentation
since it has grown much since the original pyPdf
- A FAQ to answer common questions
Version 1.18, 2013-08-19
- Fixed a bug where older versions of objects were incorrectly added to the
cache, resulting in outdated or missing pages, images, and other objects
(from speedplane)
- Fixed a bug in parsing the xref table where new xref values were
overwritten; also cleaned up code (from speedplane)
- New method mergeRotatedAroundPointPage which merges a page while rotating
it around a point (from speedplane)
- Updated Destination syntax to respect PDF 1.6 specifications (from
- Prevented infinite loop when a PdfFileReader object was instantiated
with an empty file (from Jerome Nexedi)
Other Changes:
- Downloads now available via PyPI
- Installation through pip library is fixed
Version 1.17, 2013-07-25
- Removed one of the two Destination classes. Both
classes had the same name, but were slightly different in content,
causing some errors. (from Janne Vanhala)
- Corrected and Expanded README file to demonstrate PdfFileMerger
- Added filter for LZW encoded streams (from Michal Horejsek)
- PyPDF2 issue tracker enabled on Github to allow community
discussion and collaboration
Versions -1.16, -2013-06-30
- Note: This ChangeLog has not been kept up-to-date for a while.
Hopefully we can keep better track of it from now on. Some of the
changes listed here come from previous versions 1.14 and 1.15; they
were only vaguely defined. With the new file we should
have more structured and better documented versioning from now on.
- Defined PyPDF2.__version__
- Fixed encrypt() method (from Martijn The)
- Improved error handling on PDFs with truncated streams (from cecilkorik)
- Python 3 support (from kushal-kumaran)
- Fixed example code in README (from Jeremy Bethmont)
- Fixed a bug caused by DecimalError Exception (from Adam Morris)
- Many other bug fixes and features by:
Anton Vlasenko
Joseph Walton
Jan Oliver Oelerich
Fabian Henze
And any others I missed.
Thanks for contributing!
Version 1.13, 2010-12-04
- Fixed a typo in code for reading a "\b" escape character in strings.
- Improved __repr__ in FloatObject.
- Fixed a bug in reading octal escape sequences in strings.
- Added getWidth and getHeight methods to the RectangleObject class.
- Fixed compatibility warnings with Python 2.4 and 2.5.
- Added addBlankPage and insertBlankPage methods on PdfFileWriter class.
- Fixed a bug with circular references in page's object trees (typically
annotations) that prevented correctly writing out a copy of those pages.
- New merge page functions allow application of a transformation matrix.
- To all patch contributors: I did a poor job of keeping this ChangeLog
up-to-date for this release, so I am missing attributions here for any
changes you submitted. Sorry! I'll do better in the future.
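As an illustration of the octal escape sequences mentioned in this release, here is a minimal sketch of decoding them in a PDF literal string. It is an assumption-laden example, not the library's parser: the name decode_octal_escapes is hypothetical, and the sketch deliberately ignores every other escape (including an escaped backslash followed by digits).

```python
import re

# A backslash followed by one to three octal digits, as in PDF literal strings.
_OCTAL_ESCAPE = re.compile(rb"\\([0-7]{1,3})")


def decode_octal_escapes(raw: bytes) -> bytes:
    """Replace \\d, \\dd and \\ddd octal escapes with the byte they encode.

    Hypothetical helper for illustration; it handles octal escapes only.
    """
    # Values above 0o377 do not fit in a byte; keep the low 8 bits.
    return _OCTAL_ESCAPE.sub(
        lambda m: bytes([int(m.group(1), 8) & 0xFF]), raw
    )
```

At most three digits are consumed per escape, so a fourth digit after the escape is kept as ordinary text.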
Version 1.12, 2008-09-02
- Added support for XMP metadata.
- Fix reading files with xref streams with multiple /Index values.
- Fix extracting content streams that use graphics operators longer than 2
characters. Affects merging PDF files.
Version 1.11, 2008-05-09
- Patch from Hartmut Goebel to permit RectangleObjects to accept NumberObject
or FloatObject values.
- PDF compatibility fixes.
- Fix to read object xref stream in correct order.
- Fix for comments inside content streams.
Version 1.10, 2007-10-04
- Text strings from PDF files are returned as Unicode string objects when
pyPdf determines that they can be decoded (as UTF-16 strings, or as
PDFDocEncoding strings). Unicode objects are also written out when
necessary. This means that string objects in pyPdf can be either
generic.ByteStringObject instances, or generic.TextStringObject instances.
- The extractText method now returns a unicode string object.
- All document information properties now return unicode string objects. In
the event that a document provides docinfo properties that are not decoded by
pyPdf, the raw byte strings can be accessed with an "_raw" property (ie.
title_raw rather than title)
- generic.DictionaryObject instances have been enhanced to be easier to use.
Values coming out of dictionary objects will automatically be de-referenced
(.getObject will be called on them), unless accessed by the new "raw_get"
method. DictionaryObjects can now only contain PdfObject instances (as keys
and values), making it easier to debug where non-PdfObject values (which
cannot be written out) are entering dictionaries.
- Support for reading named destinations and outlines in PDF files. Original
patch by Ashish Kulkarni.
- Stream compatibility reading enhancements for malformed PDF files.
- Cross reference table reading enhancements for malformed PDF files.
- Encryption documentation.
- Replace some "assert" statements with error raising.
- Minor optimizations to FlateDecode algorithm increase speed when using PNG
Version 1.9, 2006-12-15
- Fix several serious bugs introduced in version 1.8, caused by a failure to
run through our PDF test suite before releasing that version.
- Fix bug in NullObject reading and writing.
Version 1.8, 2006-12-14
- Add support for decryption with the standard PDF security handler. This
allows for decrypting PDF files given the proper user or owner password.
- Add support for encryption with the standard PDF security handler.
- Add new pythondoc documentation.
- Fix bug in ASCII85 decode that occurs when whitespace exists inside the
two terminating characters of the stream.
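The ASCII85 whitespace problem fixed in this version can be sketched with the standard library. This is not the decoder used by pyPdf; decode_ascii85 and the WHITESPACE constant are hypothetical names, and the sketch simply strips whitespace everywhere (including between the "~" and ">" terminator characters) before handing the data to base64.a85decode.

```python
import base64

# Bytes treated as whitespace for this sketch (an assumption, not a spec quote).
WHITESPACE = b" \t\n\r\x0c\x00"


def decode_ascii85(data: bytes) -> bytes:
    """Decode Adobe-style ASCII85 data, tolerating stray whitespace.

    Hypothetical helper for illustration; not part of pyPdf or PyPDF2.
    """
    # Dropping whitespace first rejoins a terminator split as "~ >" or "~\n>".
    cleaned = bytes(b for b in data if b not in WHITESPACE)
    # adobe=True expects the "~>" terminator; the "<~" prefix is optional.
    return base64.a85decode(cleaned, adobe=True)
```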
Version 1.7, 2006-12-10
- Fix a bug when using a single page object in two PdfFileWriter objects.
- Adjust PyPDF to be tolerant of whitespace characters that don't belong
during a stream object.
- Add documentInfo property to PdfFileReader.
- Add numPages property to PdfFileReader.
- Add pages property to PdfFileReader.
- Add extractText function to PdfFileReader.
Version 1.6, 2006-06-06
- Add basic support for comments in PDF files. This allows us to read some
ReportLab PDFs that could not be read before.
- Add "auto-repair" for finding xref table at slightly bad locations.
- New StreamObject backend, cleaner and more powerful. Allows the use of
stream filters more easily, including compressed streams.
- Add a graphics state push/pop around page merges. Improves quality of
page merges when one page's content stream leaves the graphics
in an abnormal state.
- Add PageObject.compressContentStreams function, which filters all content
streams and compresses them. This will reduce the size of PDF pages,
especially after they could have been decompressed in a mergePage
- Support inline images in PDF content streams.
- Add support for using .NET framework compression when zlib is not
available. This does not make pyPdf compatible with IronPython, but it
is a first step.
- Add support for reading the document information dictionary, and extracting
title, author, subject, producer and creator tags.
- Add patch to support NullObject and multiple xref streams, from Bradley
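The graphics state push/pop around page merges can be sketched as follows. This is an illustration only, not the library's merge code: the function names are hypothetical, and content streams are treated as plain byte strings. The PDF operators q and Q save and restore the graphics state, so wrapping the first page's content keeps any state it leaves behind from affecting the content that follows.

```python
def wrap_in_graphics_state(content: bytes) -> bytes:
    """Surround a content stream with q (save) and Q (restore) operators."""
    return b"q\n" + content + b"\nQ\n"


def merge_content_streams(base: bytes, overlay: bytes) -> bytes:
    """Concatenate two content streams, isolating the first one's state.

    Hypothetical helper for illustration; real merging also has to
    combine resources, which is out of scope here.
    """
    return wrap_in_graphics_state(base) + overlay
```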
Version 1.5, 2006-01-28
- Fix a bug where merging pages did not work in "no-rename" cases when the
second page has an array of content streams.
- Remove some debugging output that should not have been present.
Version 1.4, 2006-01-27
- Add capability to merge pages from multiple PDF files into a single page
using the PageObject.mergePage function. See example code (README or web
site) for more information.
- Add ability to modify a page's MediaBox, CropBox, BleedBox, TrimBox, and
ArtBox properties through PageObject. See example code (README or web site)
for more information.
- Refactor into multiple files: (contains objects like
NameObject, DictionaryObject), (contains filter code), (various). This does not affect importing PdfFileReader
or PdfFileWriter.
- Add new decoding functions for standard PDF filters ASCIIHexDecode and
- Change url and download_url to refer to new web site.
Version 1.3, 2006-01-23
- Fix new bug introduced in 1.2 where PDF files with \r line endings did not
work properly anymore. A new test suite developed with various PDF files
should prevent regression bugs from now on.
- Fix a bug where inheriting attributes from page nodes did not work.
Version 1.2, 2006-01-23
- Improved support for files with CRLF-based line endings, fixing a common
reported problem stating "assertion error: assert line == "%%EOF"".
- Software author/maintainer is now officially a proud married person, which
is sure to result in better software... somehow.
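The line-ending fixes in versions 1.2 and 1.3 concern files that terminate lines with LF, CR, or CRLF. A minimal sketch of reading the final marker line in a way that tolerates all three is shown below. The helper name last_nonblank_line is hypothetical and this is not the code pyPdf uses; it relies on bytes.splitlines(), which splits on all three conventions.

```python
def last_nonblank_line(data: bytes) -> bytes:
    """Return the last non-blank line of `data`, without its line ending.

    Hypothetical helper for illustration; works for \\n, \\r and \\r\\n.
    """
    # splitlines() treats \n, \r and \r\n as line boundaries, so a strict
    # comparison against b"%%EOF" no longer depends on the line-ending style.
    for line in reversed(data.splitlines()):
        stripped = line.strip()
        if stripped:
            return stripped
    return b""
```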
Version 1.1, 2006-01-18
- Add capability to rotate pages.
- Improved PDF reading support to properly manage inherited attributes from
/Type=/Pages nodes. This means that page groups that are rotated or have
different media boxes or whatever will now work properly.
- Added PDF 1.5 support. Namely cross-reference streams and object streams.
This release can mangle Adobe's PDFReference16.pdf successfully.
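Inherited attributes from /Type=/Pages nodes can be illustrated with a small sketch. It models page-tree nodes as plain Python dicts linked through a "/Parent" key, which is an assumption made for the example and not how pyPdf represents objects; the name resolve_inherited is likewise hypothetical.

```python
# Page attributes that the PDF format allows a page to inherit from ancestors.
INHERITABLE = ("/Resources", "/MediaBox", "/CropBox", "/Rotate")


def resolve_inherited(node: dict, key: str, default=None):
    """Look up `key` on a page, walking up /Parent links if it is inheritable.

    Hypothetical helper for illustration; nodes are plain dicts here.
    """
    if key not in INHERITABLE:
        # Non-inheritable attributes are only read from the page itself.
        return node.get(key, default)
    while node is not None:
        if key in node:
            return node[key]
        node = node.get("/Parent")
    return default
```

With this lookup, a page that omits /Rotate picks up the value set on its enclosing /Pages node, while a value set directly on the page takes precedence.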
Version 1.0, 2006-01-17
- First distutils-capable true public release. Supports a wide variety of PDF
files that I found sitting around on my system.
- Does not support some PDF 1.5 features, such as object streams,
cross-reference streams.
Copyright (c) 2006-2008, Mathieu Fenniak
Some contributions copyright (c) 2007, Ashish Kulkarni
Some contributions copyright (c) 2014, Steve Witham
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* The name of the author may not be used to endorse or promote products