Package org.apache.pdfbox.cos
Class COSDocument
java.lang.Object
org.apache.pdfbox.cos.COSBase
org.apache.pdfbox.cos.COSDocument
- All Implemented Interfaces:
Closeable, AutoCloseable, COSObjectable
This is the in-memory representation of the PDF document. You must call
close() on this object when you are done using it.
- Author:
- Ben Litchfield
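Because the class implements Closeable and AutoCloseable, the close() requirement can be met with a try-with-resources block on Java 7 or later. The following is a minimal sketch of that pattern only. ScratchBackedResource is a hypothetical stand-in so the example runs without PDFBox on the classpath; it is not part of the library.

```java
import java.io.Closeable;

final class CloseSketch {
    /** Hypothetical stand-in for a document that owns temporary storage. */
    static final class ScratchBackedResource implements Closeable {
        private boolean closed;

        boolean isClosed() {
            return closed;
        }

        @Override
        public void close() {
            // A real document would release its scratch storage here.
            closed = true;
        }
    }

    /** Uses the stand-in inside try-with-resources and reports whether close() ran. */
    static boolean useAndClose() {
        ScratchBackedResource resource = new ScratchBackedResource();
        try (ScratchBackedResource r = resource) {
            // Work with the resource; close() runs even if this block throws.
            if (r.isClosed()) {
                throw new IllegalStateException("closed too early");
            }
        }
        return resource.isClosed();
    }

    public static void main(String[] args) {
        System.out.println("closed after use: " + useAndClose());
    }
}
```

The same shape should apply to any Closeable, which is why it is shown here with a stand-in rather than with a parsed document.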
-
Constructor Summary
Constructors
COSDocument()
Constructor. Uses memory to store stream.
COSDocument(File scratchDir)
Constructor that will create a scratch file in the given directory.
COSDocument(File scratchDir, boolean forceParsingValue)
Constructor that will use a temporary file in the given directory for storage of the PDF streams.
COSDocument(RandomAccess file)
Constructor that will use the given random access file for storage of the PDF streams.
COSDocument(RandomAccess scratchFileValue, boolean forceParsingValue)
Constructor that will use the given random access file for storage of the PDF streams.
-
Method Summary
Object accept(ICOSVisitor visitor)
Visitor pattern double dispatch method.
void addXRefTable(Map<COSObjectKey, Long> xrefTableValues)
Populate XRef HashMap with given values.
void close()
This will close all storage and delete the tmp files.
COSStream createCOSStream()
Create a new COSStream using the underlying scratch file.
COSStream createCOSStream(COSDictionary dictionary)
Create a new COSStream using the underlying scratch file.
void dereferenceObjectStreams()
This method will search the list of objects for types of ObjStm.
protected void finalize()
Warn the user in the finalizer if the PDF document was not closed.
COSObject getCatalog()
This will get the document catalog.
COSArray getDocumentID()
This will get the document ID.
COSDictionary getEncryptionDictionary()
This will get the encryption dictionary if the document is encrypted or null if the document is not encrypted.
String getHeaderString()
Returns the current headerString.
COSObject getObjectByType(String type)
Deprecated. Use getObjectByType(COSName) instead.
COSObject getObjectByType(COSName type)
This will get the first dictionary object by type.
COSObject getObjectFromPool(COSObjectKey key)
This will get an object from the pool.
List<COSObject> getObjects()
This will get a list of all available objects.
List<COSObject> getObjectsByType(String type)
This will get all dictionary objects by type.
List<COSObject> getObjectsByType(COSName type)
This will get all dictionary objects by type.
String getOriginalHeaderString()
Get the original headerString from the PDF file.
RandomAccess getScratchFile()
Deprecated. Direct access to the scratch file will be removed.
List<COSDictionary> getSignatureDictionaries()
This will return a list of signature dictionaries as COSDictionary.
List<COSDictionary> getSignatureFields(boolean onlyEmptyFields)
This will return a list of signature fields.
SignatureInterface getSignatureInterface()
This will return the signature interface.
long getStartXref()
Return the startXref position of the parsed document.
COSDictionary getTrailer()
This will get the document trailer.
float getVersion()
This will get the version of this PDF document.
Map<COSObjectKey, Long> getXrefTable()
Returns the xrefTable which is a mapping of ObjectKeys to byte offsets in the file.
boolean isDecrypted()
Indicates if an encrypted PDF is already decrypted after parsing.
boolean isEncrypted()
This will tell if this is an encrypted document.
boolean isXRefStream()
Determines if the trailer is an XRef stream or not.
void print()
This will print contents to stdout.
COSObject removeObject(COSObjectKey key)
Removes an object from the object pool.
void setDecrypted()
Signals that the document is decrypted completely.
void setDocumentID(COSArray id)
This will set the document ID.
void setEncryptionDictionary(COSDictionary encDictionary)
This will set the encryption dictionary. It should only be called when encrypting the document.
void setHeaderString(String header)
Sets the headerString.
void setSignatureInterface(SignatureInterface sigInterface)
Set the signature interface to the given value.
void setStartXref(long startXrefValue)
This method sets the startxref value of the document.
void setTrailer(COSDictionary newTrailer)
This will set the document trailer.
void setVersion(float versionValue)
This will set the version of this PDF document and update the header string.
void setWarnMissingClose(boolean warn)
Controls whether this instance shall issue a warning if the PDF document wasn't closed properly through a call to the close() method.
Methods inherited from class org.apache.pdfbox.cos.COSBase
getCOSObject, getFilterManager, isDirect, isNeedToBeUpdate, setDirect, setNeedToBeUpdate
-
Constructor Details
-
COSDocument
Constructor that will use the given random access file for storage of the PDF streams. The caller is responsible for deleting the storage that this file writes to, if necessary. The close method will still close the file.
- Parameters:
scratchFileValue
- the random access file to use for storage
forceParsingValue
- flag to skip malformed or otherwise unparseable document content where possible
-
COSDocument
Constructor that will use a temporary file in the given directory for storage of the PDF streams. The temporary file is automatically removed when this document gets closed.
- Parameters:
scratchDir
- directory for the temporary file, or null to use the system default
forceParsingValue
- flag to skip malformed or otherwise unparseable document content where possible
- Throws:
IOException
- if something went wrong
-
COSDocument
public COSDocument()
Constructor. Uses memory to store stream.
-
COSDocument
Constructor that will create a scratch file in the given directory.
- Parameters:
scratchDir
- The directory to store a scratch file.
- Throws:
IOException
- If there is an error creating the tmp file.
-
COSDocument
Constructor that will use the given random access file for storage of the PDF streams. The caller is responsible for deleting the storage that this file writes to, if necessary. The close method will still close the file.
- Parameters:
file
- The random access file to use for storage.
-
-
Method Details
-
getScratchFile
Deprecated. Direct access to the scratch file will be removed.
This will get the scratch file for this document.
- Returns:
- The scratch file.
-
createCOSStream
Create a new COSStream using the underlying scratch file.
- Returns:
- the new COSStream
-
createCOSStream
Create a new COSStream using the underlying scratch file.
- Parameters:
dictionary
- the corresponding dictionary
- Returns:
- the new COSStream
-
getObjectByType
Deprecated. Use getObjectByType(COSName) instead.
This will get the first dictionary object by type.
- Parameters:
type
- The type of the object.
- Returns:
- The first object with the specified type.
- Throws:
IOException
- If there is an error getting the object
-
getObjectByType
This will get the first dictionary object by type.
- Parameters:
type
- The type of the object.
- Returns:
- The first object with the specified type.
- Throws:
IOException
- If there is an error getting the object
-
getObjectsByType
This will get all dictionary objects by type.
- Parameters:
type
- The type of the objects.
- Returns:
- A list of the objects with the specified type.
- Throws:
IOException
- If there is an error getting the object
-
getObjectsByType
This will get all dictionary objects by type.
- Parameters:
type
- The type of the objects.
- Returns:
- A list of the objects with the specified type.
- Throws:
IOException
- If there is an error getting the object
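As a rough illustration of what a lookup by type involves, the sketch below filters a list of dictionaries on their Type entry. Plain Map<String, String> values stand in for COS dictionaries, and objectsByType, dictionary, and sample are hypothetical helpers; this is not the PDFBox implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TypeLookupSketch {
    /** Returns every stand-in dictionary whose "Type" entry equals the requested type. */
    static List<Map<String, String>> objectsByType(List<Map<String, String>> objects, String type) {
        List<Map<String, String>> matches = new ArrayList<>();
        for (Map<String, String> dictionary : objects) {
            if (type.equals(dictionary.get("Type"))) {
                matches.add(dictionary);
            }
        }
        return matches;
    }

    /** Builds a stand-in dictionary with a "Type" entry and a label. */
    static Map<String, String> dictionary(String type, String label) {
        Map<String, String> d = new HashMap<>();
        d.put("Type", type);
        d.put("Label", label);
        return d;
    }

    /** Sample data: one catalog and two pages. */
    static List<Map<String, String>> sample() {
        return Arrays.asList(
                dictionary("Catalog", "root"),
                dictionary("Page", "first"),
                dictionary("Page", "second"));
    }

    public static void main(String[] args) {
        System.out.println(objectsByType(sample(), "Page").size() + " page dictionaries found");
    }
}
```

A first-match variant, in the spirit of getObjectByType, would return as soon as one dictionary matches instead of collecting all of them.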
-
print
public void print()
This will print contents to stdout.
-
setVersion
public void setVersion(float versionValue)
This will set the version of this PDF document and update the header string.
- Parameters:
versionValue
- The version of the PDF document.
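The link between the version number and the header string can be pictured as follows. This sketch assumes the conventional %PDF-x.y header form; headerFor is a hypothetical helper and does not reproduce how PDFBox updates its header internally.

```java
import java.util.Locale;

final class HeaderSketch {
    /** Formats a version such as 1.4f as a header line of the assumed form "%PDF-1.4". */
    static String headerFor(float version) {
        // Locale.ROOT keeps the decimal separator a dot regardless of the platform locale.
        return String.format(Locale.ROOT, "%%PDF-%.1f", version);
    }

    public static void main(String[] args) {
        System.out.println(headerFor(1.4f));
    }
}
```

Formatting with one decimal place also hides the rounding noise that comes with storing a version such as 1.7 in a float.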
-
getVersion
public float getVersion()
This will get the version of this PDF document.
- Returns:
- This document's version.
-
setDecrypted
public void setDecrypted()
Signals that the document is decrypted completely. Needed e.g. by NonSequentialPDFParser to circumvent additional decryption later on.
-
isDecrypted
public boolean isDecrypted()
Indicates if an encrypted PDF is already decrypted after parsing. This only makes sense if the NonSequentialPDFParser is used.
- Returns:
- true if the PDF is decrypted.
-
isEncrypted
public boolean isEncrypted()
This will tell if this is an encrypted document.
- Returns:
- true if this document is encrypted.
-
getEncryptionDictionary
This will get the encryption dictionary if the document is encrypted or null if the document is not encrypted.
- Returns:
- The encryption dictionary.
-
getSignatureInterface
This will return the signature interface.
- Returns:
- the signature interface
-
setEncryptionDictionary
This will set the encryption dictionary. It should only be called when encrypting the document.
- Parameters:
encDictionary
- The encryption dictionary.
-
getSignatureDictionaries
This will return a list of signature dictionaries as COSDictionary.
- Returns:
- list of signature dictionaries as COSDictionary
- Throws:
IOException
- if no document catalog can be found
-
getSignatureFields
This will return a list of signature fields.
- Parameters:
onlyEmptyFields
- if true, only empty signature fields are returned
- Returns:
- list of signature dictionaries as COSDictionary
- Throws:
IOException
- if no document catalog can be found
-
getDocumentID
This will get the document ID.
- Returns:
- The document id.
-
setDocumentID
This will set the document ID.
- Parameters:
id
- The document id.
-
setSignatureInterface
Set the signature interface to the given value.
- Parameters:
sigInterface
- the signature interface
-
getCatalog
This will get the document catalog. Maybe this should move to an object at PDFEdit level.
- Returns:
- the catalog, which is the root of all document activities
- Throws:
IOException
- If no catalog can be found.
-
getObjects
This will get a list of all available objects.
- Returns:
- A list of all objects.
-
getTrailer
This will get the document trailer.
- Returns:
- the document trailer dictionary
-
setTrailer
This will set the document trailer. (Source note: MIT added; maybe this should not be supported, as the trailer is a persistence construct.)
- Parameters:
newTrailer
- the document trailer dictionary
-
accept
Visitor pattern double dispatch method.
- Specified by:
accept in class COSBase
- Parameters:
visitor
- The object to notify when visiting this object.
- Returns:
- any object, depending on the visitor implementation, or null
- Throws:
COSVisitorException
- If an error occurs while visiting this object.
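Double dispatch means the call is resolved twice: first on the runtime type of the visited object, then on the visitor method that object chooses. The sketch below shows the mechanism in isolation; Visitor, Node, Document, Name, and Describer are hypothetical stand-ins and do not mirror the ICOSVisitor interface.

```java
final class VisitorSketch {
    /** Hypothetical visitor with one method per node type. */
    interface Visitor {
        Object visitDocument(Document document);

        Object visitName(Name name);
    }

    /** Hypothetical node type; each implementation picks the matching visitor method. */
    interface Node {
        Object accept(Visitor visitor);
    }

    static final class Document implements Node {
        @Override
        public Object accept(Visitor visitor) {
            // Second dispatch: the node selects the visitor method for its own type.
            return visitor.visitDocument(this);
        }
    }

    static final class Name implements Node {
        @Override
        public Object accept(Visitor visitor) {
            return visitor.visitName(this);
        }
    }

    /** A visitor that returns a short description of whatever it visits. */
    static final class Describer implements Visitor {
        @Override
        public Object visitDocument(Document document) {
            return "document";
        }

        @Override
        public Object visitName(Name name) {
            return "name";
        }
    }

    /** First dispatch: accept() is resolved on the runtime type of the node. */
    static String describe(Node node) {
        return (String) node.accept(new Describer());
    }

    public static void main(String[] args) {
        System.out.println(describe(new Document()));
        System.out.println(describe(new Name()));
    }
}
```

Returning Object from accept, as in the sketch, leaves the result type to each visitor implementation, at the cost of a cast at the call site.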
-
close
This will close all storage and delete the tmp files.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException
- If there is an error closing resources.
-
finalize
Warn the user in the finalizer if the PDF document was not closed. The method also closes the document just in case, to avoid abandoned temporary files. It is still a good idea to close the PDF document as early as possible to conserve resources.
- Overrides:
finalize in class Object
- Throws:
IOException
- if an error occurs while closing the temporary files
-
setWarnMissingClose
public void setWarnMissingClose(boolean warn)
Controls whether this instance shall issue a warning if the PDF document wasn't closed properly through a call to the close() method. If the PDF document is held in a cache governed by soft references it is impossible to reliably close the document before the warning is raised. By default, the warning is enabled.
- Parameters:
warn
- true enables the warning, false disables it.
-
getHeaderString
- Returns:
- The current headerString. It may have been updated by calls to setVersion(float).
-
setHeaderString
- Parameters:
header
- The headerString to set.
-
getOriginalHeaderString
Get the original headerString from the PDF file. Unlike getHeaderString(), the value is not changed by files that have another header value in the document catalog.
- Returns:
- the original header string.
-
dereferenceObjectStreams
This method will search the list of objects for types of ObjStm. If it finds them, it will parse out all of the objects from the stream that it contains.
- Throws:
IOException
- If there is an error parsing the stream.
-
getObjectFromPool
This will get an object from the pool.
- Parameters:
key
- The object key.
- Returns:
- The object in the pool or a new one if it has not been parsed yet.
- Throws:
IOException
- If there is an error getting the proxy object.
-
removeObject
Removes an object from the object pool.
- Parameters:
key
- the object key
- Returns:
- the object that was removed or null if the object was not found
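The two pool contracts described above, "return the pooled object or a new one" and "return the removed object or null", can be sketched with a plain map. ObjectPoolSketch and Placeholder are hypothetical names, and a long object number stands in for the object key; the real pool may be organized differently.

```java
import java.util.HashMap;
import java.util.Map;

final class ObjectPoolSketch {
    /** Hypothetical placeholder for an object that may not have been parsed yet. */
    static final class Placeholder {
        final long objectNumber;

        Placeholder(long objectNumber) {
            this.objectNumber = objectNumber;
        }
    }

    private final Map<Long, Placeholder> pool = new HashMap<>();

    /** Returns the pooled entry, creating and storing a new placeholder on first access. */
    Placeholder getFromPool(long objectNumber) {
        return pool.computeIfAbsent(objectNumber, key -> new Placeholder(key));
    }

    /** Removes the entry and returns it, or returns null if nothing was pooled under the key. */
    Placeholder remove(long objectNumber) {
        return pool.remove(objectNumber);
    }

    public static void main(String[] args) {
        ObjectPoolSketch sketch = new ObjectPoolSketch();
        Placeholder first = sketch.getFromPool(3L);
        System.out.println("same instance on second lookup: " + (first == sketch.getFromPool(3L)));
        System.out.println("removed: " + (sketch.remove(3L) != null));
        System.out.println("removed again: " + (sketch.remove(3L) != null));
    }
}
```

Handing out a placeholder on first access lets callers hold a reference to an object before its content has been read.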
-
addXRefTable
Populate XRef HashMap with given values. Each entry maps ObjectKeys to byte offsets in the file.
- Parameters:
xrefTableValues
- xref table entries to be added
-
getXrefTable
Returns the xrefTable which is a mapping of ObjectKeys to byte offsets in the file.
- Returns:
- mapping of ObjectKeys to byte offsets
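A cross-reference table of this shape is a map from an object key to a byte offset. The sketch below uses a hypothetical Key class (object number plus generation) in place of COSObjectKey and shows why such a key needs equals and hashCode to work in a hash map; addAll and offsetOf are illustrative names, not PDFBox methods.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class XrefSketch {
    /** Hypothetical key: an object number and a generation number. */
    static final class Key {
        final long number;
        final int generation;

        Key(long number, int generation) {
            this.number = number;
            this.generation = generation;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Key)) {
                return false;
            }
            Key k = (Key) other;
            return number == k.number && generation == k.generation;
        }

        @Override
        public int hashCode() {
            return Objects.hash(number, generation);
        }
    }

    private final Map<Key, Long> offsets = new HashMap<>();

    /** Merges the given entries into the table; later entries replace earlier ones. */
    void addAll(Map<Key, Long> entries) {
        offsets.putAll(entries);
    }

    /** Returns the byte offset recorded for the key, or null if there is none. */
    Long offsetOf(long number, int generation) {
        return offsets.get(new Key(number, generation));
    }

    public static void main(String[] args) {
        XrefSketch table = new XrefSketch();
        Map<Key, Long> entries = new HashMap<>();
        entries.put(new Key(1, 0), 15L);
        entries.put(new Key(2, 0), 120L);
        table.addAll(entries);
        System.out.println("offset of object 2 0: " + table.offsetOf(2, 0));
    }
}
```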
-
setStartXref
public void setStartXref(long startXrefValue)
This method sets the startxref value of the document. This is only needed for incremental updates.
- Parameters:
startXrefValue
- the value for startXref
-
getStartXref
public long getStartXref()
Return the startXref position of the parsed document. This is only needed for incremental updates.
- Returns:
- a long with the old position of the startxref
-
isXRefStream
public boolean isXRefStream()
Determines if the trailer is an XRef stream or not.
- Returns:
- true if the trailer is a XRef stream
-