Package org.apache.pdfbox.util
Class PDFStreamEngine
java.lang.Object
org.apache.pdfbox.util.PDFStreamEngine
- Direct Known Subclasses:
PageDrawer, PDFImageWriter, PDFMarkedContentExtractor, PDFTextStripper, Type3StreamParser
This class runs through a PDF content stream, executes the operators for which it has a registered processor, and provides a callback interface for clients that want to act on the stream's contents.
See the PDFTextStripper class for an example of how to use this class.
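In a PDF content stream, operands come before the operator that consumes them. The following JDK-only sketch models that run-and-dispatch loop. MiniStreamEngine and its callback registry are hypothetical names, not PDFBox classes, and the tokenizer handles only a small subset of the syntax (no nested or escaped strings, arrays, dictionaries, or inline images).

```java
import java.util.*;
import java.util.function.BiConsumer;

// Hypothetical JDK-only model of a content-stream engine; NOT the PDFBox API.
class MiniStreamEngine {
    // Operator name -> callback that receives the operands collected before it.
    private final Map<String, BiConsumer<String, List<String>>> processors = new HashMap<>();

    void register(String operator, BiConsumer<String, List<String>> callback) {
        processors.put(operator, callback);
    }

    // Tokenizes simple literal strings, names (/F1), numbers and operator keywords.
    void process(String content) {
        List<String> operands = new ArrayList<>();
        int i = 0;
        while (i < content.length()) {
            char c = content.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            int start = i;
            if (c == '(') {
                // Literal string: this sketch ignores nesting and escapes.
                int close = content.indexOf(')', i);
                i = (close < 0) ? content.length() : close + 1;
            } else {
                while (i < content.length()
                        && !Character.isWhitespace(content.charAt(i))
                        && content.charAt(i) != '(') {
                    i++;
                }
            }
            String token = content.substring(start, i);
            if (isOperand(token)) {
                operands.add(token);
            } else {
                BiConsumer<String, List<String>> processor = processors.get(token);
                if (processor != null) {
                    processor.accept(token, new ArrayList<>(operands));
                }
                // Operands belong to the operator that follows them, handled or not.
                operands.clear();
            }
        }
    }

    private static boolean isOperand(String token) {
        char c = token.charAt(0);
        return c == '(' || c == '/' || c == '-' || c == '+' || c == '.' || Character.isDigit(c);
    }
}
```

Registering callbacks for Tf and Tj and processing "BT /F1 12 Tf (Hello) Tj ET" delivers [/F1, 12] to the Tf callback and [(Hello)] to the Tj callback; BT and ET have no callback in this example and are simply dropped.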
- Version:
- $Revision: 1.38 $
- Author:
- Ben Litchfield
-
Constructor Summary
- PDFStreamEngine(): Constructor.
- PDFStreamEngine(Properties properties): Constructor with engine properties.
-
Method Summary
- Map<String, PDColorSpace> getColorSpaces()
- PDPage getCurrentPage(): Get the current page that is being processed.
- Map<String, PDFont> getFonts()
- Stack<PDGraphicsState> getGraphicsStack()
- PDGraphicsState getGraphicsState()
- Map<String, PDExtendedGraphicsState> getGraphicsStates()
- PDResources getResources()
- Matrix getTextLineMatrix()
- Matrix getTextMatrix()
- int getTotalCharCnt(): Get the total number of characters in the doc (including ones that could not be mapped).
- int getValidCharCnt(): Get the total number of valid characters in the doc that could be decoded in processEncodedText().
- Map<String, PDXObject> getXObjects()
- protected String inspectFontEncoding(String str): A method provided as an event interface to allow a subclass to perform some specific functionality on the string encoded by a glyph.
- boolean isForceParsing(): Indicates if force parsing is activated.
- void processEncodedText(byte[] string): Process encoded text from the PDF stream.
- void processOperator(String operation, List<COSBase> arguments): This is used to handle an operation.
- protected void processOperator(PDFOperator operator, List<COSBase> arguments): This is used to handle an operation.
- void processStream(PDPage aPage, PDResources resources, COSStream cosStream): This will process the contents of the stream.
- void processSubStream(PDPage aPage, PDResources resources, COSStream cosStream): Process a sub stream of the current stream.
- protected void processTextPosition(TextPosition text): A method provided as an event interface to allow a subclass to perform some specific functionality when text needs to be processed.
- void registerOperatorProcessor(String operator, OperatorProcessor op): Register a custom operator processor with the engine.
- void resetEngine(): This method must be called between processing documents.
- void setColorSpaces(Map<String, PDColorSpace> value)
- void setFonts(Map<String, PDFont> value)
- void setForceParsing(boolean forceParsingValue): Enable/Disable force parsing.
- void setGraphicsStack(Stack<PDGraphicsState> value)
- void setGraphicsState(PDGraphicsState value)
- void setGraphicsStates(Map<String, PDExtendedGraphicsState> value)
- void setTextLineMatrix(Matrix value)
- void setTextMatrix(Matrix value)
-
Constructor Details
-
PDFStreamEngine
public PDFStreamEngine()
Constructor.
-
PDFStreamEngine
public PDFStreamEngine(Properties properties) throws IOException
Constructor with engine properties. The property keys are PDF operators, and the values are the names of the classes used to execute those operators. An empty value means that the operator is silently ignored.
- Parameters:
properties
- The engine properties.
- Throws:
IOException
- If there is an error setting the engine properties.
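A minimal sketch, using only java.util.Properties, of how a table in the format described above could be split into operators with a handler class and operators to ignore. OperatorTable and the class name com.example.ShowText are hypothetical; resolving and instantiating the named classes is deliberately left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.*;

// Hypothetical sketch of interpreting an operator -> class-name property table.
class OperatorTable {
    final Map<String, String> handlerClasses = new TreeMap<>();
    final Set<String> ignored = new TreeSet<>();

    OperatorTable(Properties props) {
        for (String operator : props.stringPropertyNames()) {
            String className = props.getProperty(operator).trim();
            if (className.isEmpty()) {
                ignored.add(operator);                  // empty value: silently ignore
            } else {
                handlerClasses.put(operator, className);
            }
        }
    }

    static OperatorTable parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);          // not expected for a StringReader
        }
        return new OperatorTable(props);
    }
}
```

Parsing "Tj=com.example.ShowText", "BT=" and "ET=" puts Tj in handlerClasses and both BT and ET in ignored.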
-
Method Details
-
isForceParsing
public boolean isForceParsing()
Indicates if force parsing is activated.
- Returns:
- true if force parsing is active
-
setForceParsing
public void setForceParsing(boolean forceParsingValue)
Enable/Disable force parsing.
- Parameters:
forceParsingValue
- true activates force parsing
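This page does not say what force parsing changes. The sketch below assumes it means "record and skip a failing operator instead of aborting the stream"; treat that as an assumption to check against the PDFBox source for your version. ForceParsingModel is hypothetical.

```java
import java.util.*;

// Hypothetical model; ASSUMES force parsing means failures are recorded and
// skipped rather than aborting the whole stream.
class ForceParsingModel {
    private boolean forceParsing;
    final List<String> warnings = new ArrayList<>();

    void setForceParsing(boolean value) { forceParsing = value; }

    boolean isForceParsing() { return forceParsing; }

    // Runs each operator in order and returns how many completed successfully.
    int run(List<Runnable> operators) {
        int completed = 0;
        for (Runnable op : operators) {
            try {
                op.run();
                completed++;
            } catch (RuntimeException e) {
                if (!forceParsing) {
                    throw e;                     // strict mode: first failure aborts
                }
                warnings.add(e.getMessage());    // lenient mode: note it and continue
            }
        }
        return completed;
    }
}
```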
-
registerOperatorProcessor
public void registerOperatorProcessor(String operator, OperatorProcessor op)
Register a custom operator processor with the engine.
- Parameters:
operator
- The operator as a string.
op
- Processor instance.
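A hypothetical sketch of an operator registry: OpProcessor stands in for OperatorProcessor and RegistryEngine is not PDFBox code. Because the registry in this model is a plain map, a later registration for the same operator replaces the earlier one, and operators with no processor are skipped.

```java
import java.util.*;

// Hypothetical stand-in for an operator processor; not PDFBox's OperatorProcessor.
interface OpProcessor {
    void process(String operator, List<String> arguments);
}

class RegistryEngine {
    private final Map<String, OpProcessor> operators = new HashMap<>();

    // A later registration for the same operator replaces the earlier one.
    void registerOperatorProcessor(String operator, OpProcessor op) {
        operators.put(operator, op);
    }

    // Returns true if a registered processor handled the operator.
    boolean processOperator(String operator, List<String> arguments) {
        OpProcessor processor = operators.get(operator);
        if (processor == null) {
            return false;
        }
        processor.process(operator, arguments);
        return true;
    }
}
```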
-
resetEngine
public void resetEngine()
This method must be called between processing documents. The PDFStreamEngine caches information about a document across its pages; this method releases that cached information. It only needs to be called before processing a new document.
-
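The reset contract above can be pictured as a per-document cache that is reused across pages and cleared only when a new document starts. What PDFStreamEngine actually caches is not stated here, so CachingEngine and its string-valued cache are hypothetical placeholders.

```java
import java.util.*;

// Hypothetical per-document cache; the real engine's cached data is not modeled.
class CachingEngine {
    private final Map<String, String> cache = new HashMap<>();
    int loads = 0;

    // Returns the cached value for a resource name, loading it on first use.
    String lookup(String resourceName) {
        return cache.computeIfAbsent(resourceName, name -> {
            loads++;                   // counts actual loads, not cache hits
            return "loaded:" + name;
        });
    }

    // Call before starting a new document to release per-document state.
    void resetEngine() {
        cache.clear();
    }
}
```

Looking up F1 on two pages of one document loads it once; after resetEngine() the next document loads it again.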
processStream
public void processStream(PDPage aPage, PDResources resources, COSStream cosStream) throws IOException
This will process the contents of the stream.
- Parameters:
aPage
- The page.
resources
- The location from which to retrieve resources.
cosStream
- The stream to execute.
- Throws:
IOException
- if there is an error accessing the stream.
-
processSubStream
public void processSubStream(PDPage aPage, PDResources resources, COSStream cosStream) throws IOException
Process a sub stream of the current stream.
- Parameters:
aPage
- The page used for drawing.
resources
- The resources used when processing the stream.
cosStream
- The stream to process.
- Throws:
IOException
- If there is an exception while processing the stream.
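A sub stream, such as the content of a form XObject, may carry its own resources. The sketch assumes that while a sub stream runs its resources replace the enclosing ones and are restored afterwards (even on failure), and that a sub stream passed null resources keeps using the enclosing ones. ResourceStackModel and its string-valued resource maps are hypothetical.

```java
import java.util.*;

// Hypothetical model of per-stream resources kept on a stack.
class ResourceStackModel {
    private final Deque<Map<String, String>> resourcesStack = new ArrayDeque<>();
    final List<String> lookups = new ArrayList<>();

    void processStream(Map<String, String> resources, Runnable body) {
        resourcesStack.clear();
        resourcesStack.push(resources);
        body.run();
    }

    void processSubStream(Map<String, String> resources, Runnable body) {
        if (resources == null) {       // no own resources: keep the enclosing ones
            body.run();
            return;
        }
        resourcesStack.push(resources);
        try {
            body.run();
        } finally {
            resourcesStack.pop();      // restore the caller's resources
        }
    }

    // In this model, names resolve against the innermost resources only.
    void useFont(String name) {
        lookups.add(resourcesStack.peek().getOrDefault(name, "missing"));
    }
}
```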
-
processTextPosition
protected void processTextPosition(TextPosition text)
A method provided as an event interface to allow a subclass to perform some specific functionality when text needs to be processed.
- Parameters:
text
- The text to be processed.
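Subclasses such as PDFTextStripper receive text through this hook. A hypothetical sketch of the pattern: the base class produces one position object per character and calls a protected hook that does nothing by default, and a subclass overrides it to collect text and positions. SimpleTextPosition stands in for TextPosition, and the fixed-width advance is a simplification.

```java
import java.util.*;

// Hypothetical stand-in for PDFBox's TextPosition.
class SimpleTextPosition {
    final String unicode;
    final float x;

    SimpleTextPosition(String unicode, float x) {
        this.unicode = unicode;
        this.x = x;
    }
}

class CallbackEngine {
    // Emits one position per character, advancing x by a fixed width.
    void showText(String text, float startX, float advance) {
        float x = startX;
        for (int i = 0; i < text.length(); i++) {
            processTextPosition(new SimpleTextPosition(text.substring(i, i + 1), x));
            x += advance;
        }
    }

    // Event hook: ignored by default, overridden by subclasses.
    protected void processTextPosition(SimpleTextPosition text) {
    }
}

class CollectingStripper extends CallbackEngine {
    final StringBuilder out = new StringBuilder();
    final List<Float> xs = new ArrayList<>();

    @Override
    protected void processTextPosition(SimpleTextPosition text) {
        out.append(text.unicode);
        xs.add(text.x);
    }
}
```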
-
inspectFontEncoding
protected String inspectFontEncoding(String str)
A method provided as an event interface to allow a subclass to perform some specific functionality on the string encoded by a glyph.
- Parameters:
str
- The string to be processed.
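The hook receives the string decoded for a glyph and returns the string to use in its place. In the hypothetical sketch below the base implementation returns its input unchanged (an assumption), and an illustrative subclass rewrites the U+FB01 "fi" ligature as two letters. Neither class is part of PDFBox.

```java
// Hypothetical model of the inspectFontEncoding hook; identity by default.
class EncodingHookEngine {
    protected String inspectFontEncoding(String str) {
        return str;
    }

    // Passes each decoded glyph string through the hook and concatenates the results.
    String decodeAll(String... glyphStrings) {
        StringBuilder sb = new StringBuilder();
        for (String glyph : glyphStrings) {
            sb.append(inspectFontEncoding(glyph));
        }
        return sb.toString();
    }
}

// Hypothetical subclass: expands the "fi" ligature (U+FB01) into two letters.
class LigatureExpandingEngine extends EncodingHookEngine {
    @Override
    protected String inspectFontEncoding(String str) {
        return str.replace("\uFB01", "fi");
    }
}
```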
-
processEncodedText
public void processEncodedText(byte[] string) throws IOException
Process encoded text from the PDF stream. Override this method to perform an action whenever encoded text is processed.
- Parameters:
string
- The encoded text.
- Throws:
IOException
- If there is an error processing the string.
-
processOperator
public void processOperator(String operation, List<COSBase> arguments) throws IOException
This is used to handle an operation.
- Parameters:
operation
- The operation to perform.
arguments
- The list of arguments.
- Throws:
IOException
- If there is an error processing the operation.
-
processOperator
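protected void processOperator(PDFOperator operator, List<COSBase> arguments) throws IOException
This is used to handle an operation.
- Parameters:
operator
- The operation to perform.
arguments
- The list of arguments.
- Throws: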
IOException
- If there is an error processing the operation.
-
getColorSpaces
- Returns:
- Returns the colorSpaces.
-
getXObjects
- Returns:
- Returns the XObjects.
-
setColorSpaces
- Parameters:
value
- The colorSpaces to set.
-
getFonts
- Returns:
- Returns the fonts.
-
setFonts
- Parameters:
value
- The fonts to set.
-
getGraphicsStack
- Returns:
- Returns the graphicsStack.
-
setGraphicsStack
- Parameters:
value
- The graphicsStack to set.
-
getGraphicsState
- Returns:
- Returns the graphicsState.
-
setGraphicsState
- Parameters:
value
- The graphicsState to set.
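In PDF, the q operator saves the current graphics state and Q restores the most recently saved one; a graphics state plus a stack of saved states is what such an implementation needs. A minimal model, with a hypothetical GState holding only the line width (set by the w operator), not PDGraphicsState:

```java
import java.util.*;

// Hypothetical stand-in for PDGraphicsState that tracks only the line width.
class GState {
    float lineWidth = 1f;

    GState copy() {
        GState g = new GState();
        g.lineWidth = lineWidth;
        return g;
    }
}

// Hypothetical model of q/Q using a current state and a stack of saved states.
class GraphicsStackModel {
    GState current = new GState();
    final Deque<GState> stack = new ArrayDeque<>();

    // "q": push a copy so later changes do not alter the saved state.
    void save() { stack.push(current.copy()); }

    // "Q": restore the most recently saved state; an unbalanced Q makes
    // ArrayDeque.pop throw NoSuchElementException.
    void restore() { current = stack.pop(); }

    // "w": change the line width of the current state.
    void setLineWidth(float width) { current.lineWidth = width; }
}
```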
-
getGraphicsStates
- Returns:
- Returns the graphicsStates.
-
setGraphicsStates
- Parameters:
value
- The graphicsStates to set.
-
getTextLineMatrix
- Returns:
- Returns the textLineMatrix.
-
setTextLineMatrix
- Parameters:
value
- The textLineMatrix to set.
-
getTextMatrix
- Returns:
- Returns the textMatrix.
-
setTextMatrix
- Parameters:
value
- The textMatrix to set.
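The text matrix (Tm) and text line matrix (Tlm) follow the PDF specification: BT resets both to the identity, the Tm operator sets both, and Td tx ty sets both to a translation by (tx, ty) multiplied by the current Tlm. Text-showing operators then advance only Tm, which is why Tlm is kept separately. TextMatrices below is a hypothetical, JDK-only model using the six-number form [a b c d e f]; it is not PDFBox's Matrix class.

```java
// Hypothetical model of Tm and Tlm. [a b c d e f] stands for the matrix
// | a b 0 |
// | c d 0 |
// | e f 1 |
class TextMatrices {
    double[] textMatrix = {1, 0, 0, 1, 0, 0};
    double[] textLineMatrix = {1, 0, 0, 1, 0, 0};

    // BT: both matrices start as the identity.
    void beginText() {
        textMatrix = new double[] {1, 0, 0, 1, 0, 0};
        textLineMatrix = textMatrix.clone();
    }

    // Tm operator: set both matrices explicitly.
    void setMatrix(double a, double b, double c, double d, double e, double f) {
        textMatrix = new double[] {a, b, c, d, e, f};
        textLineMatrix = textMatrix.clone();
    }

    // Td: Tm = Tlm = [1 0 0 1 tx ty] x Tlm (start of the next line).
    void moveText(double tx, double ty) {
        double[] m = textLineMatrix;
        textLineMatrix = new double[] {
            m[0], m[1], m[2], m[3],
            tx * m[0] + ty * m[2] + m[4],
            tx * m[1] + ty * m[3] + m[5]
        };
        textMatrix = textLineMatrix.clone();
    }
}
```

With setMatrix(2, 0, 0, 2, 100, 700) followed by moveText(0, -14), both matrices become [2 0 0 2 100 672]: the 14-unit line offset is scaled by 2 before it is added.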
-
getResources
- Returns:
- Returns the resources.
-
getCurrentPage
public PDPage getCurrentPage()
Get the current page that is being processed.
- Returns:
- The page being processed.
-
getValidCharCnt
public int getValidCharCnt()
Get the total number of valid characters in the doc that could be decoded in processEncodedText().
- Returns:
- The number of valid characters.
-
getTotalCharCnt
public int getTotalCharCnt()
Get the total number of characters in the doc (including ones that could not be mapped).
- Returns:
- The number of characters.
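The two counters differ only in whether a character code could be decoded. The sketch assumes a one-byte-per-character encoding with a lookup table: every code counts toward the total, and only mapped codes count as valid. CharCounter and its map are hypothetical; real fonts may use multi-byte codes.

```java
import java.util.*;

// Hypothetical model of the valid/total character counters.
class CharCounter {
    private final Map<Integer, String> toUnicode;
    private int validCharCnt;
    private int totalCharCnt;

    CharCounter(Map<Integer, String> toUnicode) {
        this.toUnicode = toUnicode;
    }

    // Decodes single-byte codes; unmapped codes are skipped but still counted.
    String processEncodedText(byte[] string) {
        StringBuilder sb = new StringBuilder();
        for (byte b : string) {
            totalCharCnt++;
            String unicode = toUnicode.get(b & 0xFF);   // treat the byte as unsigned
            if (unicode != null) {
                validCharCnt++;
                sb.append(unicode);
            }
        }
        return sb.toString();
    }

    int getValidCharCnt() { return validCharCnt; }

    int getTotalCharCnt() { return totalCharCnt; }
}
```

Decoding the bytes 0x48, 0x69, 0xFF with a table that maps only the first two yields "Hi", a total count of 3, and a valid count of 2.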
-