Class ConformingPDFParser

java.lang.Object
org.apache.pdfbox.pdfparser.BaseParser
org.apache.pdfbox.pdfparser.ConformingPDFParser

public class ConformingPDFParser extends BaseParser
Author:
Adam Nichols
  • Constructor Details

    • ConformingPDFParser

      public ConformingPDFParser(File inputFile) throws IOException
      Creates a parser for the given PDF file.
      Parameters:
      inputFile - The file that contains the PDF document.
      Throws:
      IOException - If there is an error initializing the stream.
  • Method Details

    • parse

      public void parse() throws IOException
      This will parse the stream and populate the COSDocument object. This will close the stream when it is done parsing.
      Throws:
      IOException - If there is an error reading from the stream or corrupt data is found.
    • getDocument

      public COSDocument getDocument() throws IOException
      This will get the document that was parsed. parse() must be called before this is called. When you are done with this document you must call close() on it to release resources.
      Returns:
      The document that was parsed.
      Throws:
      IOException - If there is an error getting the document.
    • getPDDocument

      public PDDocument getPDDocument() throws IOException
      This will get the PD document that was parsed. When you are done with this document you must call close() on it to release resources.
      Returns:
      The document at the PD layer.
      Throws:
      IOException - If there is an error getting the document.
    • parseTrailerInformation

      protected long parseTrailerInformation() throws IOException, NumberFormatException
      Throws:
      IOException
      NumberFormatException
    • readByteBackwards

      protected byte readByteBackwards() throws IOException
      Throws:
      IOException
    • readByte

      protected byte readByte() throws IOException
      Throws:
      IOException
    • readBackwardUntilWhitespace

      protected String readBackwardUntilWhitespace() throws IOException
      Throws:
      IOException
    • consumeWhitespaceBackwards

      protected byte consumeWhitespaceBackwards() throws IOException
      This will read all bytes (backwards) until a non-whitespace character is found. To save you an extra read, the non-whitespace character is returned. If the current character is not whitespace, this method will just return the current char.
      Returns:
      the first non-whitespace character found
      Throws:
      IOException - if there is an error reading from the file
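      The backward whitespace skip described above can be sketched on an in-memory byte array. This is a minimal illustration, not the PDFBox implementation: the class BackwardWhitespaceSketch, its fields, and the use of an unchecked exception in place of IOException are all assumptions made for the example. The whitespace set is the six bytes the PDF specification defines as whitespace.

```java
/** Hypothetical stand-in for the parser's file cursor; not PDFBox code. */
final class BackwardWhitespaceSketch {
    private final byte[] data; // stands in for the input file contents
    private int offset;        // index of the current byte

    BackwardWhitespaceSketch(byte[] data, int offset) {
        this.data = data;
        this.offset = offset;
    }

    /** NUL, HT, LF, FF, CR and SP: the PDF whitespace bytes. */
    static boolean isWhitespace(int b) {
        return b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32;
    }

    /** Moves the cursor backwards past whitespace and returns the byte it stops on. */
    byte consumeWhitespaceBackwards() {
        while (offset >= 0) {
            byte current = data[offset];
            if (!isWhitespace(current)) {
                return current; // the cursor stays on this byte
            }
            offset--;
        }
        // Assumption: the real method reports read failures as IOException.
        throw new IllegalStateException("No non-whitespace byte before start of data");
    }

    int getOffset() {
        return offset;
    }
}
```

      In this sketch, calling the method when the cursor already sits on a non-whitespace byte returns that byte and leaves the cursor where it is, which matches the behaviour described for the current character.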
    • consumeWhitespace

      protected byte consumeWhitespace() throws IOException
      This will read all bytes until a non-whitespace character is found. To save you an extra read, the non-whitespace character is returned. If the current character is not whitespace, this method will just return the current char.
      Returns:
      the first non-whitespace character found
      Throws:
      IOException - if there is an error reading from the file
    • readLongBackwards

      protected long readLongBackwards() throws IOException, NumberFormatException
      This will consume any whitespace, read in bytes until whitespace is found again, and then parse the characters which have been read as a long. The current offset will then point at the first whitespace character which precedes the number.
      Returns:
      the parsed number
      Throws:
      IOException - if there is an error reading from the file
      NumberFormatException - if the bytes read cannot be converted to a number
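      As an illustration of the steps described for readLongBackwards, the following sketch reads a number backwards from a byte array. It is not the PDFBox implementation: the class ReadLongBackwardsSketch and its fields are hypothetical, and the file is replaced by an in-memory array so that no IOException can occur.

```java
/** Hypothetical sketch of reading a long backwards; not PDFBox code. */
final class ReadLongBackwardsSketch {
    private final byte[] data; // stands in for the input file contents
    private int offset;        // index of the current byte

    ReadLongBackwardsSketch(byte[] data, int offset) {
        this.data = data;
        this.offset = offset;
    }

    /** NUL, HT, LF, FF, CR and SP: the PDF whitespace bytes. */
    static boolean isWhitespace(int b) {
        return b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32;
    }

    long readLongBackwards() {
        // Step 1: consume any whitespace behind the cursor.
        while (offset >= 0 && isWhitespace(data[offset])) {
            offset--;
        }
        // Step 2: collect bytes until whitespace is found again.
        StringBuilder reversed = new StringBuilder();
        while (offset >= 0 && !isWhitespace(data[offset])) {
            reversed.append((char) (data[offset] & 0xFF));
            offset--;
        }
        // Step 3: the bytes were collected last-to-first, so reverse before parsing.
        // Long.parseLong throws NumberFormatException for non-numeric or empty input.
        return Long.parseLong(reversed.reverse().toString());
    }

    int getOffset() {
        return offset;
    }
}
```

      With the input "startxref\n116\n" and the cursor on the final byte, this sketch returns 116 and leaves the cursor on the newline that precedes the number.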
    • readInt

      protected int readInt() throws IOException
      Description copied from class: BaseParser
      This will read an integer from the stream.
      Overrides:
      readInt in class BaseParser
      Returns:
      The integer that was read from the stream.
      Throws:
      IOException - If there is an error reading from the stream.
    • readNumber

      protected COSNumber readNumber() throws IOException
      This will read in a number and return the COS version of the number (be it a COSInteger or a COSFloat).
      Returns:
      the COSNumber which was read/parsed
      Throws:
      IOException
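      The integer-or-float distinction can be illustrated without the COS classes. The sketch below is an assumption-laden stand-in, not how PDFBox decides: the class NumberTokenSketch and its toNumber method are hypothetical, a Long plays the role of COSInteger and a Double the role of COSFloat, and the presence of a decimal point is used as the only criterion.

```java
/** Hypothetical sketch of classifying a numeric token; not PDFBox code. */
final class NumberTokenSketch {
    /**
     * Returns a Long for tokens without a decimal point and a Double otherwise.
     * Throws NumberFormatException if the token is not a valid number.
     */
    static Number toNumber(String token) {
        if (token.indexOf('.') >= 0) {
            return Double.valueOf(token); // stand-in for a COSFloat
        }
        return Long.valueOf(token);       // stand-in for a COSInteger
    }
}
```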
    • parseNumber

      protected COSNumber parseNumber(String number) throws IOException
      Throws:
      IOException
    • processCosObject

      protected COSBase processCosObject(String string) throws IOException
      Throws:
      IOException
    • readObjectBackwards

      protected COSBase readObjectBackwards() throws IOException
      Throws:
      IOException
    • readNameBackwards

      protected COSName readNameBackwards() throws IOException
      Throws:
      IOException
    • getObject

      public COSBase getObject(long objectNumber, long generation) throws IOException
      Throws:
      IOException
    • readObject

      public COSBase readObject(long objectNumber, long generation) throws IOException
      This will read an object from the inputFile, starting at the current value of currentOffset. If the object number and generation that are read do not match the expected values, and this parser is set to throw an exception for non-conforming documents, then an exception will be thrown.
      Parameters:
      objectNumber - the object number you expect to read
      generation - the generation you expect this object to be
      Returns:
      the object being read.
      Throws:
      IOException
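      The check on the expected object number and generation can be sketched as follows. A PDF indirect object begins with a header of the form "12 0 obj", where 12 is the object number and 0 is the generation. Everything else here is an assumption for illustration: the class ObjectHeaderCheckSketch, the headerMatches method, the throwOnNonConformance flag, and the unchecked exception types are hypothetical and are not taken from PDFBox.

```java
/** Hypothetical sketch of validating an indirect object header; not PDFBox code. */
final class ObjectHeaderCheckSketch {
    /**
     * Compares the header of an indirect object ("<number> <generation> obj")
     * with the values the caller expects.
     *
     * @return true if both values match, false if they differ and
     *         throwOnNonConformance is false
     */
    static boolean headerMatches(String header, long expectedNumber,
                                 long expectedGeneration, boolean throwOnNonConformance) {
        String[] parts = header.trim().split("\\s+");
        if (parts.length != 3 || !parts[2].equals("obj")) {
            throw new IllegalArgumentException("Not an object header: " + header);
        }
        long number = Long.parseLong(parts[0]);
        long generation = Long.parseLong(parts[1]);
        boolean matches = number == expectedNumber && generation == expectedGeneration;
        if (!matches && throwOnNonConformance) {
            // Assumption: the real parser would report this as an IOException.
            throw new IllegalStateException("Expected object " + expectedNumber + " "
                    + expectedGeneration + " but found " + number + " " + generation);
        }
        return matches;
    }
}
```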
    • readObject

      protected COSBase readObject() throws IOException
      This actually reads the object data.
      Returns:
      the object which is read
      Throws:
      IOException
    • readString

      protected String readString() throws IOException
      This will read the next string from the stream.
      Overrides:
      readString in class BaseParser
      Returns:
      The string that was read from the stream.
      Throws:
      IOException - If there is an error reading from the stream.
    • readDictionaryBackwards

      protected COSDictionary readDictionaryBackwards() throws IOException
      Throws:
      IOException
    • readLineBackwards

      protected String readLineBackwards() throws IOException
      This will read a line starting with the byte at offset and going backwards until it finds a newline. This should only be used if we are certain that the data will only be text, and not binary data.
      Returns:
      the string which was read
      Throws:
      IOException - if there was an error reading data from the file
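      A backward line read of this kind can be sketched on a byte array. This is a simplified illustration rather than the PDFBox implementation: the class ReadLineBackwardsSketch is hypothetical, it assumes the cursor starts on the last byte of the line rather than on a line terminator, and it treats both LF and CR as newlines. As noted above, the approach only makes sense for text data.

```java
/** Hypothetical sketch of reading a text line backwards; not PDFBox code. */
final class ReadLineBackwardsSketch {
    private final byte[] data; // stands in for the input file contents
    private int offset;        // index of the current byte

    ReadLineBackwardsSketch(byte[] data, int offset) {
        this.data = data;
        this.offset = offset;
    }

    String readLineBackwards() {
        StringBuilder reversed = new StringBuilder();
        while (offset >= 0) {
            byte current = data[offset];
            if (current == '\n' || current == '\r') {
                break; // the cursor stays on the line terminator
            }
            reversed.append((char) (current & 0xFF));
            offset--;
        }
        // Bytes were collected last-to-first, so reverse to restore reading order.
        return reversed.reverse().toString();
    }

    int getOffset() {
        return offset;
    }
}
```

      With the input "trailer\n%%EOF" and the cursor on the final byte, this sketch returns "%%EOF" and leaves the cursor on the newline at index 7.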
    • readLine

      protected String readLine() throws IOException
      This will read a line starting with the byte at offset and going forward until it finds a newline. This should only be used if we are certain that the data will only be text, and not binary data.
      Overrides:
      readLine in class BaseParser
      Returns:
      the string which was read
      Throws:
      IOException - if there was an error reading data from the file
    • readWord

      protected String readWord() throws IOException
      Throws:
      IOException
    • isRecursivlyRead

      public boolean isRecursivlyRead()
      Returns:
      the current value of recursivlyRead
    • setRecursivlyRead

      public void setRecursivlyRead(boolean recursivlyRead)
      Parameters:
      recursivlyRead - the new value of recursivlyRead