Package org.apache.pdfbox.util
Class PDFHighlighter
java.lang.Object
org.apache.pdfbox.util.PDFStreamEngine
org.apache.pdfbox.util.PDFTextStripper
org.apache.pdfbox.util.PDFHighlighter
Highlighting of words in a PDF document with an XML file.
- Version: $Revision: 1.7 $
- Author: slagraulet (slagraulet@cardiweb.com), Ben Litchfield
Field Summary
Fields inherited from class org.apache.pdfbox.util.PDFTextStripper
charactersByArticle, document, output, outputEncoding, systemLineSeparator
-
Constructor Summary
Constructor | Description
PDFHighlighter() | Default constructor.
Method Summary
Modifier and Type | Method | Description
protected void | endPage(PDPage pdPage) | End a page.
void | generateXMLHighlight(PDDocument pdDocument, String[] sWords, Writer xmlOutput) | Generate an XML highlight string based on the PDF.
void | generateXMLHighlight(PDDocument pdDocument, String highlightWord, Writer xmlOutput) | Generate an XML highlight string based on the PDF.
static void | main(String[] args) | Command line application.
Methods inherited from class org.apache.pdfbox.util.PDFTextStripper
endArticle, endDocument, getAddMoreFormatting, getArticleEnd, getArticleStart, getAverageCharTolerance, getCharactersByArticle, getCurrentPageNo, getDropThreshold, getEndBookmark, getEndPage, getIndentThreshold, getLineSeparator, getListItemPatterns, getOutput, getPageEnd, getPageSeparator, getPageStart, getParagraphEnd, getParagraphStart, getSeparateByBeads, getSortByPosition, getSpacingTolerance, getStartBookmark, getStartPage, getSuppressDuplicateOverlappingText, getText, getText, getWordSeparator, handleLineSeparation, inspectFontEncoding, isParagraphSeparation, matchListItemPattern, matchPattern, processPage, processPages, processTextPosition, resetEngine, setAddMoreFormatting, setArticleEnd, setArticleStart, setAverageCharTolerance, setDropThreshold, setEndBookmark, setEndPage, setIndentThreshold, setLineSeparator, setListItemPatterns, setPageEnd, setPageSeparator, setPageStart, setParagraphEnd, setParagraphStart, setShouldSeparateByBeads, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, startArticle, startArticle, startDocument, startPage, writeCharacters, writeLineSeparator, writePage, writePageEnd, writePageSeperator, writePageStart, writeParagraphEnd, writeParagraphSeparator, writeParagraphStart, writeString, writeString, writeText, writeText, writeWordSeparator
Methods inherited from class org.apache.pdfbox.util.PDFStreamEngine
getColorSpaces, getCurrentPage, getFonts, getGraphicsStack, getGraphicsState, getGraphicsStates, getResources, getTextLineMatrix, getTextMatrix, getTotalCharCnt, getValidCharCnt, getXObjects, isForceParsing, processEncodedText, processOperator, processOperator, processStream, processSubStream, registerOperatorProcessor, setColorSpaces, setFonts, setForceParsing, setGraphicsStack, setGraphicsState, setGraphicsStates, setTextLineMatrix, setTextMatrix
-
Constructor Details
-
PDFHighlighter
public PDFHighlighter() throws IOException
Default constructor.
- Throws:
IOException - If there is an error constructing this class.
-
-
Method Details
-
generateXMLHighlight
public void generateXMLHighlight(PDDocument pdDocument, String highlightWord, Writer xmlOutput) throws IOException
Generate an XML highlight string based on the PDF.
- Parameters:
pdDocument - The PDF to find words in.
highlightWord - The word to search for.
xmlOutput - The resulting output XML file.
- Throws:
IOException - If there is an error reading from the PDF, or writing to the XML.
-
generateXMLHighlight
public void generateXMLHighlight(PDDocument pdDocument, String[] sWords, Writer xmlOutput) throws IOException
Generate an XML highlight string based on the PDF.
- Parameters:
pdDocument - The PDF to find words in.
sWords - The words to search for.
xmlOutput - The resulting output XML file.
- Throws:
IOException - If there is an error reading from the PDF, or writing to the XML.
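As a usage sketch for the two generateXMLHighlight overloads: the example below assumes PDFBox 1.8.x is on the classpath. The class name HighlightExample, the helper method names, and the search words are hypothetical and chosen only for illustration. Output is collected in a StringWriter so that no output file or encoding has to be assumed.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringWriter;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFHighlighter;

// Hypothetical example class; assumes PDFBox 1.8.x is on the classpath.
public class HighlightExample {

    // Runs the single-word overload and returns whatever was written to the Writer.
    static String highlightWord(PDDocument document, String word) throws IOException {
        StringWriter xmlOutput = new StringWriter();
        new PDFHighlighter().generateXMLHighlight(document, word, xmlOutput);
        return xmlOutput.toString();
    }

    // Runs the String[] overload so several words are searched in one call.
    static String highlightWords(PDDocument document, String[] words) throws IOException {
        StringWriter xmlOutput = new StringWriter();
        new PDFHighlighter().generateXMLHighlight(document, words, xmlOutput);
        return xmlOutput.toString();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a PDF file supplied by the caller.
        PDDocument document = PDDocument.load(new File(args[0]));
        try {
            System.out.println(highlightWord(document, "pdfbox"));
            System.out.println(highlightWords(document, new String[] {"apache", "pdf"}));
        } finally {
            document.close(); // release the resources held by the loaded document
        }
    }
}
```

A new PDFHighlighter is created for each call here to keep the helpers independent of one another; whether a single instance can safely be reused across calls is not stated on this page.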
-
endPage
protected void endPage(PDPage pdPage) throws IOException
End a page. Default implementation is to do nothing. Subclasses may provide additional information.
- Overrides:
endPage in class PDFTextStripper
- Parameters:
pdPage - The page we are about to process.
- Throws:
IOException - If there is any error writing to the stream.
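Because endPage is protected, a subclass can hook into it. The following is a minimal sketch, assuming PDFBox 1.8.x on the classpath; CountingHighlighter and getEndPageCalls are hypothetical names. It delegates to the inherited method first so that whatever PDFHighlighter does at the end of a page still happens, then counts the invocation.

```java
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.util.PDFHighlighter;

// Hypothetical subclass; assumes PDFBox 1.8.x is on the classpath.
public class CountingHighlighter extends PDFHighlighter {

    private int endPageCalls;

    public CountingHighlighter() throws IOException {
        super();
    }

    @Override
    protected void endPage(PDPage pdPage) throws IOException {
        super.endPage(pdPage); // keep the inherited end-of-page behaviour
        endPageCalls++;        // then record that one more page has ended
    }

    // Number of times endPage has been invoked on this instance.
    public int getEndPageCalls() {
        return endPageCalls;
    }
}
```

The counter reflects how often the hook was called, which may differ from the number of pages in the document if some pages are skipped during processing.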
-
main
public static void main(String[] args) throws IOException
Command line application.
- Parameters:
args - The command line arguments to the application.
- Throws:
IOException - If there is an error generating the highlight file.
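The argument layout expected by main is not documented on this page. As a sketch under the assumption that the first argument is the PDF path and every remaining argument is a search word, a caller could split an argument array as follows before passing the words to the String[] overload of generateXMLHighlight. The class HighlightArgs and its methods are hypothetical and use only the Java standard library.

```java
import java.util.Arrays;

// Hypothetical helper; the argument layout (PDF path first, words after) is an assumption.
public class HighlightArgs {

    // Returns the assumed PDF path: the first argument.
    static String pdfPath(String[] args) {
        requireUsable(args);
        return args[0];
    }

    // Returns the assumed search words: everything after the first argument.
    static String[] searchWords(String[] args) {
        requireUsable(args);
        return Arrays.copyOfRange(args, 1, args.length);
    }

    // Rejects argument arrays that lack a path or at least one word.
    private static void requireUsable(String[] args) {
        if (args == null || args.length < 2) {
            throw new IllegalArgumentException("expected: <pdf file> <word> [<word> ...]");
        }
    }

    public static void main(String[] args) {
        String[] sample = {"report.pdf", "apache", "pdfbox"};
        System.out.println(pdfPath(sample));                      // report.pdf
        System.out.println(Arrays.toString(searchWords(sample))); // [apache, pdfbox]
    }
}
```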
-