Package org.apache.pdfbox.util
Class TextNormalize
java.lang.Object
    org.apache.pdfbox.util.TextNormalize
This class allows a caller to normalize text in various ways. It uses the ICU4J library when the ICU4J jar file is on the classpath; without it, the normalization methods return their input unchanged.
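One common way to make a dependency such as ICU4J optional is to probe for one of its classes at runtime and fall back to a pass-through when it is missing. The sketch below illustrates that pattern with plain reflection; it is an assumption for illustration, not PDFBox's actual loading code. The class `OptionalIcu4j`, its methods, and the choice of `com.ibm.icu.text.Bidi` as the probe class are hypothetical.

```java
import java.util.function.UnaryOperator;

final class OptionalIcu4j {

    // Hypothetical probe: com.ibm.icu.text.Bidi ships with ICU4J, so finding it
    // is taken as evidence that the ICU4J jar is on the classpath.
    static final String PROBE_CLASS = "com.ibm.icu.text.Bidi";

    static final boolean ICU_AVAILABLE = isOnClasspath(PROBE_CLASS);

    /** Returns true if the named class can be located, without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, OptionalIcu4j.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Applies icuStep only when ICU4J was found; otherwise returns str unchanged. */
    static String normalizeOrPassThrough(String str, UnaryOperator<String> icuStep) {
        return ICU_AVAILABLE ? icuStep.apply(str) : str;
    }
}
```

Passing `false` as the `initialize` argument of `Class.forName` checks only that the class can be found, so the probed class's static initializers are not run just to test for its presence.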
Author:
    Brian Carrier
Constructor Summary
Constructors
- TextNormalize(String encoding)
Method Summary
- String makeLineLogicalOrder(String str, boolean isRtlDominant)
    Deprecated. No longer used.
- String normalizeDiac(String str)
    Normalize the diacritics; for example, convert non-combining diacritic characters to their combining counterparts.
- String normalizePres(String str)
    Normalize the presentation forms of characters in the string.
Constructor Details
TextNormalize
Parameters:
    encoding - the encoding that the text will eventually be written as (or null)
Method Details
makeLineLogicalOrder
Deprecated. No longer used.
Takes a line of text in presentation order and converts it to logical order. For most text other than Arabic and Hebrew, the presentation and logical orders are the same. For Arabic and Hebrew, however, they differ, and if a line mixes right-to-left (RTL) and left-to-right (LTR) text, the Unicode Bidirectional (BIDI) algorithm must be used to map between the two orders.
Parameters:
    str - presentation form of the line to convert (i.e., the leftmost character is the first character)
    isRtlDominant - true if the page has a dominant right-to-left ordering
Returns:
    logical form of the string (or the original string if the ICU4J library is not on the classpath)
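To illustrate why presentation and logical order differ, the sketch below uses the JDK's `java.text.Bidi`, which implements the forward BIDI algorithm (logical to display order) rather than the inverse. Applied to a presentation-order line, that forward reordering happens to recover the logical order for simple lines such as one LTR run next to one RTL run. It is not a general inverse (no bracket mirroring, no special handling of numbers), and it is not necessarily how PDFBox or ICU4J implemented this method. The class `BidiLineSketch` and its method are hypothetical.

```java
import java.text.Bidi;

final class BidiLineSketch {

    /**
     * Hypothetical helper: reorders a presentation-order line (leftmost glyph first)
     * with the JDK's forward BIDI algorithm. Recovers the logical order only for
     * simple lines; performs no mirroring of brackets or other glyphs.
     */
    static String toLogicalOrder(String line, boolean isRtlDominant) {
        int flags = isRtlDominant ? Bidi.DIRECTION_RIGHT_TO_LEFT : Bidi.DIRECTION_LEFT_TO_RIGHT;
        Bidi bidi = new Bidi(line, flags);
        int runCount = bidi.getRunCount();
        byte[] levels = new byte[runCount];
        String[] runs = new String[runCount];
        for (int i = 0; i < runCount; i++) {
            String run = line.substring(bidi.getRunStart(i), bidi.getRunLimit(i));
            levels[i] = (byte) bidi.getRunLevel(i);
            // Odd embedding levels are right-to-left, so the characters inside the run flip.
            runs[i] = (levels[i] % 2 != 0) ? new StringBuilder(run).reverse().toString() : run;
        }
        // Reorder the runs themselves according to their embedding levels.
        Bidi.reorderVisually(levels, 0, runs, 0, runCount);
        return String.join("", runs);
    }

    public static void main(String[] args) {
        // Hebrew letters gimel, bet, alef as they appear left to right on the page.
        String presentation = "abc \u05D2\u05D1\u05D0";
        String logicalLtr = toLogicalOrder(presentation, false); // "abc " + alef, bet, gimel
        String logicalRtl = toLogicalOrder(presentation, true);  // alef, bet, gimel + " abc"
        System.out.println(logicalLtr.equals(logicalRtl));        // false: page direction matters
    }
}
```

The two calls differ only in `isRtlDominant`, which mirrors the role the documented parameter plays: the page's dominant direction decides which run comes first in logical order.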
normalizePres
Normalize the presentation forms of characters in the string. For example, convert the single "fi" ligature to "f" and "i".
Parameters:
    str - string to normalize
Returns:
    normalized string (or the original string if the ICU4J library is not on the classpath)
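A minimal sketch of presentation-form normalization using only the JDK, assuming that applying Unicode compatibility normalization (NFKC) to characters in the presentation-form blocks is an acceptable stand-in for the ICU4J-backed behavior. The restriction to the Alphabetic Presentation Forms and Arabic Presentation Forms-A/B blocks is an assumption made so that other compatibility characters, such as superscripts, are left alone. The class name is hypothetical.

```java
import java.text.Normalizer;

final class PresentationFormSketch {

    /** Alphabetic Presentation Forms, Arabic Presentation Forms-A, and Arabic Presentation Forms-B. */
    static boolean isPresentationForm(int cp) {
        return (cp >= 0xFB00 && cp <= 0xFDFF) || (cp >= 0xFE70 && cp <= 0xFEFF);
    }

    /** Expands presentation forms (e.g. the "fi" ligature U+FB01) via NFKC; other characters pass through. */
    static String normalizePres(String str) {
        StringBuilder out = new StringBuilder(str.length());
        str.codePoints().forEach(cp -> {
            String s = new String(Character.toChars(cp));
            out.append(isPresentationForm(cp) ? Normalizer.normalize(s, Normalizer.Form.NFKC) : s);
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalizePres("\uFB01nd")); // find
    }
}
```

Normalizing only the targeted code points, instead of the whole string, keeps NFKC from also rewriting unrelated characters like "²" or full-width letters.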
normalizeDiac
Normalize the diacritics; for example, convert non-combining diacritic characters to their combining counterparts.
Parameters:
    str - string to normalize
Returns:
    normalized string (or the original string if the ICU4J library is not on the classpath)
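The sketch below shows what converting non-combining (spacing) diacritics to their combining counterparts means, using a small hand-written lookup table. The table is a hypothetical, deliberately partial subset chosen for illustration; it is not the mapping PDFBox or ICU4J uses. Once a combining mark follows its base letter, NFC can compose the pair into a single precomposed character.

```java
import java.text.Normalizer;
import java.util.Map;

final class DiacriticSketch {

    // Hypothetical, partial table: spacing diacritic -> combining mark (U+0300..U+036F).
    private static final Map<Character, Character> SPACING_TO_COMBINING = Map.ofEntries(
        Map.entry('\u00B4', '\u0301'), // ACUTE ACCENT -> COMBINING ACUTE ACCENT
        Map.entry('\u00A8', '\u0308'), // DIAERESIS -> COMBINING DIAERESIS
        Map.entry('\u00AF', '\u0304'), // MACRON -> COMBINING MACRON
        Map.entry('\u00B8', '\u0327'), // CEDILLA -> COMBINING CEDILLA
        Map.entry('\u02C6', '\u0302'), // MODIFIER LETTER CIRCUMFLEX -> COMBINING CIRCUMFLEX
        Map.entry('\u02C7', '\u030C'), // CARON -> COMBINING CARON
        Map.entry('\u02D8', '\u0306'), // BREVE -> COMBINING BREVE
        Map.entry('\u02D9', '\u0307'), // DOT ABOVE -> COMBINING DOT ABOVE
        Map.entry('\u02DA', '\u030A'), // RING ABOVE -> COMBINING RING ABOVE
        Map.entry('\u02DB', '\u0328'), // OGONEK -> COMBINING OGONEK
        Map.entry('\u02DC', '\u0303'), // SMALL TILDE -> COMBINING TILDE
        Map.entry('\u02DD', '\u030B')  // DOUBLE ACUTE -> COMBINING DOUBLE ACUTE
    );

    /** Replaces each known spacing diacritic with its combining mark; other chars are kept. */
    static String normalizeDiac(String str) {
        StringBuilder out = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            char mapped = SPACING_TO_COMBINING.getOrDefault(c, c);
            out.append(mapped);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "e" followed by the converted acute composes to U+00E9 under NFC.
        String composed = Normalizer.normalize("e" + normalizeDiac("\u00B4"), Normalizer.Form.NFC);
        System.out.println(composed.equals("\u00E9")); // true
    }
}
```

This sketch does not decide which base letter an accent belongs to; in a PDF the accent glyph may be drawn before or after its letter, so pairing them is a separate, position-based step.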