Class TextNormalize

java.lang.Object
org.apache.pdfbox.util.TextNormalize

public class TextNormalize extends Object
This class lets a caller normalize text in various ways. It uses the ICU4J library if the ICU4J jar is on the classpath; otherwise each method returns its input unchanged.
Author:
Brian Carrier
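The optional dependency on ICU4J described above can be pictured with a small classpath probe. This is only a minimal sketch, not the class's actual loading code: the helper name isOnClasspath is hypothetical, and probing com.ibm.icu.text.Normalizer is an assumption about which ICU4J class one might check for.

```java
class OptionalLibrarySketch {
    // Hypothetical helper (not part of PDFBox): reports whether a class can be loaded.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Library missing or unusable: the caller should fall back.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: com.ibm.icu.text.Normalizer stands in for "ICU4J is present".
        boolean icuAvailable = isOnClasspath("com.ibm.icu.text.Normalizer");
        System.out.println("ICU4J available: " + icuAvailable);
    }
}
```

A caller following this pattern would return the input string unchanged whenever the probe fails, which matches the "or original string" wording in the method descriptions.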
  • Constructor Details

    • TextNormalize

      public TextNormalize(String encoding)
      Parameters:
      encoding - The encoding in which the text will eventually be written (or null)
  • Method Details

    • makeLineLogicalOrder

      public String makeLineLogicalOrder(String str, boolean isRtlDominant)
      Deprecated.
      No longer used.
      Takes a line of text in presentation order and converts it to logical order. For most text other than Arabic and Hebrew, the presentation and logical orders are the same. For Arabic and Hebrew, however, the two orders differ, and if a line mixes right-to-left (RTL) and left-to-right (LTR) text, the Unicode BIDI algorithm must be used to map between them.
      Parameters:
      str - Presentation form of the line to convert (i.e. the leftmost character comes first)
      isRtlDominant - true if the PAGE has a dominant right-to-left ordering
      Returns:
      Logical form of string (or original string if ICU4J library is not on classpath)
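      The difference between presentation and logical order can be illustrated with the JDK's java.text.Bidi class. This is a rough sketch, not the ICU4J-based conversion this method performs: the helper toLogicalOrder is hypothetical, and it handles only lines that run entirely in one direction. Mixed-direction lines need the full Unicode BIDI algorithm, so the sketch rejects them.

```java
import java.text.Bidi;

class LogicalOrderSketch {
    // Hypothetical helper (not part of PDFBox): converts a single-direction
    // line from presentation (left-to-right visual) order to logical order.
    static String toLogicalOrder(String visual) {
        // Pure LTR text: presentation and logical order are the same.
        if (!Bidi.requiresBidi(visual.toCharArray(), 0, visual.length())) {
            return visual;
        }
        Bidi bidi = new Bidi(visual, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        if (bidi.isRightToLeft()) {
            // Pure RTL text: the leftmost displayed character is the last
            // logical one, so reversing recovers logical order.
            // Simplification: combining marks are not kept after their base.
            return new StringBuilder(visual).reverse().toString();
        }
        throw new UnsupportedOperationException(
                "mixed RTL/LTR text needs the full Unicode BIDI algorithm");
    }

    public static void main(String[] args) {
        // Hebrew "shalom" stored in visual order (final mem first).
        String visual = "\u05DD\u05D5\u05DC\u05E9";
        String logical = toLogicalOrder(visual);
        System.out.println(logical.charAt(0) == '\u05E9'); // shin now comes first
    }
}
```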
    • normalizePres

      public String normalizePres(String str)
      Normalize the presentation forms of characters in the string. For example, convert the single "fi" ligature to "f" and "i".
      Parameters:
      str - String to normalize
      Returns:
      Normalized string (or original string if ICU4J library is not on classpath)
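      The ligature example can be reproduced with the JDK's java.text.Normalizer. This is a sketch under assumptions, not this method's ICU4J-based implementation: the helper name decomposePresentationForms is hypothetical. Unicode compatibility normalization (NFKC) is used as a stand-in, and it also rewrites other compatibility characters, so its scope may be broader than normalizePres.

```java
import java.text.Normalizer;

class PresentationFormSketch {
    // Hypothetical helper (not part of PDFBox): replaces compatibility
    // characters such as ligatures with their plain equivalents via NFKC.
    static String decomposePresentationForms(String str) {
        return Normalizer.normalize(str, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // U+FB01 is the single-character "fi" ligature.
        String withLigature = "\uFB01le";
        System.out.println(decomposePresentationForms(withLigature)); // prints "file"
    }
}
```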
    • normalizeDiac

      public String normalizeDiac(String str)
      Normalize diacritics; for example, convert non-combining diacritic characters to their combining counterparts.
      Parameters:
      str - String to normalize
      Returns:
      Normalized string (or original string if ICU4J library is not on classpath)
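      One way to picture the mapping from non-combining to combining diacritics uses java.text.Normalizer from the JDK. This is a minimal sketch, not necessarily how this method works: the helper toCombining is hypothetical, and limiting the conversion to characters of type MODIFIER_SYMBOL is an assumption made for illustration.

```java
import java.text.Normalizer;

class DiacriticSketch {
    // Hypothetical helper (not part of PDFBox): replaces spacing diacritics
    // such as U+00B4 ACUTE ACCENT with their combining counterparts.
    static String toCombining(String str) {
        StringBuilder out = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (Character.getType(c) == Character.MODIFIER_SYMBOL) {
                // NFKD maps U+00B4 to SPACE + U+0301; trim() drops the space.
                // Symbols without a decomposition (such as '^') stay as they are.
                String decomposed = Normalizer.normalize(
                        String.valueOf(c), Normalizer.Form.NFKD);
                out.append(decomposed.trim());
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // 'e' followed by a spacing acute accent becomes 'e' + combining acute.
        String result = toCombining("e\u00B4");
        System.out.println(result.equals("e\u0301")); // prints "true"
    }
}
```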