Class FastStringBuffer
Note that Stree and DTM used a single FastStringBuffer as a string pool, by recording start and length indices within this single buffer. This minimizes heap overhead, but of course requires more work when retrieving the data.
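The string-pool usage described above can be sketched as follows. This is only an illustration: `StringPoolSketch` and its methods are hypothetical names, and a `StringBuilder` stands in for the shared FastStringBuffer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical string pool: one shared buffer plus (start, length) pairs. */
final class StringPoolSketch {
    // Stands in for the single shared FastStringBuffer (assumption).
    private final StringBuilder pool = new StringBuilder();
    // Each entry is {start, length} into the shared buffer.
    private final List<int[]> entries = new ArrayList<>();

    /** Appends text to the shared buffer and returns a handle for later retrieval. */
    int add(String text) {
        entries.add(new int[] { pool.length(), text.length() });
        pool.append(text);
        return entries.size() - 1;
    }

    /** Retrieval copies the range out of the pool, which is the extra work noted above. */
    String get(int handle) {
        int[] e = entries.get(handle);
        return pool.substring(e[0], e[0] + e[1]);
    }
}
```

Only two ints are kept per string, which is where the heap saving comes from; with the real class, the retrieval step would presumably go through getString(start, length).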
FastStringBuffer operates as a "chunked buffer". Doing so reduces the need to recopy existing information when an append exceeds the space available; we just allocate another chunk and flow across to it. (The array of chunks may need to grow, admittedly, but that's a much smaller object.) Some excess recopying may arise when we extract Strings which cross chunk boundaries; larger chunks make that less frequent.
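A minimal sketch of the chunked-append idea, assuming fixed-size chunks. The class and field names here are hypothetical, and the real FastStringBuffer also varies its chunk sizes, which this sketch does not attempt.

```java
import java.util.Arrays;

/** Illustrative chunked character buffer; not the actual FastStringBuffer code. */
final class ChunkedBufferSketch {
    private final int chunkSize;
    private char[][] chunks = new char[2][]; // small array of chunk references
    private int lastChunk = -1;              // index of the chunk being filled
    private int firstFree;                   // next free slot in that chunk

    ChunkedBufferSketch(int chunkSize) {
        this.chunkSize = chunkSize;
        this.firstFree = chunkSize;          // forces allocation on first append
    }

    void append(char c) {
        if (firstFree == chunkSize) {        // current chunk is full, or none exists yet
            if (++lastChunk == chunks.length) {
                // Only the array of references is recopied, never the characters.
                chunks = Arrays.copyOf(chunks, chunks.length * 2);
            }
            chunks[lastChunk] = new char[chunkSize];
            firstFree = 0;
        }
        chunks[lastChunk][firstFree++] = c;
    }

    int length() {
        return lastChunk < 0 ? 0 : lastChunk * chunkSize + firstFree;
    }

    char charAt(int pos) {
        return chunks[pos / chunkSize][pos % chunkSize];
    }

    int chunkCount() {
        return lastChunk + 1;
    }
}
```

Appending past the end of a chunk never moves characters that are already stored; the only thing that grows by copying is the short array of chunk references.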
The size values are parameterized, to allow tuning this code. In theory, Result Tree Fragments might want to be tuned differently from the main document's text.
%REVIEW% An experiment in self-tuning is included in the code (using nested FastStringBuffers to achieve variation in chunk sizes), but this implementation has proven problematic when data is copied from the FSB into itself. We should either re-architect it to make this safe (if possible) or remove that code and clean up for performance and maintainability reasons.
-
Field Summary
Fields

| Modifier and Type | Field | Description |
|---|---|---|
| static final int | SUPPRESS_BOTH | Manifest constant: Suppress both leading and trailing whitespace. |
| static final int | SUPPRESS_LEADING_WS | Manifest constant: Suppress leading whitespace. |
| static final int | SUPPRESS_TRAILING_WS | Manifest constant: Suppress trailing whitespace. |

-
Constructor Summary
Constructors

| Constructor | Description |
|---|---|
| FastStringBuffer() | Construct a FastStringBuffer, using a default allocation policy. |
| FastStringBuffer(int initChunkBits) | Construct a FastStringBuffer, using default maxChunkBits and rebundleBits values. |
| FastStringBuffer(int initChunkBits, int maxChunkBits) | Construct a FastStringBuffer, using a default rebundleBits value. |
| FastStringBuffer(int initChunkBits, int maxChunkBits, int rebundleBits) | Construct a FastStringBuffer, with allocation policy as per parameters. |

-
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| final void | append(char value) | Append a single character onto the FastStringBuffer, growing the storage if necessary. |
| final void | append(char[] chars, int start, int length) | Append part of the contents of a Character Array onto the FastStringBuffer, growing the storage if necessary. |
| final void | append(String value) | Append the contents of a String onto the FastStringBuffer, growing the storage if necessary. |
| final void | append(StringBuffer value) | Append the contents of a StringBuffer onto the FastStringBuffer, growing the storage if necessary. |
| final void | append(FastStringBuffer value) | Append the contents of another FastStringBuffer onto this FastStringBuffer, growing the storage if necessary. |
| char | charAt(int pos) | Get a single character from the string buffer. |
| String | getString(int start, int length) | |
| boolean | isWhitespace(int start, int length) | |
| final int | length() | Get the length of the list. |
| final void | reset() | Discard the content of the FastStringBuffer, and most of the memory that was allocated by it, restoring the initial state. |
| static void | sendNormalizedSAXcharacters(char[] ch, int start, int length, ContentHandler handler) | Directly normalize and dispatch the character array. |
| int | sendNormalizedSAXcharacters(ContentHandler ch, int start, int length) | Sends the specified range of characters as one or more SAX characters() events, normalizing the characters according to XSLT rules. |
| void | sendSAXcharacters(ContentHandler ch, int start, int length) | Sends the specified range of characters as one or more SAX characters() events. |
| void | sendSAXComment(LexicalHandler ch, int start, int length) | Sends the specified range of characters as a SAX comment. |
| final void | setLength(int l) | Directly set how much of the FastStringBuffer's storage is to be considered part of its content. |
| final int | size() | Get the length of the list. |
| final String | toString() | Note that this operation has been somewhat deoptimized by the shift to a chunked array, as there is no factory method to produce a String object directly from an array of arrays and hence a double copy is needed. |
-
Field Details
-
SUPPRESS_LEADING_WS
public static final int SUPPRESS_LEADING_WS
Manifest constant: Suppress leading whitespace. This should be used when normalize-to-SAX is called for the first chunk of a multi-chunk output, or one following unsuppressed whitespace in a previous chunk.
-
SUPPRESS_TRAILING_WS
public static final int SUPPRESS_TRAILING_WS
Manifest constant: Suppress trailing whitespace. This should be used when normalize-to-SAX is called for the last chunk of a multi-chunk output; it may have to be or'ed with SUPPRESS_LEADING_WS.
-
SUPPRESS_BOTH
public static final int SUPPRESS_BOTH
Manifest constant: Suppress both leading and trailing whitespace. This should be used when normalize-to-SAX is called for a complete string. (I'm not wild about the name of this one. Ideas welcome.)
-
-
Constructor Details
-
FastStringBuffer
public FastStringBuffer(int initChunkBits, int maxChunkBits, int rebundleBits)
Construct a FastStringBuffer, with allocation policy as per parameters.
For coding convenience, I've expressed both allocation sizes in terms of a number of bits. That's needed for the final size of a chunk, to permit fast and efficient shift-and-mask addressing. It's less critical for the initial size, and may be reconsidered.
An alternative would be to accept integer sizes and round to powers of two; that really doesn't seem to buy us much, if anything.
- Parameters:
initChunkBits - Length in characters of the initial allocation of a chunk, expressed in log-base-2. (That is, 10 means allocate 1024 characters.) Later chunks will use larger allocation units, to trade off allocation speed of large documents against storage efficiency of small ones.
maxChunkBits - Number of character-offset bits that should be used for addressing within a chunk. Maximum length of a chunk is 2^maxChunkBits characters.
rebundleBits - Number of character-offset bits that addressing should advance before we attempt to take a step from initChunkBits to maxChunkBits.
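The shift-and-mask addressing mentioned above works because a power-of-two chunk size lets a character position split into a chunk index (the high bits) and an offset within the chunk (the low bits). A small sketch, with hypothetical class and method names:

```java
/** Shift-and-mask addressing for power-of-two chunk sizes (illustrative only). */
final class ChunkAddressSketch {
    final int chunkBits;  // log-base-2 of the chunk size
    final int chunkSize;  // 1 << chunkBits
    final int chunkMask;  // chunkSize - 1, i.e. the low chunkBits bits set

    ChunkAddressSketch(int chunkBits) {
        this.chunkBits = chunkBits;
        this.chunkSize = 1 << chunkBits;
        this.chunkMask = chunkSize - 1;
    }

    /** Which chunk holds the character at absolute position pos. */
    int chunkIndex(int pos) {
        return pos >>> chunkBits;
    }

    /** Where inside that chunk the character sits. */
    int offsetInChunk(int pos) {
        return pos & chunkMask;
    }
}
```

With chunkBits = 10 (1024-character chunks), position 2500 lands in chunk 2 at offset 452, with no division or modulo needed.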
-
FastStringBuffer
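public FastStringBuffer(int initChunkBits, int maxChunkBits)
Construct a FastStringBuffer, using a default rebundleBits value.
- Parameters:
initChunkBits - Length in characters of the initial allocation of a chunk, expressed in log-base-2.
maxChunkBits - Number of character-offset bits that should be used for addressing within a chunk.
-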
FastStringBuffer
public FastStringBuffer(int initChunkBits)
Construct a FastStringBuffer, using default maxChunkBits and rebundleBits values.
ISSUE: Should this call assert initial size, or fixed size? Now configured as initial, with a default for fixed.
- Parameters:
initChunkBits - Length in characters of the initial allocation of a chunk, expressed in log-base-2.
-
FastStringBuffer
public FastStringBuffer()
Construct a FastStringBuffer, using a default allocation policy.
-
-
Method Details
-
size
public final int size()
Get the length of the list. Synonym for length().
- Returns:
the number of characters in the FastStringBuffer's content.
-
length
public final int length()
Get the length of the list. Synonym for size().
- Returns:
the number of characters in the FastStringBuffer's content.
-
reset
public final void reset()
Discard the content of the FastStringBuffer, and most of the memory that was allocated by it, restoring the initial state. Note that this may eventually be different from setLength(0), which see.
-
setLength
public final void setLength(int l)
Directly set how much of the FastStringBuffer's storage is to be considered part of its content. This is a fast but hazardous operation. It is not protected against negative values, or values greater than the amount of storage currently available; and even if additional storage does exist, its contents are unpredictable. The only safe use for setLength() is to truncate the FastStringBuffer to a shorter string.
- Parameters:
l - New length. If l < 0 or l >= length(), this operation will not report an error, but future operations will almost certainly fail.
-
toString
public final String toString()
Note that this operation has been somewhat deoptimized by the shift to a chunked array, as there is no factory method to produce a String object directly from an array of arrays and hence a double copy is needed. By using ensureCapacity we hope to minimize the heap overhead of building the intermediate StringBuffer.
(It really is a pity that Java didn't design String as a final subclass of MutableString, rather than having StringBuffer be a separate hierarchy. We'd avoid a lot of double-buffering.)
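The double copy described above can be sketched like this, assuming fixed-size chunks. The helper name and parameters are hypothetical and are not part of the class's API.

```java
/** Illustrates why joining chunks into a String costs two copies (sketch only). */
final class ChunkedToStringSketch {
    /** Joins the first `length` characters stored in fixed-size chunks into a String. */
    static String join(char[][] chunks, int chunkSize, int length) {
        StringBuffer sb = new StringBuffer();
        sb.ensureCapacity(length);          // size the intermediate buffer once, up front
        int remaining = length;
        for (int i = 0; remaining > 0; i++) {
            int n = Math.min(chunkSize, remaining);
            sb.append(chunks[i], 0, n);     // first copy: chunk -> intermediate buffer
            remaining -= n;
        }
        return sb.toString();               // second copy: intermediate buffer -> String
    }
}
```

Calling ensureCapacity before the loop avoids repeated regrowth of the intermediate buffer, but it cannot remove the second copy made when the String is created.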
-
append
public final void append(char value)
Append a single character onto the FastStringBuffer, growing the storage if necessary.
NOTE THAT after calling append(), previously obtained references to m_array[][] may no longer be valid, though in fact they should remain valid in this instance.
- Parameters:
value - character to be appended.
-
append
public final void append(String value)
Append the contents of a String onto the FastStringBuffer, growing the storage if necessary.
NOTE THAT after calling append(), previously obtained references to m_array[] may no longer be valid.
- Parameters:
value - String whose contents are to be appended.
-
append
public final void append(StringBuffer value)
Append the contents of a StringBuffer onto the FastStringBuffer, growing the storage if necessary.
NOTE THAT after calling append(), previously obtained references to m_array[] may no longer be valid.
- Parameters:
value - StringBuffer whose contents are to be appended.
-
append
public final void append(char[] chars, int start, int length)
Append part of the contents of a Character Array onto the FastStringBuffer, growing the storage if necessary.
NOTE THAT after calling append(), previously obtained references to m_array[] may no longer be valid.
- Parameters:
chars - character array from which data is to be copied.
start - offset in chars of first character to be copied, zero-based.
length - number of characters to be copied.
-
append
public final void append(FastStringBuffer value)
Append the contents of another FastStringBuffer onto this FastStringBuffer, growing the storage if necessary.
NOTE THAT after calling append(), previously obtained references to m_array[] may no longer be valid.
- Parameters:
value - FastStringBuffer whose contents are to be appended.
-
isWhitespace
public boolean isWhitespace(int start, int length)
- Parameters:
start - Offset of first character in the range.
length - Number of characters to test.
- Returns:
true if the specified range of characters are all whitespace, as defined by XMLCharacterRecognizer.
CURRENTLY DOES NOT CHECK FOR OUT-OF-RANGE.
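A hedged sketch of a range whitespace test follows. It assumes that XMLCharacterRecognizer uses the XML 1.0 whitespace set (space, tab, carriage return, line feed), and it adds the explicit range check that the method above is documented as lacking. All names are hypothetical.

```java
/** Range whitespace test over a CharSequence (sketch; not the class's own code). */
final class WhitespaceRangeSketch {
    /** XML 1.0 whitespace: space, tab, CR, LF (assumed to match XMLCharacterRecognizer). */
    static boolean isXmlWhitespace(char ch) {
        return ch == 0x20 || ch == 0x09 || ch == 0x0D || ch == 0x0A;
    }

    /** True if every character in [start, start + length) is XML whitespace. */
    static boolean isWhitespace(CharSequence text, int start, int length) {
        // Explicit bounds check, which the documented method does not perform.
        if (start < 0 || length < 0 || start > text.length() - length) {
            throw new IndexOutOfBoundsException(
                "start=" + start + ", length=" + length + ", size=" + text.length());
        }
        for (int i = start; i < start + length; i++) {
            if (!isXmlWhitespace(text.charAt(i))) {
                return false;
            }
        }
        return true; // an empty range is vacuously all whitespace
    }
}
```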
-
getString
public String getString(int start, int length)
- Parameters:
start - Offset of first character in the range.
length - Number of characters to extract.
- Returns:
a new String object initialized from the specified range of characters.
-
charAt
public char charAt(int pos)
Get a single character from the string buffer.
- Parameters:
pos - character position requested.
- Returns:
A character from the requested position.
-
sendSAXcharacters
public void sendSAXcharacters(ContentHandler ch, int start, int length) throws SAXException
Sends the specified range of characters as one or more SAX characters() events. Note that the buffer reference passed to the ContentHandler may be invalidated if the FastStringBuffer is edited; it's the user's responsibility to manage access to the FastStringBuffer to prevent this problem from arising.
Note too that there is no promise that the output will be sent as a single call. As is always true in SAX, one logical string may be split across multiple blocks of memory and hence delivered as several successive events.
- Parameters:
ch - SAX ContentHandler object to receive the event.
start - Offset of first character in the range.
length - Number of characters to send.
- Throws:
SAXException - may be thrown by handler's characters() method.
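To illustrate why one logical range may arrive as several events, here is a sketch that walks fixed-size chunks and issues one characters() call per chunk touched. The chunk layout and both helper names are assumptions made for illustration; only the org.xml.sax types are real.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Sends a character range that may span several chunks (sketch only). */
final class ChunkedSaxSketch {
    /** One characters() call per chunk that the range [start, start + length) touches. */
    static void send(char[][] chunks, int chunkSize, ContentHandler ch, int start, int length)
            throws SAXException {
        int pos = start;
        int end = start + length;
        while (pos < end) {
            int chunk = pos / chunkSize;
            int offset = pos % chunkSize;
            int n = Math.min(chunkSize - offset, end - pos);
            ch.characters(chunks[chunk], offset, n); // handler sees the live chunk array
            pos += n;
        }
    }

    /** Records each event's text; the handler copies because the array may change later. */
    static List<String> collect(char[][] chunks, int chunkSize, int start, int length) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] buf, int off, int len) {
                events.add(new String(buf, off, len));
            }
        };
        try {
            send(chunks, chunkSize, handler, start, length);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return events;
    }
}
```

A handler that needs the text after the call returns should copy it, as collect() does, since the array it receives belongs to the buffer.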
-
sendNormalizedSAXcharacters
public int sendNormalizedSAXcharacters(ContentHandler ch, int start, int length) throws SAXException
Sends the specified range of characters as one or more SAX characters() events, normalizing the characters according to XSLT rules.
- Parameters:
ch - SAX ContentHandler object to receive the event.
start - Offset of first character in the range.
length - Number of characters to send.
- Returns:
normalization status to apply to next chunk (because we may have been called recursively to process an inner FSB):
- 0: if this output did not end in retained whitespace, and thus whitespace at the start of the following chunk (if any) should be converted to a single space.
- SUPPRESS_LEADING_WS: if this output ended in retained whitespace, and thus whitespace at the start of the following chunk (if any) should be completely suppressed.
- Throws:
SAXException - may be thrown by handler's characters() method.
-
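The status-passing contract above can be sketched as follows. This is not the class's actual algorithm: the bit values of the constants are assumed, output goes to a StringBuilder rather than a ContentHandler, and the method name is hypothetical.

```java
/** Chunk-by-chunk whitespace normalization with carried status (sketch only). */
final class NormalizeSketch {
    static final int SUPPRESS_LEADING_WS = 0x01;   // assumed bit value
    static final int SUPPRESS_TRAILING_WS = 0x02;  // assumed bit value
    static final int SUPPRESS_BOTH = SUPPRESS_LEADING_WS | SUPPRESS_TRAILING_WS;

    /** Appends the normalized chunk to out and returns the status for the next chunk. */
    static int normalize(CharSequence chunk, int flags, StringBuilder out) {
        boolean suppressLeading = (flags & SUPPRESS_LEADING_WS) != 0;
        boolean suppressTrailing = (flags & SUPPRESS_TRAILING_WS) != 0;
        boolean pendingSpace = false; // a whitespace run was seen but not yet written
        boolean atStart = true;       // no non-whitespace character seen in this chunk yet
        for (int i = 0; i < chunk.length(); i++) {
            char c = chunk.charAt(i);
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                pendingSpace = true;
            } else {
                // Collapse the preceding run to one space unless it is suppressed leading whitespace.
                if (pendingSpace && !(atStart && suppressLeading)) {
                    out.append(' ');
                }
                pendingSpace = false;
                atStart = false;
                out.append(c);
            }
        }
        if (pendingSpace && !(atStart && suppressLeading) && !suppressTrailing) {
            out.append(' ');
            return SUPPRESS_LEADING_WS; // output ended in retained whitespace
        }
        if (atStart) {
            return flags & SUPPRESS_LEADING_WS; // nothing but whitespace: carry the status forward
        }
        return 0; // ended in text: next chunk's leading whitespace becomes one space
    }
}
```

A caller would pass SUPPRESS_LEADING_WS for the first chunk, feed each returned status into the next call, and OR in SUPPRESS_TRAILING_WS for the last chunk; a complete string takes SUPPRESS_BOTH in a single call.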
sendNormalizedSAXcharacters
public static void sendNormalizedSAXcharacters(char[] ch, int start, int length, ContentHandler handler) throws SAXException
Directly normalize and dispatch the character array.
- Parameters:
ch - The characters from the XML document.
start - The start position in the array.
length - The number of characters to read from the array.
handler - SAX ContentHandler object to receive the event.
- Throws:
SAXException - Any SAX exception, possibly wrapping another exception.
-
sendSAXComment
public void sendSAXComment(LexicalHandler ch, int start, int length) throws SAXException
Sends the specified range of characters as a SAX comment.
Note that, unlike sendSAXcharacters, this has to be done as a single call to LexicalHandler#comment.
- Parameters:
ch - SAX LexicalHandler object to receive the event.
start - Offset of first character in the range.
length - Number of characters to send.
- Throws:
SAXException - may be thrown by handler's comment() method.
-