|
||||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |
java.lang.Object pt.tumba.ngram.compression.PPMNode
final class PPMNode
A node in a depth-bounded suffix tree that represents counts of
sequences of bytes. Nodes in the trie are accessed through
sequences of bytes in the same way as the Map.get(java.lang.Object)
method of Map
. Sequences of bytes are added as
descendants of the current node if necessary and their counts are
incremented and rescaled if necessary through an increment
operation that works in the same way as Map.put(K, V)
in
Map
.
The entire trie is accessible through the root node. A byte is stored with each node, and a (scaled) count of the times the the sequence of bytes leading to this node in the trie were seen. Each node contains a linked list of child nodes, the first of which is provided as a member element. Each node also represents a member of the linked list of siblings, and the next sibling is provided as a member element.
The nodes provide the cumulative probability estimates for the
model through the total
, interval
and
intervalEscape
methods.
PPMModel
Field Summary | |
---|---|
(package private) byte |
_byte
The byte for this node. |
private short |
_count
The scaled count for this node. |
(package private) PPMNode |
_firstChild
The first child of this node. |
(package private) PPMNode |
_nextSibling
The next sibling of this node. |
private short |
_numberOfOutcomes
The scaled number of outcomes used to calculate escape likelihoods. |
private static int |
MAX_INDIVIDUAL_COUNT
Maximum count for daughter node before rescaling all daughters. |
private static int |
MIN_COUNT
Minimum count for which to retain a node during rescaling. |
private static int |
MIN_PRUNE_COUNT
Minimum count for a node to survive pruning. |
Constructor Summary | |
---|---|
PPMNode(byte b)
Construct a node with the specified byte. |
|
PPMNode(byte b,
PPMNode nextSibling)
Construct a node with the specified byte and next sibling. |
Method Summary | |
---|---|
(package private) void |
complete(byte[] bytes,
int offset,
int length)
Extends this node with the given sequence of bytes, specified by an array, offset and length. |
private boolean |
hasDaughter(byte b)
Returns true if this node has a child with the specified byte. |
(package private) boolean |
hasDaughter(int i)
Returns true if this node has a child with the specified byte, specified
as an integer. |
(package private) void |
increment(byte[] bytes,
int offset,
int length)
Increment the count of all of the nodes along the sequence of bytes determined by the specified array, beginning at the specified offset and continuing for the specified length number of bytes. |
(package private) void |
increment(ByteBuffer buffer)
Increment the counts for this node for the string specified in the buffer. |
private void |
interval(byte b,
ByteSet excludedBytes,
int[] result)
Calculates the interval for the specified byte from this node and writes it into the specified array. |
(package private) void |
interval(int i,
ByteSet excludedBytes,
int[] result)
Calculates the interval for the specified byte from this node and writes it into the specified array. |
(package private) void |
intervalEscape(ByteSet excludedBytes,
int[] result)
The interval for the escape count, less the set of excluded bytes. |
(package private) boolean |
isChildless(ByteSet excludedBytes)
Returns true if this node has no children, not counting
specified exclusions. |
(package private) boolean |
isDeterministic(ByteSet excludedBytes)
Returns true if the number of children for
this node is 1 . |
(package private) int |
pointToSymbol(int midCount,
ByteSet excludedBytes)
Retrieves the symbol for which the midCount is between its low and high counts (inclusive on low, exclusive on high). |
PPMNode |
prune()
Prunes this node and its children, returning null if
the node's count is too low and pruning all children with counts
too low. |
private PPMNode |
pruneSiblings()
Prunes the siblings of this node, returning the next sibling or null if there aren't any. |
private void |
rescale()
Rescale all of the counts of the children of this node. |
private PPMNode |
rescaleSiblings()
Rescale the counts on this node and the siblings of this node. |
(package private) int |
totalCount(ByteSet excludedBytes)
Total count for this node, not including those bytes in the specified set. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
final byte _byte
private short _count
private short _numberOfOutcomes
PPMNode _firstChild
PPMNode _nextSibling
private static final int MIN_PRUNE_COUNT
private static final int MIN_COUNT
private static final int MAX_INDIVIDUAL_COUNT
Constructor Detail |
---|
PPMNode(byte b, PPMNode nextSibling)
b
- Byte represented by node.nextSibling
- The next daughter node in the list of daughters.PPMNode(byte b)
b
- Byte represented by node.Method Detail |
---|
boolean isDeterministic(ByteSet excludedBytes)
true
if the number of children for
this node is 1
.
bytes
- Bytes that have been seen in escaped context that should not be considered children.
true
if the scaled number of outcomes for this node is 1
.boolean isChildless(ByteSet excludedBytes)
true
if this node has no children, not counting
specified exclusions.
excludedBytes
- Bytes to exclude as children
true
if this node has no children, not countingint totalCount(ByteSet excludedBytes)
excludedBytes
- Set of bytes to exclude from counts.
void interval(int i, ByteSet excludedBytes, int[] result)
b
- Byte whose interval is calcuated.excludedBytes
- Set of bytes to exclude from counts.result
- Array in which to write the range for the specified byte.private void interval(byte b, ByteSet excludedBytes, int[] result)
b
- Byte whose interval is calcuated.excludedBytes
- Set of bytes to exclude from counts.result
- Array in which to write the range for the specified byte.void intervalEscape(ByteSet excludedBytes, int[] result)
excludedBytes
- Set of bytes to exclude from counts.result
- Array into which to write the range for the specified bytes.void increment(ByteBuffer buffer)
buffer
- Buffer of bytes from which to read event to increment.boolean hasDaughter(int i)
true
if this node has a child with the specified byte, specified
as an integer.
b
- Byte coded as integer to check.
private boolean hasDaughter(byte b)
true
if this node has a child with the specified byte.
b
- Byte to check.
int pointToSymbol(int midCount, ByteSet excludedBytes)
midCount
- Count for which to find symbol.excludedBytes
- Set of bytes to exclude from counts.
void complete(byte[] bytes, int offset, int length)
bytes
- Byte array providing bytes to extend.offset
- Index of first byte in array.length
- Number of bytes to extend.void increment(byte[] bytes, int offset, int length)
bytes
- Array from which to read bytes.offset
- Index of first byte to read from array.length
- Total number of bytes to read from array.public PPMNode prune()
null
if
the node's count is too low and pruning all children with counts
too low.
private PPMNode pruneSiblings()
null
if there aren't any.
private void rescale()
1
to allow
possiblity for escapes.
private PPMNode rescaleSiblings()
null
if siblings scale below.
|
||||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |