java.lang.Object pt.tumba.ngram.compression.PPMModel
public final class PPMModel
Provides a cumulative, adaptive byte model implementing
prediction by partial matching up to a specified maximum context size.
Uses Method C for estimation.
Constants that control behavior include the maximum total count before
rescaling and the minimum count to retain after rescaling (an escape
is always maintained with a count of at least 1).
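As an illustration of Method C estimation, the following minimal sketch computes a {low count, high count, total count} triple for a single context. In Method C the escape is given a count equal to the number of distinct symbols seen in the context. The class `MethodCSketch`, its `interval` method, and the choice to place the escape at the top of the count range are assumptions made for this example; they are not taken from PPMModel.

```java
// Illustrative sketch only: MethodCSketch is a hypothetical class, not part of PPMModel.
final class MethodCSketch {

    /**
     * Fills result with {low, high, total} for the symbol at symbolIndex,
     * or for the escape when symbolIndex == counts.length.
     * counts[i] is the number of times symbol i was seen in one context.
     */
    static void interval(int[] counts, int symbolIndex, int[] result) {
        int low = 0;       // cumulative count of all symbols before symbolIndex
        int sum = 0;       // sum of all symbol counts
        int distinct = 0;  // number of symbols with a nonzero count
        for (int i = 0; i < counts.length; i++) {
            if (i < symbolIndex) low += counts[i];
            sum += counts[i];
            if (counts[i] > 0) distinct++;
        }
        // Method C: the escape count equals the number of distinct symbols seen.
        int total = sum + distinct;
        // Assumption of this sketch: the escape occupies the top of the range.
        int width = (symbolIndex == counts.length) ? distinct : counts[symbolIndex];
        result[0] = low;
        result[1] = low + width;
        result[2] = total;
    }

    public static void main(String[] args) {
        int[] r = new int[3];
        interval(new int[] {3, 1, 0}, 0, r);
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints "0 3 6"
        interval(new int[] {3, 1, 0}, 3, r);                // escape
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints "4 6 6"
    }
}
```

A symbol with a zero count yields an empty interval here (low equals high), which is the case in which a coder would emit an escape and retry in a shorter context.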
Field Summary | |
---|---|
private ExcludingAdaptiveUnigramModel | _backoffModel - Model to use for short contexts. |
private ByteBuffer | _buffer - Bytes buffered for use as context. |
private int | _byteCount - Count of bytes coded to use in pruning. |
private int | _contextLength - Current context length. |
private PPMNode | _contextNode - Current context node. |
private PPMNode[] | _contexts - Nodes at depth 1 in the model. |
private ByteSet | _excludedBytes - Storage for the excluded bytes. |
private int | _maxContextLength - Maximum context length to search in trie. |
private PPMNode | _rootNode - Root of the trie structure of counts. |
private static int | MIN_CONTEXT_LENGTH - Minimum context length to look down sequence of nodes. |
private static int | PRUNE_INTERVAL - Period between prunings in number of bytes. |
Fields inherited from interface pt.tumba.ngram.compression.ArithCodeModel |
---|
EOF, ESCAPE |
Constructor Summary | |
---|---|
PPMModel(int maxContextLength) - Construct a new model with the specified maximum length of context to use for prediction. |
Method Summary | |
---|---|
boolean | escaped(int symbol) - Returns true if current context has no count interval for given symbol. |
void | exclude(ByteSet bytesToExclude) - Exclude all of the bytes in the specified byte set. |
void | exclude(int i) - Excludes outcome from occurring in next estimate. |
private void | getContextNodeBinarySearch() - Use binary search to set the context node up to the currently specified context length. |
private void | getContextNodeLongToShort() - Starting at the longest context, count down in length to set a valid context or give up. |
private void | increment(byte b) - Adds counts for given byte to model in current context and then updates the current context. |
void | increment(int i) - Increments the model as if it had just encoded or decoded the specified symbol in the stream. |
void | interval(int symbol, int[] result) - Calculates {low count, high count, total count} for the given symbol in the current context. |
private void | intervalByte(int i, int[] result) - Returns interval for byte specified as an integer in 0 to 255 range. |
private void | intervalEscape(int[] result) - Returns interval for escape in current context. |
private static PPMNode | lookup(PPMNode node, byte[] bytes, int offset, int length) - Looks up a node from the given bytes, offset and length starting from the specified node. |
private PPMNode | lookupNode(int contextLength) - Returns node from the current byte buffer of the specified context length, or null if there isn't one. |
int | pointToSymbol(int count) - Returns the symbol whose interval of low and high counts contains the given count. |
private void | prune() - Method used for pruning (edited out). |
int | totalCount() - Returns the total count for the current context. |
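The contract between escaped and interval can be sketched as an encoder-side loop: while the model reports that the symbol is escaped, the coder requests the escape interval, and once the symbol is represented it requests the symbol's own interval. The sketch below is a hedged illustration only. The nested `Model` interface is a hypothetical stand-in for the two ArithCodeModel methods used, the value -2 for `ESCAPE` is an assumption (the documentation only states that it is negative), and intervals are appended to a `StringBuilder` instead of being passed to a real arithmetic coder.

```java
// Illustrative sketch only: EscapeLoopSketch and its Model interface are hypothetical.
final class EscapeLoopSketch {

    // Assumed value; the documentation only says ESCAPE is negative.
    static final int ESCAPE = -2;

    /** Hypothetical stand-in for the two model methods used by the loop. */
    interface Model {
        boolean escaped(int symbol);
        void interval(int symbol, int[] result);
    }

    /**
     * Emits one interval per escape, then the interval of the symbol itself.
     * Returns the number of intervals emitted.
     */
    static int encodeSymbol(Model model, int symbol, StringBuilder out) {
        int[] r = new int[3];
        int emitted = 0;
        while (model.escaped(symbol)) {
            model.interval(ESCAPE, r); // back off to a shorter context
            append(out, r);
            emitted++;
        }
        model.interval(symbol, r);     // symbol is represented in this context
        append(out, r);
        emitted++;
        return emitted;
    }

    /** Records an interval as text; a real coder would encode it instead. */
    private static void append(StringBuilder out, int[] r) {
        out.append('[').append(r[0]).append(',').append(r[1])
           .append(',').append(r[2]).append(']');
    }
}
```

The loop terminates only if the model honours the guarantee that escaped eventually returns false, which for this class is stated in terms of the maximum context size.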
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
private int _byteCount
private final ExcludingAdaptiveUnigramModel _backoffModel
private final PPMNode[] _contexts
private final int _maxContextLength
private final PPMNode _rootNode
private int _contextLength
private PPMNode _contextNode
private final ByteBuffer _buffer
private final ByteSet _excludedBytes
private static final int MIN_CONTEXT_LENGTH
private static final int PRUNE_INTERVAL
Constructor Detail |
---|
public PPMModel(int maxContextLength)
Parameters:
maxContextLength - Maximum length of context to use for prediction.
Method Detail |
---|
public boolean escaped(int symbol)
Returns true if the current context has no count interval for the given symbol. Successive calls to escaped(symbol) followed by interval(ESCAPE) must eventually lead to a false return from escaped(symbol) after a number of calls equal to the maximum context size. The integer representation of symbol is as in interval.
Specified by:
escaped in interface ArithCodeModel
Parameters:
symbol - Symbol to test whether it is encoded.
Returns:
true if the given symbol is not represented in the current context.
public void exclude(int i)
Specified by:
exclude in interface ArithCodeModel
Parameters:
symbol - Symbol which can be excluded from the next outcome.
public void interval(int symbol, int[] result)
Calculates {low count, high count, total count} for the given symbol in the current context. The symbol is either an integer representation of a byte (0-255) or -1 to denote end-of-file. The cumulative counts in the return must be such that 0 <= low count < high count <= total count.
This method will be called exactly once for each symbol being
encoded or decoded, and the calls will be made in the order in
which the symbols appear in the original file. Adaptive models
may only update their state to account for seeing a symbol after returning its current interval.
Specified by:
interval in interface ArithCodeModel
Parameters:
symbol - The next symbol to decode.
result - Array into which to write range.
public int pointToSymbol(int count)
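Returns the symbol whose interval of low and high counts contains the given count. The result may be one of the special constants EOF or ESCAPE, which are negative.
Specified by:
pointToSymbol in interface ArithCodeModel
Parameters:
count - The given count.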
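On the decoding side, mapping a count back to a symbol amounts to walking the cumulative counts until the interval containing the count is found. The following is a minimal sketch of that idea over a flat array of counts. The class `PointToSymbolSketch`, the value -2 for `ESCAPE`, and the placement of the escape at the top of the range are assumptions of the example, not details confirmed by this documentation.

```java
// Illustrative sketch only: PointToSymbolSketch is a hypothetical class.
final class PointToSymbolSketch {

    // Assumed value; the documentation only says ESCAPE is negative.
    static final int ESCAPE = -2;

    /**
     * Returns the index of the symbol whose [low, high) interval contains
     * count, or ESCAPE if count falls in the escape interval, which this
     * sketch places directly above all symbol intervals.
     */
    static int pointToSymbol(int[] counts, int escapeCount, int count) {
        if (count < 0) {
            throw new IllegalArgumentException("count must be non-negative");
        }
        int high = 0; // running high bound of the current symbol's interval
        for (int symbol = 0; symbol < counts.length; symbol++) {
            high += counts[symbol];
            if (count < high) {
                return symbol;
            }
        }
        if (count < high + escapeCount) {
            return ESCAPE;
        }
        throw new IllegalArgumentException("count outside [0, total)");
    }

    public static void main(String[] args) {
        int[] counts = {3, 1, 0};
        System.out.println(pointToSymbol(counts, 2, 2)); // prints "0"
        System.out.println(pointToSymbol(counts, 2, 3)); // prints "1"
        System.out.println(pointToSymbol(counts, 2, 5)); // prints "-2"
    }
}
```

Symbols with a zero count are never returned, because their interval is empty.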
public int totalCount()
Specified by:
totalCount in interface ArithCodeModel
public void increment(int i)
Specified by:
increment in interface ArithCodeModel
Parameters:
symbol - Symbol to add to the model.
public void exclude(ByteSet bytesToExclude)
Parameters:
bytesToExclude - Set of bytes to exclude from outcome.
private void intervalByte(int i, int[] result)
Parameters:
i - Integer specification of byte in 0 to 255 range.
result - Array specifying cumulative probability for byte i.
private void intervalEscape(int[] result)
Parameters:
result - Array for specifying cumulative probability for escape symbol in current context.
private void prune()
private void increment(byte b)
Parameters:
b - Byte to add to model.
private void getContextNodeBinarySearch()
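Use binary search to set the context node up to the currently specified context length; the context node is null if not found.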
private void getContextNodeLongToShort()
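The two private context-search methods describe two ways of finding the longest usable context: counting down from the longest length, or binary search over the length. The sketch below illustrates both over an abstract predicate. It is only a sketch under a stated assumption: binary search is valid here only if context existence is monotone in length (whenever a context of length k exists, every shorter one does too), which this documentation does not confirm for PPMModel. `ContextSearchSketch` and `ContextExists` are hypothetical names.

```java
// Illustrative sketch only: ContextSearchSketch and ContextExists are hypothetical.
final class ContextSearchSketch {

    /** Hypothetical predicate: does a context of the given length exist? */
    interface ContextExists {
        boolean test(int length);
    }

    /** Counts down from maxLength to minLength; returns -1 if none exists. */
    static int longToShort(int minLength, int maxLength, ContextExists exists) {
        for (int len = maxLength; len >= minLength; len--) {
            if (exists.test(len)) {
                return len;
            }
        }
        return -1;
    }

    /**
     * Binary search for the longest existing context length in
     * [minLength, maxLength]; returns -1 if none exists.
     * Assumes existence is monotone: if length k exists, so does k - 1.
     */
    static int binarySearch(int minLength, int maxLength, ContextExists exists) {
        int lo = minLength;
        int hi = maxLength;
        int best = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (exists.test(mid)) {
                best = mid;   // mid works; try something longer
                lo = mid + 1;
            } else {
                hi = mid - 1; // mid is too long; try something shorter
            }
        }
        return best;
    }
}
```

Under the monotonicity assumption both methods return the same length; the binary search needs a logarithmic number of lookups, while the countdown is cheaper when the longest context usually exists.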
private PPMNode lookupNode(int contextLength)
Parameters:
contextLength - Number of bytes of context used.
private static PPMNode lookup(PPMNode node, byte[] bytes, int offset, int length)
Parameters:
node - Node from which to search.
bytes - Sequence of bytes to search.
offset - Offset into sequence of bytes of the first byte.
length - Number of bytes to look up.
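A lookup with this signature can be pictured as a walk down a byte trie, following one child per byte of the given slice and stopping early when a child is missing. The sketch below shows that walk. The `Node` class with a 256-entry child array is a hypothetical simplification chosen for brevity; the actual layout of PPMNode is not described in this documentation, and the `insert` helper exists only to build a trie for the example.

```java
// Illustrative sketch only: TrieLookupSketch.Node is a hypothetical stand-in for PPMNode.
final class TrieLookupSketch {

    /** Hypothetical trie node with one child slot per byte value. */
    static final class Node {
        final Node[] children = new Node[256];
    }

    /** Adds the path for bytes[offset .. offset + length - 1] below root. */
    static void insert(Node root, byte[] bytes, int offset, int length) {
        Node node = root;
        for (int i = 0; i < length; i++) {
            int index = bytes[offset + i] & 0xFF; // treat the byte as unsigned
            if (node.children[index] == null) {
                node.children[index] = new Node();
            }
            node = node.children[index];
        }
    }

    /**
     * Follows bytes[offset .. offset + length - 1] starting from node.
     * Returns the node reached, or null if the path does not exist.
     */
    static Node lookup(Node node, byte[] bytes, int offset, int length) {
        for (int i = 0; i < length && node != null; i++) {
            node = node.children[bytes[offset + i] & 0xFF];
        }
        return node;
    }
}
```

With a length of 0 the starting node itself is returned, so the root represents the empty context.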