pt.tumba.ngram.compression
Interface ArithCodeModel

All Known Implementing Classes:
AdaptiveUnigramModel, PPMModel, UniformModel

public interface ArithCodeModel

Interface for an adaptive statistical model of a stream to be used as a basis for arithmetic coding and decoding. As in InputStream, bytes are coded as integers in the range 0 to 255 and EOF is provided as a constant and coded as -1. In addition, arithmetic coding requires an integer ESCAPE to code information about the model structure.

During encoding, the coder calls escaped(symbol), where symbol is a byte encoded as an integer in the range 0 to 255 or EOF. If the result is true, the coder calls interval(ESCAPE, result) and repeats the process until escaped(symbol) returns false. At that point it calls interval(symbol, result) and the underlying model is updated.
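The encoding call sequence described above can be sketched as follows. This is a minimal illustration, not the package's own coder: EncodeLoopSketch, SketchModel and ToyBackoffModel are hypothetical names, the ESCAPE value of -2 is assumed (the documentation only guarantees that it is negative), and the two-level toy model exists only to make the escape path observable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical driver showing the documented call sequence for one symbol.
class EncodeLoopSketch {
    // Returns every {low, high, total} triple that would be handed to the coder.
    static List<int[]> encodeSymbol(SketchModel model, int symbol) {
        List<int[]> emitted = new ArrayList<>();
        while (model.escaped(symbol)) {
            int[] esc = new int[3];
            model.interval(SketchModel.ESCAPE, esc); // code an escape; the toy model backs off
            emitted.add(esc);
        }
        int[] r = new int[3];
        model.interval(symbol, r); // code the symbol itself; the model updates afterwards
        emitted.add(r);
        return emitted;
    }

    public static void main(String[] args) {
        SketchModel model = new ToyBackoffModel();
        for (int symbol : new int[] {'a', 'a', SketchModel.EOF}) {
            for (int[] r : encodeSymbol(model, symbol)) {
                System.out.println(symbol + " -> [" + r[0] + ", " + r[1] + ") of " + r[2]);
            }
        }
    }
}

// Local stand-in for the two ArithCodeModel methods used in this sketch.
interface SketchModel {
    int EOF = -1;
    int ESCAPE = -2; // assumed value; only "negative" is documented

    boolean escaped(int symbol);

    void interval(int symbol, int[] result);
}

// Toy two-level model (an assumption for illustration): an adaptive level that
// only knows symbols already seen, backed by a uniform level over 256 bytes plus EOF.
class ToyBackoffModel implements SketchModel {
    private final int[] counts = new int[257]; // index 256 holds EOF
    private int seenTotal = 0;
    private boolean backedOff = false;

    private static int index(int symbol) {
        return symbol == EOF ? 256 : symbol;
    }

    public boolean escaped(int symbol) {
        return !backedOff && counts[index(symbol)] == 0;
    }

    public void interval(int symbol, int[] result) {
        if (symbol == ESCAPE) {
            // The escape occupies the last unit of the adaptive level.
            result[0] = seenTotal;
            result[1] = seenTotal + 1;
            result[2] = seenTotal + 1;
            backedOff = true;
            return;
        }
        int i = index(symbol);
        if (backedOff) {
            result[0] = i;
            result[1] = i + 1;
            result[2] = 257;
        } else {
            int low = 0;
            for (int k = 0; k < i; k++) low += counts[k];
            result[0] = low;
            result[1] = low + counts[i];
            result[2] = seenTotal + 1;
        }
        // Update only after the interval has been written.
        counts[i]++;
        seenTotal++;
        backedOff = false;
    }
}
```

In this toy model the first 'a' produces an escape interval followed by a uniform-level interval, while the second 'a' is coded directly from the adaptive level.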

During decoding, a call to totalCount() will be made and then a call to pointToSymbol(count). If the result is ESCAPE, the process is repeated. If the result is a byte encoded as an integer in the range 0 to 255 or EOF, the symbol is returned and the underlying model is updated.
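A matching sketch of the decoding loop is shown below. The names DecodeLoopSketch, DecModel and ToyDecodingModel are hypothetical, the ESCAPE value of -2 is assumed, and countSource is a stand-in for the arithmetic decoder that would normally derive the count from the coded bytes. The call to interval on the decoding side follows the interval method's statement that it is called once per symbol encoded or decoded; how a real decoder consumes the result is not shown.

```java
import java.util.function.IntUnaryOperator;

class DecodeLoopSketch {
    // countSource is a hypothetical stand-in for the arithmetic decoder: given the
    // current total, it returns the count in [0, total) read from the coded stream.
    static int decodeSymbol(DecModel model, IntUnaryOperator countSource) {
        int[] r = new int[3];
        while (true) {
            int count = countSource.applyAsInt(model.totalCount());
            int symbol = model.pointToSymbol(count);
            model.interval(symbol, r); // a real decoder would also consume [low, high) here
            if (symbol != DecModel.ESCAPE) return symbol;
        }
    }

    public static void main(String[] args) {
        int[] counts = {0, 97, 0}; // escape, 'a' at the uniform level, 'a' at the adaptive level
        int[] next = {0};
        IntUnaryOperator source = total -> counts[next[0]++];
        DecModel model = new ToyDecodingModel();
        System.out.println((char) decodeSymbol(model, source));
        System.out.println((char) decodeSymbol(model, source));
    }
}

// Local stand-in for the ArithCodeModel methods used in this sketch.
interface DecModel {
    int EOF = -1;
    int ESCAPE = -2; // assumed value; only "negative" is documented

    int totalCount();

    int pointToSymbol(int count);

    void interval(int symbol, int[] result);
}

// Toy two-level model (an assumption for illustration): adaptive counts for seen
// symbols plus one escape unit, backed by a uniform level over 256 bytes plus EOF.
class ToyDecodingModel implements DecModel {
    private final int[] counts = new int[257]; // index 256 holds EOF
    private int seenTotal = 0;
    private boolean backedOff = false;

    public int totalCount() {
        return backedOff ? 257 : seenTotal + 1;
    }

    public int pointToSymbol(int count) {
        if (backedOff) return count == 256 ? EOF : count;
        int high = 0;
        for (int i = 0; i < 257; i++) {
            high += counts[i];
            if (count < high) return i == 256 ? EOF : i;
        }
        return ESCAPE; // count == seenTotal falls in the escape unit
    }

    public void interval(int symbol, int[] result) {
        if (symbol == ESCAPE) {
            result[0] = seenTotal;
            result[1] = seenTotal + 1;
            result[2] = seenTotal + 1;
            backedOff = true;
            return;
        }
        int i = symbol == EOF ? 256 : symbol;
        int low = 0;
        if (backedOff) {
            low = i;
            result[1] = i + 1;
            result[2] = 257;
        } else {
            for (int k = 0; k < i; k++) low += counts[k];
            result[1] = low + counts[i];
            result[2] = seenTotal + 1;
        }
        result[0] = low;
        // Update only after the interval has been written.
        counts[i]++;
        seenTotal++;
        backedOff = false;
    }
}
```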

The probability model required for arithmetic coding is cumulative. For each outcome, rather than returning a probability, an interval is provided to the coder. As is usual for arithmetic coding, an interval in [0,1] is represented by three integers, where a low count, a high count, and a total count pick out the interval [low/total,high/total).
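As a small worked example of this representation, the sketch below derives the three integers from per-outcome counts. The class name IntervalSketch and the counts for the outcomes a, b and c are made up for illustration.

```java
class IntervalSketch {
    // Builds {low, high, total} for the outcome at position index,
    // given one non-negative count per outcome.
    static int[] interval(int[] counts, int index) {
        int low = 0;
        for (int i = 0; i < index; i++) low += counts[i];
        int total = low;
        for (int i = index; i < counts.length; i++) total += counts[i];
        return new int[] {low, low + counts[index], total};
    }

    public static void main(String[] args) {
        int[] counts = {2, 1, 1}; // hypothetical counts for outcomes a, b, c
        int[] r = interval(counts, 1); // outcome b: {2, 3, 4}
        // prints "[0.5, 0.75)"
        System.out.println("[" + (double) r[0] / r[2] + ", " + (double) r[1] / r[2] + ")");
    }
}
```

With counts 2, 1 and 1, outcome b receives low = 2, high = 3 and total = 4, which picks out the interval [2/4, 3/4).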

Author:
Bruno Martins
See Also:
ArithCodeInputStream, ArithCodeOutputStream

Field Summary
static int EOF
          Symbol denoting end-of-file.
static int ESCAPE
          Symbol denoting an escape, meaning that the outcome symbol has no interval in the current context.
 
Method Summary
 boolean escaped(int symbol)
          Returns true if current context has no count interval for given symbol.
 void exclude(int symbol)
          Excludes outcome from occurring in next estimate.
 void increment(int symbol)
          Increments the model as if it had just encoded or decoded the specified symbol in the stream.
 void interval(int symbol, int[] result)
          Calculates {low count, high count, total count} for the given symbol in the current context.
 int pointToSymbol(int count)
          Returns the symbol whose interval of low and high counts contains the given count.
 int totalCount()
          Returns the total count for the current context.
 

Field Detail

EOF

static final int EOF
Symbol denoting end-of-file. Guaranteed to be negative.

See Also:
Constant Field Values

ESCAPE

static final int ESCAPE
Symbol denoting an escape, meaning that the outcome symbol has no interval in the current context. Guaranteed to be negative.

See Also:
Constant Field Values
Method Detail

totalCount

int totalCount()
Returns the total count for the current context.

Returns:
Total count for the current context.

pointToSymbol

int pointToSymbol(int count)
Returns the symbol whose interval of low and high counts contains the given count. Ordinary outcomes are non-negative integers in the range 0 to 255; the two special constants, EOF and ESCAPE, are negative.

Parameters:
count - The given count.
Returns:
The symbol whose interval contains the given count.

interval

void interval(int symbol,
              int[] result)
Calculates {low count, high count, total count} for the given symbol in the current context and writes them to result. The symbol is either an integer representation of a byte (0-255) or -1 to denote end-of-file; during encoding it may also be ESCAPE, as described in the interface overview. The cumulative counts written to result must be such that 0 <= low count < high count <= total count.

This method will be called exactly once for each symbol being encoded or decoded, and the calls will be made in the order in which the symbols appear in the original file. Adaptive models may update their state to account for seeing a symbol only after writing its current interval to result.

Parameters:
symbol - The next symbol to encode or decode.
result - Array into which to write range.

escaped

boolean escaped(int symbol)
Returns true if the current context has no count interval for the given symbol. Successive calls to escaped(symbol) followed by interval(ESCAPE) must eventually lead to a false return from escaped(symbol) after a number of calls equal to the maximum context size. The integer representation of symbol is as in interval.

Parameters:
symbol - Symbol to test for a count interval in the current context.
Returns:
true if given symbol is not represented in the current context.

exclude

void exclude(int symbol)
Excludes outcome from occurring in next estimate. A symbol must not be excluded and then coded or decoded. Exclusions in the model must be coordinated for encoding and decoding.

Parameters:
symbol - Symbol which can be excluded from the next outcome.
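One way to picture exclusion is a uniform model that removes excluded outcomes from the next estimate, so the remaining outcomes share a smaller total. The sketch below is an assumption about how an implementation might behave, not the behaviour of any class in this package: ExclusionSketch and ExcludingUniformModel are hypothetical names, and clearing the exclusions after each coded symbol is a design choice made for the example.

```java
import java.util.BitSet;

class ExclusionSketch {
    public static void main(String[] args) {
        ExcludingUniformModel model = new ExcludingUniformModel();
        int[] r = new int[3];
        model.exclude('a');
        model.interval('b', r); // 'a' is skipped, so 'b' moves down one slot
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
        model.interval('b', r); // exclusions were cleared after the previous symbol
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}

// Hypothetical uniform model over 256 bytes plus EOF with support for exclusion.
class ExcludingUniformModel {
    static final int EOF = -1;
    private final BitSet excluded = new BitSet(257); // index 256 holds EOF

    private static int index(int symbol) {
        return symbol == EOF ? 256 : symbol;
    }

    public void exclude(int symbol) {
        excluded.set(index(symbol));
    }

    public int totalCount() {
        return 257 - excluded.cardinality();
    }

    public void interval(int symbol, int[] result) {
        int i = index(symbol);
        if (excluded.get(i)) {
            // Mirrors the rule that a symbol must not be excluded and then coded.
            throw new IllegalArgumentException("excluded symbol cannot be coded");
        }
        int low = i - excluded.get(0, i).cardinality(); // skip excluded outcomes below i
        result[0] = low;
        result[1] = low + 1;
        result[2] = totalCount();
        excluded.clear(); // assumption: exclusions apply to the next estimate only
    }
}
```

The encoder and decoder would have to apply the same exclusions in the same order, since the totals and intervals depend on them.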

increment

void increment(int symbol)
Increments the model as if it had just encoded or decoded the specified symbol in the stream. May be used to prime models by "injecting" a symbol into the model's stream without coding/decoding it in the stream of coded bytes. Calls must be coordinated for encoding and decoding. Will be called automatically by the models for symbols they encode or decode.

Parameters:
symbol - Symbol to add to the model.
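The priming use of increment can be illustrated with a toy adaptive model. This is a sketch under assumptions: PrimingSketch and ToyUnigramModel are hypothetical names, and the toy starts every outcome at a count of 1 so that no escapes are needed. The point is only that the encoder-side and decoder-side models must receive the same increment calls to stay in step.

```java
import java.util.Arrays;

class PrimingSketch {
    public static void main(String[] args) {
        ToyUnigramModel encoderSide = new ToyUnigramModel();
        ToyUnigramModel decoderSide = new ToyUnigramModel();
        // Prime both models with the same symbols, without coding them.
        for (int symbol : new int[] {'a', 'b'}) {
            encoderSide.increment(symbol);
            decoderSide.increment(symbol);
        }
        int[] enc = new int[3];
        int[] dec = new int[3];
        encoderSide.interval('b', enc);
        decoderSide.interval('b', dec);
        System.out.println(Arrays.toString(enc) + " " + Arrays.equals(enc, dec));
    }
}

// Hypothetical adaptive model over 256 bytes plus EOF; every count starts at 1.
class ToyUnigramModel {
    static final int EOF = -1;
    private final int[] counts = new int[257]; // index 256 holds EOF
    private int total = 257;

    ToyUnigramModel() {
        Arrays.fill(counts, 1);
    }

    private static int index(int symbol) {
        return symbol == EOF ? 256 : symbol;
    }

    public void increment(int symbol) {
        counts[index(symbol)]++;
        total++;
    }

    public void interval(int symbol, int[] result) {
        int i = index(symbol);
        int low = 0;
        for (int k = 0; k < i; k++) low += counts[k];
        result[0] = low;
        result[1] = low + counts[i];
        result[2] = total;
        increment(symbol); // the model increments itself for symbols it codes
    }
}
```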