pt.tumba.ngram.compression
Class Test

java.lang.Object
  extended by pt.tumba.ngram.compression.Test

public final class Test
extends java.lang.Object

Runs a test suite for arithmetic coding and decoding with all of the supplied compression models, from main(java.lang.String[]). Behavior is controlled by the arguments passed to main.

The Calgary corpus can be downloaded from:

ftp://ftp.cpsc.ucalgary.ca/pub/projects/text.compression.corpus

Because the class keeps its state in static fields, only a single test should be run per virtual machine.

Author:
Bruno Martins
See Also:
ArithCodeModel, ArithCodeInputStream, ArithCodeOutputStream, AdaptiveUnigramModel, UniformModel, PPMModel

Field Summary
private static TestSet _testSet
          The test set used by the tests.
 
Constructor Summary
private Test()
          Private constructor; the class is not meant to be instantiated.
 
Method Summary
private static java.lang.String compressionRateString(int numBytesIn, int numBytesOut)
          Returns a string representation of the compression rate indicated by the specified number of original bytes and compressed bytes.
(package private) static void copyStream(java.io.InputStream in, java.io.OutputStream out)
          Reads all of the input from the given input stream and writes it to the given output stream.
(package private) static long elapsed(long start)
          Returns the elapsed time since the specified time, in milliseconds (1/1000 second).
private static java.lang.String intToString(int n, int minLength)
          Converts an integer to a string, padding it with leading spaces to the specified minimum length.
static void main(java.lang.String[] args)
          Runs test suite as specified by arguments.
private static byte nextByteRange(java.util.Random r, int low, int high)
          Generates the next random byte between the specified low and high bytes inclusive, using the specified randomizer.
private static void nextRandomAlphaNum(byte[] bs, java.util.Random r)
          Fills the specified byte array with random alphanumeric characters generated by the specified randomizer.
private static byte nextRandomAlphaNum(java.util.Random r)
          Returns the next random alphanumeric byte, as determined by the specified randomizer.
private static java.lang.String speedString(int numBytes, long numMillis)
          Returns a string representation of the speed of compression indicated by the specified number of original bytes and time in milliseconds.
private static boolean test(byte[] bytes)
          Tests compression/decompression of a given sequence of bytes.
private static boolean test(java.io.File file)
          Tests compression/decompression of a given file.
private static boolean test(java.lang.String text)
          Tests compression/decompression of a given string.
private static boolean testBytes(byte[] bytes)
          Tests the given sequence of bytes against various models.
private static boolean testBytes(byte[] bytes, ArithCodeModel modelIn, ArithCodeModel modelOut, java.lang.String name)
          Tests the specified sequence of bytes with the specified encoding and decoding models, using the specified name for display.
private static boolean testBytesGZIP(byte[] bytes, java.lang.String name)
          Tests the specified sequence of bytes with GZIP compression, using the specified name for display.
private static void testCalgary(java.lang.String path)
          Runs a test on the Calgary corpus.
private static void testFixed()
          Runs a fixed test suite.
private static boolean testPPMBytes(byte[] bytes, int order)
          Runs a test of PPM on the specified bytes using a model of the specified order.
private static void testSize(int size)
          Runs tests for sizes from 1 up to the given size, doubling the size at each step.
private static void testXML(java.lang.String path)
          Runs a test on James Cheney's XML corpus.
(package private) static java.lang.String timeToSeconds(long t)
          Converts the specified time in milliseconds to a string in seconds.
private static java.lang.String trim(java.lang.String in)
          Truncates the string to a printable length, appending ellipsis dots if it is truncated.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

_testSet

private static TestSet _testSet
The test set used by the tests.

Constructor Detail

Test

private Test()
Private constructor; the class is not meant to be instantiated.

Method Detail

main

public static void main(java.lang.String[] args)
                 throws java.io.IOException
Runs test suite as specified by arguments.

Parameters:
args - Command-line arguments, in a fixed order.
Throws:
java.io.IOException - If there is an underlying I/O exception during compression/decompression.

copyStream

static void copyStream(java.io.InputStream in,
                       java.io.OutputStream out)
                throws java.io.IOException
Reads all of the input from the given input stream and writes it to the given output stream.

Parameters:
in - Input stream from which to read.
out - Output stream to which to write.
Throws:
java.io.IOException - If there is an exception reading or writing on the given streams.
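A minimal sketch of what copyStream might look like, assuming a plain buffered read/write loop. The class name StreamCopy and the 8 KB buffer size are illustrative choices, not taken from the original source. Both streams are left open, since the description only mentions reading and writing them.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class StreamCopy {
    // Arbitrary buffer size chosen for this sketch.
    private static final int BUFFER_SIZE = 8192;

    /** Copies every remaining byte of in to out; neither stream is closed. */
    static void copyStream(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        int n;
        // read returns -1 at end of stream; otherwise n is the number of bytes just read.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }
}
```

On Java 9 and later, InputStream.transferTo(OutputStream) performs the same copy.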

elapsed

static long elapsed(long start)
Returns the elapsed time since the specified time, in milliseconds (1/1000 second).

Parameters:
start - Time from which to measure.
Returns:
Time since start time in milliseconds.
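Assuming start was obtained from System.currentTimeMillis(), elapsed reduces to a subtraction. The class name Timing is hypothetical.

```java
final class Timing {
    /** Milliseconds elapsed since start, assuming start came from System.currentTimeMillis(). */
    static long elapsed(long start) {
        return System.currentTimeMillis() - start;
    }
}
```

For interval measurement System.nanoTime() is generally preferred, because currentTimeMillis() follows the wall clock and can jump if the system time is adjusted; the sketch keeps the documented millisecond contract.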

timeToSeconds

static java.lang.String timeToSeconds(long t)
Converts the specified time in milliseconds to a string in seconds.

Parameters:
t - Time in milliseconds to convert to a string.
Returns:
String representation of specified time.
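The exact output format of timeToSeconds is not documented. This sketch assumes three decimal places, which keeps full millisecond precision; TimeFormat is a hypothetical class name.

```java
import java.util.Locale;

final class TimeFormat {
    /** Renders t milliseconds as seconds with three decimals, e.g. 1500 becomes "1.500". */
    static String timeToSeconds(long t) {
        // Locale.ROOT keeps '.' as the decimal separator regardless of the default locale.
        return String.format(Locale.ROOT, "%.3f", t / 1000.0);
    }
}
```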

testSize

private static void testSize(int size)
                      throws java.io.IOException
Runs tests for sizes from 1 up to the given size, doubling the size at each step. For each size, three inputs are tested: a constant string consisting of a single repeated character, a random sequence of letters, and a random sequence of bytes.

Parameters:
size - Maximum size up to which to test.
Throws:
java.io.IOException - If there is an underlying I/O exception during compression/decompression.
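The sweep described for testSize can be sketched as two helpers: one producing the sizes 1, 2, 4, ... up to the maximum, and one producing the three inputs for a given size. The class and method names, the choice of 'a' as the repeated character, and the restriction of "letters" to lowercase a to z are assumptions; the real method also compresses and decompresses each input, which is omitted here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

final class SizeSweep {
    /** Sizes 1, 2, 4, ... not exceeding maxSize; the size > 0 check stops on int overflow. */
    static List<Integer> sizes(int maxSize) {
        List<Integer> result = new ArrayList<>();
        for (int size = 1; size > 0 && size <= maxSize; size *= 2) {
            result.add(size);
        }
        return result;
    }

    /** The three inputs for one size: constant, random letters, random bytes. */
    static List<byte[]> inputsFor(int size, Random r) {
        byte[] constant = new byte[size];
        Arrays.fill(constant, (byte) 'a');             // one repeated character (assumed 'a')
        byte[] letters = new byte[size];
        for (int i = 0; i < size; i++) {
            letters[i] = (byte) ('a' + r.nextInt(26)); // lowercase letters only (assumption)
        }
        byte[] random = new byte[size];
        r.nextBytes(random);                            // arbitrary byte values
        return Arrays.asList(constant, letters, random);
    }
}
```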

testFixed

private static void testFixed()
                       throws java.io.IOException
Runs a fixed test suite.

Throws:
java.io.IOException - If there is an underlying I/O exception during compression/decompression.

testXML

private static void testXML(java.lang.String path)
                     throws java.io.IOException
Runs a test on James Cheney's XML corpus.

Parameters:
path - Name of the directory in which to find the XML corpus.
Throws:
java.io.IOException - If there is an underlying I/O exception during compression/decompression.

testCalgary

private static void testCalgary(java.lang.String path)
                         throws java.io.IOException
Runs a test on the Calgary corpus.

Parameters:
path - Name of the directory in which to find the Calgary corpus.
Throws:
java.io.IOException - If there is an underlying I/O exception during compression/decompression.
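A sketch of how a corpus directory might be enumerated before each file is passed to test(java.io.File); the same approach would serve testXML. It assumes every regular file directly inside the directory belongs to the corpus, and it sorts by name so that runs print in a reproducible order. CorpusFiles and corpusFiles are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class CorpusFiles {
    /** Regular files directly inside dir (subdirectories skipped), sorted by path. */
    static List<Path> corpusFiles(String dir) throws IOException {
        // Files.list holds a directory handle, so the stream is closed via try-with-resources.
        try (Stream<Path> entries = Files.list(Paths.get(dir))) {
            return entries.filter(Files::isRegularFile)
                          .sorted()
                          .collect(Collectors.toList());
        }
    }
}
```

Each returned path can then be converted with Path.toFile() for test(java.io.File), or read directly with Files.readAllBytes.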

test

private static boolean test(java.io.File file)
                     throws java.io.IOException
Tests compression/decompression of a given file.

Parameters:
file - File to test.
Returns:
true if the test succeeds.
Throws:
java.io.IOException - If there is an underlying I/O exception.

test

private static boolean test(java.lang.String text)
                     throws java.io.IOException
Tests compression/decompression of a given string. The string is first converted to bytes using the platform's default character encoding; see String.getBytes().

Parameters:
text - String to test for compression/decompression.
Returns:
true if the test succeeds.
Throws:
java.io.IOException - If there is an underlying I/O exception.

test

private static boolean test(byte[] bytes)
                     throws java.io.IOException
Tests compression/decompression of a given sequence of bytes.

Parameters:
bytes - Bytes to test for compression/decompression.
Returns:
true if the test succeeds.
Throws:
java.io.IOException - If there is an underlying I/O exception.

testPPMBytes

private static boolean testPPMBytes(byte[] bytes,
                                    int order)
                             throws java.io.IOException
Runs a test of PPM on the specified bytes using a model of the specified order.

Parameters:
bytes - Bytes to test.
order - Order of PPM model to use.
Returns:
true if the test is successful.
Throws:
java.io.IOException - If there is an underlying I/O exception.

testBytes

private static boolean testBytes(byte[] bytes)
                          throws java.io.IOException
Tests the given sequence of bytes against various models.

Parameters:
bytes - Bytes to test for compression/decompression.
Returns:
true if the test succeeds.
Throws:
java.io.IOException - If there is an underlying I/O exception.

testBytes

private static boolean testBytes(byte[] bytes,
                                 ArithCodeModel modelIn,
                                 ArithCodeModel modelOut,
                                 java.lang.String name)
                          throws java.io.IOException
Tests the specified sequence of bytes with the specified encoding and decoding models, using the specified name for display.

Parameters:
bytes - Bytes to test.
modelIn - Model to use for encoding.
modelOut - Model to use for decoding.
name - Name to use for display.
Returns:
true if the test succeeds.
Throws:
java.io.IOException - If there is an underlying I/O exception.

testBytesGZIP

private static boolean testBytesGZIP(byte[] bytes,
                                     java.lang.String name)
                              throws java.io.IOException
Tests the specified sequence of bytes with GZIP compression, using the specified name for display.

Parameters:
bytes - Bytes to test.
name - Name to use for display.
Returns:
true if the test succeeds.
Throws:
java.io.IOException - If there is an underlying I/O exception.
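The GZIP comparison presumably relies on the java.util.zip streams. The sketch below shows only the round-trip check (compress, decompress, compare); the timing and rate reporting that the real method performs under name is left out, and GzipRoundTrip is a hypothetical class name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipRoundTrip {
    /** Compresses bytes into a complete GZIP stream. */
    static byte[] compress(byte[] bytes) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(bytes);
        } // closing the GZIP stream writes the trailer (CRC and length)
        return buf.toByteArray();
    }

    /** Decompresses a complete GZIP stream. */
    static byte[] decompress(byte[] compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = gz.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        return out.toByteArray();
    }

    /** True if decompressing the compressed form reproduces the input exactly. */
    static boolean roundTrips(byte[] bytes) throws IOException {
        return Arrays.equals(bytes, decompress(compress(bytes)));
    }
}
```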

compressionRateString

private static java.lang.String compressionRateString(int numBytesIn,
                                                      int numBytesOut)
Returns a string representation of the compression rate indicated by the specified number of original bytes and compressed bytes. The rate is expressed in bits per byte.

Parameters:
numBytesIn - Number of uncompressed bytes.
numBytesOut - Number of bytes in the compressed output.
Returns:
String representation of compression rate.
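With bits per byte as the unit, the rate is 8 * numBytesOut / numBytesIn: output that is exactly as long as the input scores 8.0, and lower is better. This sketch assumes three decimal places, a "b/B" suffix and an "n/a" result for empty input; none of these details come from the original, and Rate is a hypothetical class name.

```java
import java.util.Locale;

final class Rate {
    /** Compression rate in bits per original byte: 8 * compressed / original. */
    static String compressionRateString(int numBytesIn, int numBytesOut) {
        if (numBytesIn == 0) {
            return "n/a"; // avoids dividing by zero (assumed convention)
        }
        double bitsPerByte = 8.0 * numBytesOut / numBytesIn;
        return String.format(Locale.ROOT, "%.3f b/B", bitsPerByte);
    }
}
```

For example, 1000 original bytes compressed to 250 bytes gives "2.000 b/B".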

intToString

private static java.lang.String intToString(int n,
                                            int minLength)
Converts an integer to a string, padding it with leading spaces to the specified minimum length.

Parameters:
n - Integer to convert to string.
minLength - Minimum length of result.
Returns:
String representation of integer, padded to at least specified length.
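Left padding can be done with a simple loop. A format string built as "%" + minLength + "d" gives the same result for positive widths, but it throws when minLength is 0 (the 0 is parsed as a flag) and left-justifies when minLength is negative; the loop handles both cases by adding no padding. Pad is a hypothetical class name.

```java
final class Pad {
    /** Left-pads the decimal form of n with spaces to at least minLength characters. */
    static String intToString(int n, int minLength) {
        String digits = Integer.toString(n);
        StringBuilder sb = new StringBuilder();
        // Adds nothing when digits is already minLength characters or longer.
        for (int i = digits.length(); i < minLength; i++) {
            sb.append(' ');
        }
        return sb.append(digits).toString();
    }
}
```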

speedString

private static java.lang.String speedString(int numBytes,
                                            long numMillis)
Returns a string representation of the speed of compression indicated by the specified number of original bytes and time in milliseconds.

Parameters:
numBytes - Number of uncompressed bytes.
numMillis - Number of milliseconds.
Returns:
String representation of number of bytes per millisecond.
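A sketch of speedString, assuming one decimal place and a "B/ms" suffix. One byte per millisecond equals 1,000 bytes per second, so the figure reads directly as decimal kilobytes per second. Returning "n/a" when numMillis is 0 (input processed faster than the millisecond timer can resolve) is an assumption that avoids division by zero; Speed is a hypothetical class name.

```java
import java.util.Locale;

final class Speed {
    /** Throughput in bytes per millisecond, i.e. decimal kilobytes per second. */
    static String speedString(int numBytes, long numMillis) {
        if (numMillis <= 0) {
            return "n/a"; // too fast to time at millisecond resolution (assumed convention)
        }
        double bytesPerMs = (double) numBytes / numMillis;
        return String.format(Locale.ROOT, "%.1f B/ms", bytesPerMs);
    }
}
```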

trim

private static java.lang.String trim(java.lang.String in)
Truncates the string to a printable length, appending ellipsis dots if it is truncated.

Parameters:
in - String to truncate.
Returns:
Truncated string.
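The printable length used by trim is not stated, so this sketch assumes a hypothetical limit of 40 characters and three trailing dots, with the dots counted inside the limit. Trim and MAX_LENGTH are hypothetical names.

```java
final class Trim {
    /** Hypothetical printable width; the real limit is not documented. */
    static final int MAX_LENGTH = 40;

    static String trim(String in) {
        if (in.length() <= MAX_LENGTH) {
            return in;
        }
        // Keep MAX_LENGTH - 3 characters so the result, dots included, is exactly MAX_LENGTH long.
        return in.substring(0, MAX_LENGTH - 3) + "...";
    }
}
```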

nextRandomAlphaNum

private static void nextRandomAlphaNum(byte[] bs,
                                       java.util.Random r)
Fills the specified byte array with random alphanumeric characters generated by the specified randomizer.

Parameters:
bs - Byte array to fill.
r - Randomizer.

nextByteRange

private static byte nextByteRange(java.util.Random r,
                                  int low,
                                  int high)
Generates the next random byte between the specified low and high bytes inclusive, using the specified randomizer.

Parameters:
r - Randomizer.
low - Low end of byte range, inclusive.
high - High end of byte range, inclusive.
Returns:
Random byte in low to high range.
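Because Random.nextInt(bound) returns a value from 0 to bound - 1, adding 1 to the span makes high reachable. This sketch assumes low <= high; for values above 127 the (byte) cast wraps to a negative byte that keeps the same low 8 bits. ByteRange is a hypothetical class name.

```java
import java.util.Random;

final class ByteRange {
    /** Uniformly random byte in [low, high], both inclusive; requires low <= high. */
    static byte nextByteRange(Random r, int low, int high) {
        // nextInt(high - low + 1) yields 0..(high - low), so the sum covers low..high.
        return (byte) (low + r.nextInt(high - low + 1));
    }
}
```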

nextRandomAlphaNum

private static byte nextRandomAlphaNum(java.util.Random r)
Returns the next random alphanumeric byte, as determined by the specified randomizer.

Parameters:
r - Randomizer.
Returns:
Next random alpha-numeric byte.
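The distribution used by nextRandomAlphaNum is not documented. This sketch assumes a uniform draw over the 62 ASCII characters 0-9, A-Z and a-z, and it also covers the array-filling overload. The original may instead choose a character class first and then call nextByteRange, which would make each digit more likely than each letter. AlphaNum is a hypothetical class name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

final class AlphaNum {
    // The 62 ASCII alphanumerics; a uniform draw from this table is an assumption.
    private static final byte[] ALPHABET =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
            .getBytes(StandardCharsets.US_ASCII);

    /** Next random alphanumeric byte. */
    static byte nextRandomAlphaNum(Random r) {
        return ALPHABET[r.nextInt(ALPHABET.length)];
    }

    /** Fills bs with random alphanumeric bytes. */
    static void nextRandomAlphaNum(byte[] bs, Random r) {
        for (int i = 0; i < bs.length; i++) {
            bs[i] = nextRandomAlphaNum(r);
        }
    }
}
```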