java.lang.Object
    pt.tumba.ngram.NGram
public class NGram
extends java.lang.Object
implements java.lang.Comparable
This class models a simple, concrete N-gram. For efficiency, the class follows the Flyweight pattern, so that each distinct N-gram has only one instance in the system.
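The following is a minimal sketch of the Flyweight idea described above. It assumes a HashMap-backed pool keyed by the N-gram's bytes. `FlyweightNGram`, `of` and `poolSize` are hypothetical names used only for illustration; this is not the `pt.tumba.ngram.NGram` source.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical flyweight N-gram for illustration; not the pt.tumba.ngram.NGram source.
final class FlyweightNGram {
    // Pool of canonical instances, keyed by the N-gram's bytes.
    private static final Map<String, FlyweightNGram> POOL = new HashMap<>();

    private final byte[] bytes;

    private FlyweightNGram(byte[] bytes) {
        this.bytes = bytes;
    }

    // Static factory: returns the shared instance for src[start, start + length),
    // creating it only the first time that byte sequence is seen.
    static FlyweightNGram of(byte[] src, int start, int length) {
        // ISO-8859-1 maps every byte to exactly one char, so the key is lossless.
        String key = new String(src, start, length, StandardCharsets.ISO_8859_1);
        return POOL.computeIfAbsent(key,
                k -> new FlyweightNGram(k.getBytes(StandardCharsets.ISO_8859_1)));
    }

    // Number of distinct N-grams created so far.
    static int poolSize() {
        return POOL.size();
    }

    int size() {
        return bytes.length;
    }
}
```

Every lookup goes through the static factory. As a result, two calls with the same bytes return the same object, and identity comparison (`==`) is enough to detect equal N-grams. This sketch is not thread-safe; concurrent use would need a `ConcurrentHashMap` or external locking.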
Field Summary | |
---|---|
protected byte[] | bytes - Array of bytes storing the N-gram. |
protected double | count - Number of occurrences of this N-gram. |
static NGram[] | known - Array of the known N-grams. |
protected static int | knownCount - Number of N-grams in the array of known N-grams. |
protected static int | knownStep - Amount of free space to add each time the cache must be enlarged. |
protected int | size - Size of this N-gram. |
protected static boolean | useCache - Flag indicating whether the N-gram cache is used. |
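Together, the `known`, `knownCount` and `knownStep` fields suggest an array cache that grows by a fixed number of slots whenever it fills up. The sketch below assumes exactly that. `GrowableCache` and the step value of 4 are hypothetical, and the class may manage its cache differently.

```java
import java.util.Arrays;

// Hypothetical cache mirroring the described roles of known, knownCount and knownStep.
final class GrowableCache {
    static final int KNOWN_STEP = 4; // assumed growth increment

    private String[] known = new String[0];
    private int knownCount = 0;

    void add(String item) {
        if (knownCount == known.length) {
            // Full: enlarge by a fixed KNOWN_STEP slots rather than doubling.
            known = Arrays.copyOf(known, known.length + KNOWN_STEP);
        }
        known[knownCount++] = item;
    }

    String get(int i) {
        if (i < 0 || i >= knownCount) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        return known[i];
    }

    int count() {
        return knownCount;
    }

    int capacity() {
        return known.length;
    }
}
```

Growing by a constant step keeps the unused space bounded by `KNOWN_STEP` slots. However, each resize copies the whole array, so filling n slots costs on the order of n²/knownStep element copies. Doubling the capacity instead would make appends amortized constant time.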
Constructor Summary | |
---|---|
protected | NGram() - Constructor for the NGram object. |
public | NGram(byte[] bytes, int start, int length) - Creates an N-gram from a range of a byte array. |
public | NGram(byte[] bytes, int start, int length, double count) - Creates an N-gram from a range of a byte array, with the given occurrence frequency. |
public | NGram(NGram ng) - Creates an N-gram by copying another N-gram. |
public | NGram(java.lang.String str) - Creates an N-gram from a string. |
Method Summary | |
---|---|
private static int | code(byte[] bytes, int start, int length) - Encodes a byte sequence. |
int | compareTo(java.lang.Object e1) - Compares the number of occurrences of this N-gram with that of another. |
boolean | equals(byte[] bytes, int start, int length) - Compares this N-gram with another one supplied as an array of bytes. |
boolean | equals(java.lang.Object e1) - Compares this N-gram with another object, checking whether that object is an N-gram. |
int | getByte(int pos) - Returns a single byte of this N-gram. |
int | getCount() - Returns the number of occurrences of this N-gram. |
static int | getNGramCount() - Returns the number of distinct N-grams. |
int | getSize() - Returns the size of this N-gram. |
double | getSmoothedCount() - Returns the number of occurrences of this N-gram, using Good-Turing smoothing. |
java.lang.String | getString() - Returns a String representation of this N-gram. |
int | hashCode() - Overrides hashCode so that N-grams can be hashed against short byte sequences. |
void | inc() - Increments the number of occurrences of this N-gram. |
static NGram | newNGram(byte[] bytes) - Quasi-constructor. |
static NGram | newNGram(byte[] bytes, int start) - Quasi-constructor. |
static NGram | newNGram(byte[] bytes, int start, int length) - Quasi-constructor. |
static NGram | newNGram(byte[] bytes, int start, int length, double count) - Quasi-constructor. |
static NGram | newNGram(java.lang.String str) - Quasi-constructor. |
java.lang.String | toString() - Returns a String representation of this N-gram that also includes its occurrence frequency. |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
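`getSmoothedCount()` is documented as using Good-Turing smoothing. In its basic form, Good-Turing replaces a raw count c with c* = (c + 1) N(c+1) / N(c), where N(c) is the number of distinct N-grams seen exactly c times. The sketch below makes three assumptions: the raw counts of all N-grams are available, and it falls back to the raw count whenever N(c) or N(c+1) is zero. `GoodTuring` and `smoothed` are hypothetical names, and the class's actual smoothing details are not documented here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical basic Good-Turing estimator: c* = (c + 1) * N(c+1) / N(c),
// where N(c) is the number of distinct N-grams observed exactly c times.
final class GoodTuring {
    // Maps a raw count c to N(c).
    private final Map<Integer, Integer> countOfCounts = new HashMap<>();

    GoodTuring(Iterable<Integer> rawCounts) {
        for (int c : rawCounts) {
            countOfCounts.merge(c, 1, Integer::sum);
        }
    }

    double smoothed(int c) {
        int nc = countOfCounts.getOrDefault(c, 0);
        int nc1 = countOfCounts.getOrDefault(c + 1, 0);
        // Assumption: fall back to the raw count when either frequency class is empty,
        // because the plain estimate is undefined (N(c) = 0) or zero (N(c+1) = 0) there.
        if (nc == 0 || nc1 == 0) {
            return c;
        }
        return (c + 1) * (double) nc1 / nc;
    }
}
```

Take the raw counts 1, 1, 1, 2, 2, 3. These give N(1) = 3, N(2) = 2 and N(3) = 1, so smoothed(1) = 2 × 2 / 3 ≈ 1.33 and smoothed(2) = 3 × 1 / 2 = 1.5. smoothed(3) falls back to 3 because N(4) = 0.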
Field Detail |
---|
public static NGram[] known
protected static int knownCount
protected static int knownStep
protected static boolean useCache
protected byte[] bytes
protected int size
protected double count
Constructor Detail |
---|
protected NGram()

public NGram(byte[] bytes, int start, int length, double count)
Parameters:
bytes - An array of bytes with the N-gram.
start - Starting position in the array of bytes.
length - Ending position in the array of bytes.
count - Occurrence frequency for this N-gram.

public NGram(byte[] bytes, int start, int length)
Parameters:
bytes - An array of bytes with the N-gram.
start - Starting position in the array of bytes.
length - Ending position in the array of bytes.

public NGram(NGram ng)
Parameters:
ng - An N-gram.

public NGram(java.lang.String str)
Parameters:
str - A string with the N-gram.

Method Detail |
---|
private static int code(byte[] bytes, int start, int length)
Parameters:
bytes - An array of bytes.
start - Starting position in the array of bytes.
length - Ending position in the array of bytes.

public static int getNGramCount()

public static NGram newNGram(byte[] bytes)
Parameters:
bytes - Sequence of bytes with the N-gram.

public static NGram newNGram(byte[] bytes, int start)
Parameters:
bytes - Sequence of bytes with the N-gram.
start - Starting position in the sequence of bytes.

public static NGram newNGram(java.lang.String str)
Parameters:
str - A string with the N-gram.

public static NGram newNGram(byte[] bytes, int start, int length)
Parameters:
bytes - Sequence of bytes with the N-gram.
start - Starting position in the sequence of bytes.
length - Ending position in the sequence of bytes.

public static NGram newNGram(byte[] bytes, int start, int length, double count)
Parameters:
bytes - Sequence of bytes with the N-gram.
start - Starting position in the sequence of bytes.
length - Ending position in the sequence of bytes.
count - Occurrence frequency for this N-gram.
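The `newNGram` overloads take a byte array plus a start position and, in some overloads, a length. One common way to use such factories is to slide a fixed-size window over the input and count each distinct N-gram, as `inc()` does for a single instance. The sketch below assumes that usage and treats each window as the n bytes beginning at `start`. `NGramCounter` and `count` are hypothetical, and a String key stands in for an `NGram` instance.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical: slide an n-byte window over the input and count each distinct N-gram.
final class NGramCounter {
    private NGramCounter() {}

    static Map<String, Integer> count(String text, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        Map<String, Integer> counts = new TreeMap<>(); // sorted for stable output
        for (int start = 0; start + n <= bytes.length; start++) {
            // ISO-8859-1 gives a 1:1 byte-to-char mapping, so the key preserves the window bytes.
            String key = new String(bytes, start, n, StandardCharsets.ISO_8859_1);
            counts.merge(key, 1, Integer::sum); // first sighting stores 1, later ones increment
        }
        return counts;
    }
}
```

For `"abab"` with n = 2, the windows are "ab", "ba" and "ab". The result therefore maps "ab" to 2 and "ba" to 1. An input shorter than n produces an empty map.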
public boolean equals(byte[] bytes, int start, int length)
Parameters:
bytes - An array of bytes with an N-gram.
start - Starting position in the array of bytes.
length - Ending position in the array of bytes.

public boolean equals(java.lang.Object e1)
Overrides:
equals in class java.lang.Object
Parameters:
e1 - An object.
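`equals(byte[], int, int)` and the `hashCode` override let a stored N-gram be matched against a short byte sequence. The sketch below illustrates the idea, assuming the range is given as a start index plus a length. It hashes the slice with the same recurrence as `java.util.Arrays.hashCode(byte[])` and compares the slice in place, so the lookup allocates no temporary array. `ByteRange`, `hash` and `equalsRange` are hypothetical names. The ranged `Arrays.equals` overload requires Java 9 or later.

```java
import java.util.Arrays;

// Hypothetical helpers: compare and hash a stored N-gram against a slice of another
// array without copying the slice.
final class ByteRange {
    private ByteRange() {}

    // Same recurrence as Arrays.hashCode(byte[]), restricted to src[start, start + length).
    static int hash(byte[] src, int start, int length) {
        int h = 1;
        for (int i = start; i < start + length; i++) {
            h = 31 * h + src[i];
        }
        return h;
    }

    // True if stored has exactly the bytes src[start, start + length).
    static boolean equalsRange(byte[] stored, byte[] src, int start, int length) {
        return stored.length == length
                && Arrays.equals(stored, 0, length, src, start, start + length);
    }
}
```

Because `hash` matches `Arrays.hashCode` on equal content, a stored N-gram and a matching slice land in the same hash bucket. `equalsRange` then confirms the match.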
public int getByte(int pos)
Parameters:
pos - Selects the 1st, 2nd, 3rd, ... byte to return.
Throws:
ArrayIndexOutOfBoundsException - The N-gram does not contain the given position.

public int getSize()

public java.lang.String getString()
String representation of this N-gram.
Returns:
String representation of this N-gram.

public int hashCode()
Overrides:
hashCode in class java.lang.Object

public java.lang.String toString()
String representation of this N-gram, where occurrence frequency information is also present.
Overrides:
toString in class java.lang.Object
Returns:
String representation of this N-gram.

public int compareTo(java.lang.Object e1)
Specified by:
compareTo in interface java.lang.Comparable
Parameters:
e1 - An object (must be an instance of NGram).
Throws:
java.lang.NullPointerException - if e1 is null.
java.lang.ClassCastException - if e1 is not an NGram object.

public int getCount()

public double getSmoothedCount()

public void inc()
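`compareTo(Object)` orders N-grams by their number of occurrences. The sketch below assumes ascending order by count and uses the generic `Comparable<CountedGram>`. With generics, the compiler already rejects non-N-gram arguments, so a ClassCastException can only arise through raw `Comparable` use. A null argument still throws NullPointerException. `CountedGram` is a hypothetical stand-in for `NGram`.

```java
// Hypothetical N-gram ordered by occurrence count (ascending order assumed).
final class CountedGram implements Comparable<CountedGram> {
    final String gram;
    private double count;

    CountedGram(String gram, double count) {
        this.gram = gram;
        this.count = count;
    }

    // Mirrors inc(): one more occurrence.
    void inc() {
        count++;
    }

    double getCount() {
        return count;
    }

    // Reading other.count throws NullPointerException when other is null.
    @Override
    public int compareTo(CountedGram other) {
        return Double.compare(this.count, other.count);
    }
}
```

Ordering by count is not consistent with an `equals` that compares bytes: two different N-grams with the same count compare as 0. Sorted collections such as `TreeSet` rely on `compareTo`, so they would treat such N-grams as duplicates.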