pt.tumba.ngram
Class EntryProfile

java.lang.Object
  extended by pt.tumba.ngram.EntryProfile
All Implemented Interfaces:
Profile

public class EntryProfile
extends java.lang.Object
implements Profile

A Profile stores N-gram frequency information for a given textual string. This is a profile implementation which builds itself from an input text.
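
As a rough illustration of what "N-gram frequency information" can mean, the sketch below counts the character N-grams of a string and assigns each one a ranking position by descending frequency. It is not the EntryProfile source: the class name NGramCountSketch, the fixed N-gram length, and the alphabetical tie-breaking are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of pt.tumba.ngram.
final class NGramCountSketch {

    // Count every character N-gram of length n that occurs in the text.
    static Map<String, Integer> count(String text, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= text.length(); i++) {
            counts.merge(text.substring(i, i + n), 1, Integer::sum);
        }
        return counts;
    }

    // Map each N-gram to a ranking position, where 1 is the most frequent.
    // Assumption: ties are broken alphabetically so the result is deterministic.
    static Map<String, Integer> rank(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        Map<String, Integer> ranks = new LinkedHashMap<>();
        int position = 1;
        for (Map.Entry<String, Integer> entry : entries) {
            ranks.put(entry.getKey(), position++);
        }
        return ranks;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count("abracadabra", 2);
        System.out.println(rank(counts));
    }
}
```

The two maps returned here play roughly the roles that the gramWeights and gramRanks fields are documented to play, although the real class may compute its weights differently.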

Author:
Bruno Martins

Field Summary
protected  java.util.Map gramRanks
          A Map storing N-grams and the associated ranking position.
protected  java.util.Map gramsStrings
          A Map storing the textual String composing the N-grams.
protected  java.util.Map gramWeights
          A Map storing N-grams and the associated weights.
protected  int theLimit
          The lowest ranking position for storage in the N-gram profile.
protected  int theLowerLimit
          The highest ranking position for storage in the N-gram profile.
 
Constructor Summary
EntryProfile(java.io.InputStream stream)
          Builds a profile from the text read from an InputStream.
EntryProfile(java.io.InputStream stream, int theLimit)
          Builds a profile from the text read from an InputStream, storing N-grams down to ranking position theLimit.
EntryProfile(java.io.InputStream stream, int theLimit, int theLowerLimit)
          Builds a profile from the text read from an InputStream, storing only the N-grams ranked between theLowerLimit and theLimit.
EntryProfile(java.lang.String fname)
          Builds a profile from the text in the named file.
EntryProfile(java.lang.String fname, int theLimit)
          Builds a profile from the text in the named file, storing N-grams down to ranking position theLimit.
EntryProfile(java.lang.String fname, int theLimit, int theLowerLimit)
          Builds a profile from the text in the named file, storing only the N-grams ranked between theLowerLimit and theLimit.
 
Method Summary
private  void digestStream(java.io.InputStream stream)
          Builds the profile from an InputStream.
 double getRank(NGram ng)
          Gets the ranking position of a given N-gram.
 double getRank(java.lang.String ng)
          Gets the ranking position of a given N-gram.
 double getWeight(NGram ng)
          Gets the weighting score of a given N-gram.
 java.util.Iterator ngrams()
          Returns an Iterator over the N-grams in this profile.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

gramRanks

protected java.util.Map gramRanks
A Map storing N-grams and the associated ranking position.


gramWeights

protected java.util.Map gramWeights
A Map storing N-grams and the associated weights.


gramsStrings

protected java.util.Map gramsStrings
A Map storing the textual String composing the N-grams.


theLimit

protected int theLimit
The lowest ranking position for storage in the N-gram profile. For instance, with theLimit=400 only the 400 most frequent N-grams will be stored.


theLowerLimit

protected int theLowerLimit
The highest ranking position for storage in the N-gram profile. For instance, with theLowerLimit=200 the 200 most frequent N-grams will be skipped.
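
The interaction of the two limits can be sketched as a window over a list of N-grams sorted from most to least frequent. This is a minimal sketch under assumptions: the class RankWindowSketch and its window method are hypothetical, and the exact boundary handling (whether position theLowerLimit itself is skipped and position theLimit itself is kept) is a guess from the examples given for the two fields, not something this page confirms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of pt.tumba.ngram.
final class RankWindowSketch {

    // rankedGrams must be sorted from most frequent (rank 1) to least frequent.
    // Assumption: ranks 1..theLowerLimit are skipped and ranks up to theLimit are kept.
    static List<String> window(List<String> rankedGrams, int theLimit, int theLowerLimit) {
        int size = rankedGrams.size();
        int from = Math.max(0, Math.min(theLowerLimit, size));
        int to = Math.max(from, Math.min(theLimit, size));
        return new ArrayList<>(rankedGrams.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> ranked = Arrays.asList("th", "he", "in", "er", "an");
        // Skip the 2 most frequent N-grams and keep nothing ranked below position 4.
        System.out.println(window(ranked, 4, 2));
    }
}
```

Under these assumptions, theLimit=400 with theLowerLimit=200 would keep the N-grams at ranking positions 201 through 400.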

Constructor Detail

EntryProfile

public EntryProfile(java.io.InputStream stream)
             throws java.io.IOException
Builds a profile from the text read from an InputStream.

Parameters:
stream - An InputStream from where to read the text, in order to build the profile.
Throws:
java.io.IOException - A problem occurred while reading from the InputStream.

EntryProfile

public EntryProfile(java.io.InputStream stream,
                    int theLimit)
             throws java.io.IOException
Builds a profile from the text read from an InputStream, storing N-grams down to ranking position theLimit.

Parameters:
stream - An InputStream from where to read the text, in order to build the profile.
theLimit - The lowest ranking position for storage in the N-gram profile.
Throws:
java.io.IOException - A problem occurred while reading from the InputStream.

EntryProfile

public EntryProfile(java.io.InputStream stream,
                    int theLimit,
                    int theLowerLimit)
             throws java.io.IOException
Builds a profile from the text read from an InputStream, storing only the N-grams ranked between theLowerLimit and theLimit.

Parameters:
stream - An InputStream from where to read the text, in order to build the profile.
theLimit - The lowest ranking position for storage in the N-gram profile.
theLowerLimit - The highest ranking position for storage in the N-gram profile.
Throws:
java.io.IOException - A problem occurred while reading from the InputStream.

EntryProfile

public EntryProfile(java.lang.String fname)
             throws java.io.IOException,
                    java.io.FileNotFoundException
Builds a profile from the text in the named file.

Parameters:
fname - The pathname to the File with the text used to build the profile.
Throws:
java.io.IOException - A problem occurred while reading from the file.
java.io.FileNotFoundException - The file could not be opened for reading.

EntryProfile

public EntryProfile(java.lang.String fname,
                    int theLimit)
             throws java.io.IOException,
                    java.io.FileNotFoundException
Builds a profile from the text in the named file, storing N-grams down to ranking position theLimit.

Parameters:
fname - The pathname to the File with the text used to build the profile.
theLimit - The lowest ranking position for storage in the N-gram profile.
Throws:
java.io.IOException - A problem occurred while reading from the file.
java.io.FileNotFoundException - The file could not be opened for reading.

EntryProfile

public EntryProfile(java.lang.String fname,
                    int theLimit,
                    int theLowerLimit)
             throws java.io.IOException,
                    java.io.FileNotFoundException
Builds a profile from the text in the named file, storing only the N-grams ranked between theLowerLimit and theLimit.

Parameters:
fname - The pathname to the File with the text used to build the profile.
theLimit - The lowest ranking position for storage in the N-gram profile.
theLowerLimit - The highest ranking position for storage in the N-gram profile.
Throws:
java.io.IOException - A problem occurred while reading from the file.
java.io.FileNotFoundException - The file could not be opened for reading.

Method Detail

digestStream

private final void digestStream(java.io.InputStream stream)
                         throws java.io.IOException
Builds the profile from an InputStream.

Parameters:
stream - An InputStream from where to read the text, in order to build the profile.
Throws:
java.io.IOException - A problem occurred while reading from the InputStream.
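
Since digestStream is private, its implementation is not visible from this page. The sketch below only illustrates one plausible way to build N-gram counts while reading an InputStream character by character with a sliding window. The class StreamDigestSketch, the UTF-8 decoding, and the single fixed N-gram length are assumptions for the example, not documented behaviour of EntryProfile.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of pt.tumba.ngram.
final class StreamDigestSketch {

    // Read the stream one character at a time and count N-grams of length n.
    // Assumption: the input is UTF-8 text. The caller remains responsible for closing the stream.
    static Map<String, Integer> digest(InputStream stream, int n) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        StringBuilder window = new StringBuilder();
        Reader reader = new InputStreamReader(stream, StandardCharsets.UTF_8);
        int c;
        while ((c = reader.read()) != -1) {
            window.append((char) c);
            if (window.length() > n) {
                window.deleteCharAt(0); // slide the window forward by one character
            }
            if (window.length() == n) {
                counts.merge(window.toString(), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Convenience wrapper for in-memory text; rethrows IOException unchecked.
    static Map<String, Integer> digestText(String text, int n) {
        try {
            return digest(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)), n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(digestText("banana", 2));
    }
}
```

Reading through a sliding window keeps memory use proportional to the number of distinct N-grams rather than to the length of the input, which is one reason a stream-based constructor can be preferable to loading the whole text first.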

getRank

public double getRank(NGram ng)
Gets the ranking position of a given N-gram.

Specified by:
getRank in interface Profile
Parameters:
ng - An N-Gram
Returns:
The associated ranking position.

getWeight

public double getWeight(NGram ng)
Gets the weighting score of a given N-gram.

Specified by:
getWeight in interface Profile
Parameters:
ng - An N-Gram
Returns:
The associated occurrence frequency.

getRank

public double getRank(java.lang.String ng)
Gets the ranking position of a given N-gram.

Parameters:
ng - A String with the characters of the N-Gram
Returns:
The associated ranking position.

ngrams

public java.util.Iterator ngrams()
Returns an Iterator over the N-grams in this profile.

Specified by:
ngrams in interface Profile
Returns:
An Iterator over the N-grams in this profile.