java.lang.Object pt.tumba.ngram.blr.BayesianLogReg
public class BayesianLogReg
Simple, easy-to-use, and efficient software for Bayesian Logistic Regression classification, based on the "Bayesian Logistic Regression Software" package by Alexander Genkin, David D. Lewis, and David Madigan. A "one-against-one" approach is used for multiclass classification.
A general binary regression classifier takes the form:
P(y=1|x,beta) = exp(beta*x) / ( 1 + exp(beta*x) )
where y is the class label (1 or -1), x is the predictor vector, and beta is the vector of parameters.
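As an illustration of the formula above, the following is a minimal sketch of the probability computation. The class name LogisticSketch and its methods are hypothetical and are not part of BayesianLogReg; beta*x is assumed to denote the dot product of the two vectors.

```java
// Hypothetical illustration, not part of pt.tumba.ngram.blr.
final class LogisticSketch {

    // Dot product beta*x; both vectors are assumed to have the same length.
    static double dot(double[] beta, double[] x) {
        if (beta.length != x.length) {
            throw new IllegalArgumentException("beta and x must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < beta.length; i++) {
            sum += beta[i] * x[i];
        }
        return sum;
    }

    // P(y=1|x,beta) = exp(beta*x) / (1 + exp(beta*x)),
    // evaluated in a form that avoids overflow for large |beta*x|.
    static double probability(double[] beta, double[] x) {
        double z = dot(beta, x);
        if (z >= 0) {
            return 1.0 / (1.0 + Math.exp(-z));
        }
        double e = Math.exp(z);
        return e / (1.0 + e);
    }

    public static void main(String[] args) {
        double[] beta = {0.5, -1.0};
        double[] x = {2.0, 1.0};
        // beta*x = 0.5*2.0 - 1.0*1.0 = 0, so the probability is 0.5
        System.out.println(probability(beta, x));
    }
}
```

The two branches are algebraically identical; the split only keeps Math.exp from being called with a large positive argument.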
This software finds the maximum a posteriori parameter estimates with two choices of prior: Gaussian or Laplace (the Laplace prior corresponds to Tibshirani's LASSO). To find the parameter estimates, the software implements a coordinate descent algorithm that draws on the ideas of Zhang and Oles (2001). The user can define the hyperparameter value (Laplace prior or Gaussian variance) in two ways: specify the value explicitly, or omit any specification and let the program set the value by default. By default, the program sets the prior variance to the inverse of the average squared value of all data elements in the training set.
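The default prior variance described above can be sketched as follows. This is one reading of the description, assuming that the average is taken over every entry of the training matrix; the class and method names are hypothetical, and the actual implementation may differ.

```java
// Hypothetical illustration of the default hyperparameter rule.
final class DefaultPriorSketch {

    // Returns 1 / mean(x_ij^2) over all entries of the training data,
    // which equals count / sum of squares.
    static double defaultPriorVariance(double[][] data) {
        double sumOfSquares = 0.0;
        long count = 0;
        for (double[] row : data) {
            for (double value : row) {
                sumOfSquares += value * value;
                count++;
            }
        }
        if (count == 0 || sumOfSquares == 0.0) {
            // Assumption: the default is undefined for empty or all-zero data.
            throw new IllegalArgumentException("data must contain a non-zero element");
        }
        return count / sumOfSquares;
    }

    public static void main(String[] args) {
        double[][] data = {{1.0, 2.0}, {3.0, 4.0}};
        // Sum of squares is 30 over 4 elements, so the variance is 4/30.
        System.out.println(defaultPriorVariance(data));
    }
}
```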
Logistic regression estimates the probability that a data vector belongs to the class with label 1. Classification requires a threshold: the model assigns a case to class 1 if and only if the probability estimate is greater than or equal to the threshold value. The program offers the following choices for threshold tuning criteria:
Field Summary | |
---|---|
private double[] |
beta
|
private double[] |
classes
|
private double[][] |
data
The training points |
private double[] |
delta
|
protected java.lang.String[] |
names
|
static java.io.FilenameFilter |
NGramFilter
A FilenameFilter for filtering directory listings, recognizing
filenames for class profiles. |
private double[] |
r
|
protected java.util.List |
sortedGrams
|
private double[] |
theta
|
Constructor Summary | |
---|---|
BayesianLogReg()
Construct an uninitialized categorizer. |
|
BayesianLogReg(java.lang.String dirName)
Construct a categorizer from a whole directory of resources. |
|
BayesianLogReg(java.lang.String[] fileNames)
Construct a categorizer from a list of resource file names. |
Method Summary | |
---|---|
private double |
convergenceTest(double[] deltar)
|
private static java.util.List |
exchangePos(java.util.List v,
int p1,
int p2)
Exchange two values in a list |
private double |
gaussianOptimization(int j)
|
private void |
init(java.io.File fi,
java.lang.String[] names)
Fetch the set of profiles from the disk. |
private void |
initialize()
Initialize the Bayesian Logistic Regression classifier. |
private double |
laplaceOptimization(int j)
|
static void |
main(java.lang.String[] args)
Sample application that uses the categorizer from the command line. |
java.lang.String |
match(java.io.File f)
Match a given File against all the classes in the categorizer. |
private void |
optimization()
|
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
private double[][] data
private double[] classes
private double[] beta
private double[] delta
private double[] theta
private double[] r
protected java.lang.String[] names
protected java.util.List sortedGrams
public static java.io.FilenameFilter NGramFilter
A FilenameFilter for filtering directory listings, recognizing filenames for class profiles. Essentially, all filenames not ending with a ".corpus" extension are valid.
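The filtering rule described above can be sketched as follows. This is a minimal illustration that assumes the filter only checks the filename suffix; the class and field names are hypothetical, and the real NGramFilter may apply further conditions.

```java
import java.io.File;
import java.io.FilenameFilter;

// Hypothetical illustration of a filter that rejects ".corpus" files.
final class ProfileFilterSketch {

    // Accepts every filename that does not end with ".corpus".
    static final FilenameFilter NOT_CORPUS = new FilenameFilter() {
        @Override
        public boolean accept(File dir, String name) {
            return !name.endsWith(".corpus");
        }
    };

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        // File.list returns null if dir is not a readable directory.
        String[] profiles = dir.list(NOT_CORPUS);
        if (profiles != null) {
            for (String name : profiles) {
                System.out.println(name);
            }
        }
    }
}
```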
Constructor Detail |
---|
public BayesianLogReg()
public BayesianLogReg(java.lang.String dirName) throws TCatNGException, java.io.FileNotFoundException
dirName
- Pathname for the directory with the profiles.
TCatNGException
- A problem occurred while reading the profiles.
java.io.FileNotFoundException
- The pathname was not found.
public BayesianLogReg(java.lang.String[] fileNames) throws TCatNGException, java.io.FileNotFoundException
fileNames
- An array with the pathnames for the profiles.
TCatNGException
- A problem occurred while reading the profiles.
java.io.FileNotFoundException
- One of the pathnames was not found.
Method Detail |
---|
private final void init(java.io.File fi, java.lang.String[] names) throws TCatNGException, java.io.FileNotFoundException
fi
- Base directory for the profiles.
names
- Filenames of the profiles to fetch.
TCatNGException
- A problem occurred while reading the profiles.
java.io.FileNotFoundException
- One of the pathnames was not found.
private static java.util.List exchangePos(java.util.List v, int p1, int p2)
v
- The original list
p1
- The index of the first element
p2
- The index of the second element
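An exchange of two list positions with this signature could be sketched as follows. This is an assumption about the behavior, not the actual private implementation: the sketch swaps in place with the standard Collections.swap and returns the same list, whereas the original may instead return a copy. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of exchanging two values in a list.
final class ExchangeSketch {

    // Swaps the elements at indices p1 and p2 in place and returns the list.
    // Throws IndexOutOfBoundsException if either index is out of range.
    static <T> List<T> exchangePos(List<T> v, int p1, int p2) {
        Collections.swap(v, p1, p2);
        return v;
    }

    public static void main(String[] args) {
        List<String> grams = new ArrayList<>(Arrays.asList("a", "b", "c"));
        // Prints [c, b, a]
        System.out.println(exchangePos(grams, 0, 2));
    }
}
```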
private void initialize()
private double convergenceTest(double[] deltar)
deltar
-
private double gaussianOptimization(int j)
j
-
private double laplaceOptimization(int j)
j
-
private void optimization()
public java.lang.String match(java.io.File f)
Match a given File against all the classes in the categorizer.
f
- A File.
public static void main(java.lang.String[] args)
args
- The command line arguments, tokenized