pt.tumba.ngram.svm
Class SVM

java.lang.Object
  extended by pt.tumba.ngram.svm.SVM

public class SVM
extends java.lang.Object

Construct and solve various formulations of the support vector machine (SVM) problem.

A support vector machine is a supervised learning algorithm developed over the past decade by Vapnik and others (Vapnik, Statistical Learning Theory, 1998). The algorithm addresses the general problem of learning to discriminate between positive and negative members of a given class of n-dimensional vectors.

The SVM algorithm operates by mapping the given training set into a possibly high-dimensional feature space and attempting to locate in that space a plane that separates the positive from the negative examples. Having found such a plane, the SVM can then predict the classification of an unlabeled example by mapping it into the feature space and asking on which side of the separating plane the example lies. Much of the SVM's power comes from its criterion for selecting a separating plane when many candidate planes exist: the SVM chooses the plane that maintains a maximum margin from any point in the training set. Statistical learning theory suggests that, for some classes of well-behaved data, the choice of the maximum margin hyperplane will lead to maximal generalization when predicting the classification of previously unseen examples (Vapnik, Statistical Learning Theory, 1998). The SVM algorithm can also be extended to cope with noise in the training set and with multiple classes (Cristianini and Shawe-Taylor, An Introduction to Support Vector Machines, 2000).

Say that we have a training data set containing n examples, each of which is a vector of m numbers. These vectors may be thought of as points in an m-dimensional space. In theory, a simple way to build a binary classifier is to construct a hyperplane (i.e., a plane in a space with more than three dimensions) separating class members (positive examples) from non-members (negative examples) in this space. Unfortunately, most real-world problems involve non-separable data for which there does not exist a hyperplane that successfully separates the positive from the negative examples. One solution to the inseparability problem is to map the data into a higher-dimensional space and define a separating hyperplane there. This higher-dimensional space is called the feature space, as opposed to the input space occupied by the training examples. With an appropriately chosen feature space of sufficient dimensionality, any consistent training set can be made separable. However, translating the training set into a higher-dimensional space incurs both computational and learning-theoretic costs. Furthermore, artificially separating the data in this way exposes the learning system to the risk of finding trivial solutions that overfit the data.

SVMs elegantly sidestep both difficulties. They avoid overfitting by choosing the maximum margin separating hyperplane from among the many that can separate the positive from negative examples in the feature space. Also, the decision function for classifying points with respect to the hyperplane only involves dot products between points in the feature space. Because the algorithm that finds a separating hyperplane in the feature space can be stated entirely in terms of vectors in the input space and dot products in the feature space, a support vector machine can locate the hyperplane without ever representing the space explicitly, simply by defining a function, called a kernel function, that plays the role of the dot product in the feature space. This technique avoids the computational burden of explicitly representing the feature vectors.

For some data sets, the SVM may not be able to find a separating hyperplane in feature space, either because the kernel function is inappropriate for the training data or because the data contains mislabeled examples. The latter problem can be addressed by using a soft margin that allows some training examples to fall on the wrong side of the separating hyperplane. Completely specifying a support vector machine therefore requires specifying two parameters: the kernel function and the magnitude of the penalty for violating the soft margin. The settings of these parameters depend on the specific data at hand.

Given a vector X for each example, the simplest kernel K(X,Y) that we can use to measure the similarity between vectors X and Y is the dot product in the input space, K(X,Y) = X ° Y. For technical reasons (namely, that our optimization algorithm requires the hyperplane to pass through the origin), we typically add a constant to this kernel, obtaining K(X,Y) = X ° Y + 1. When this dot product kernel is used, the feature space is essentially the same as the m-dimensional input space, and the SVM will classify the examples with a separating hyperplane in this space. Squaring this kernel, i.e. defining K(X,Y) = (X ° Y + 1)^2, yields a quadratic separating surface in the input space. The corresponding separating hyperplane in the feature space includes features for all pairwise interactions Xi Xj between features, where 1 <= i, j <= m. Raising the kernel to higher powers yields polynomial separating surfaces of higher degrees in the input space. In general, the kernel of degree d is defined by K(X,Y) = (X ° Y + 1)^d. In the feature space of this kernel, for any example X there are features for all d-fold interactions between features.
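
As a concrete illustration, the following minimal Java sketch (not part of this class's API; the class and method names are invented) computes the dot product and polynomial kernels described above:

    // A sketch of the dot product and polynomial kernels described above.
    class PolynomialKernelSketch {
        // X ° Y: the dot product in the input space.
        static double dot(double[] x, double[] y) {
            double sum = 0.0;
            for (int i = 0; i < x.length; i++) sum += x[i] * y[i];
            return sum;
        }
        // K(X,Y) = (X ° Y + 1)^d; d = 1 gives the shifted linear kernel,
        // d = 2 a quadratic separating surface, and so on.
        static double polynomialKernel(double[] x, double[] y, int d) {
            return Math.pow(dot(x, y) + 1.0, d);
        }
    }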

Also commonly employed is the radial basis kernel (Schölkopf et al., IEEE Trans Sig Proc 45(11):2758-2765), which has a Gaussian form K(X,Y) = exp(-||X - Y||^2 / (2 s^2)), where s is the width of the Gaussian. In the SVM, s is usually set equal to the median of the Euclidean distances from each positive example to the nearest negative example.
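
Both the kernel and the width heuristic can be sketched in a few lines of Java (again an illustration, not part of this class's API):

    // The radial basis kernel and the median-distance heuristic for s.
    class RadialKernelSketch {
        // ||X - Y||^2
        static double squaredDistance(double[] x, double[] y) {
            double sum = 0.0;
            for (int i = 0; i < x.length; i++) {
                double d = x[i] - y[i];
                sum += d * d;
            }
            return sum;
        }
        // K(X,Y) = exp(-||X - Y||^2 / (2 s^2))
        static double rbfKernel(double[] x, double[] y, double s) {
            return Math.exp(-squaredDistance(x, y) / (2.0 * s * s));
        }
        // s: the median, over the positive examples, of the Euclidean
        // distance to the nearest negative example.
        static double medianWidth(double[][] positives, double[][] negatives) {
            double[] nearest = new double[positives.length];
            for (int i = 0; i < positives.length; i++) {
                double best = Double.POSITIVE_INFINITY;
                for (double[] negative : negatives)
                    best = Math.min(best, squaredDistance(positives[i], negative));
                nearest[i] = Math.sqrt(best);
            }
            java.util.Arrays.sort(nearest);
            return nearest[nearest.length / 2];
        }
    }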

In many real-world problems, one of the two classes examined contains very few members relative to the other class. This imbalance in the number of positive and negative training examples, in combination with noise in the data, is likely to cause the SVM to make incorrect classifications. When the magnitude of the noise in the negative examples outweighs the total number of positive examples, the optimal hyperplane located by the SVM will be uninformative, classifying all members of the training set as negative examples. We combat this problem by modifying the matrix of kernel values computed during SVM optimization. Let X(1), ..., X(n) be the examples in the training set, and let K be the matrix defined by the kernel function K on this training set; i.e., Kij = K(X(i),X(j)). By adding to the diagonal of the kernel matrix a constant whose magnitude depends on the class of the training example, one can control the fraction of misclassified points in the two classes. This technique ensures that the positive points are not regarded as noisy labels. For a positive example X(i), the diagonal element is modified by Kii := Kii + f*r*(n+/N), where n+ is the number of positive training examples, N is the total number of training examples, f is a user-specified scale factor, and r is the median diagonal kernel element. A similar formula is used for the negative examples, with n+ replaced by n-.
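
The following Java sketch (an illustration of the definitions above, not this class's actual implementation) applies the class-dependent diagonal shift to a precomputed kernel matrix:

    // Shift the diagonal of an n-by-n kernel matrix; labels[i] is +1 or -1
    // and f is the user-specified scale factor.
    class DiagonalShiftSketch {
        static void addClassDependentDiagonal(double[][] kernel, int[] labels, double f) {
            int n = kernel.length;
            int nPositive = 0;
            for (int label : labels) if (label > 0) nPositive++;
            int nNegative = n - nPositive;
            // r: the median diagonal kernel element.
            double[] diagonal = new double[n];
            for (int i = 0; i < n; i++) diagonal[i] = kernel[i][i];
            java.util.Arrays.sort(diagonal);
            double r = diagonal[n / 2];
            // Kii := Kii + f * r * (n+/N) for positives, f * r * (n-/N) for negatives.
            for (int i = 0; i < n; i++) {
                double classFraction = (labels[i] > 0 ? nPositive : nNegative) / (double) n;
                kernel[i][i] += f * r * classFraction;
            }
        }
    }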

A more mathematically detailed discussion of these techniques is available at http://www.cse.ucsc.edu/research/compbio/genex.

Author:
Bruno Martins

Nested Class Summary
(package private) static class SVM.decisionFunction
          Inner class modeling the data for the SVM decision function, used for classifying points with respect to the hyperplane.
 
Field Summary
(package private) static java.lang.String[] kernelTypeTable
          An array with all the kernel type names.
(package private) static java.lang.String[] svmTypeTable
          Table with all the SVM problem formulation names.
 
Constructor Summary
SVM()
           
 
Method Summary
private static void multiclassProbability(int k, double[][] r, double[] p)
          Method 2 from the multiclass_prob paper by Wu, Lin, and Weng.
private static double sigmoidPredict(double decision_value, double A, double B)
           
private static void sigmoidTrain(int l, double[] dec_values, double[] labels, double[] probAB)
          Platt's binary SVM probabilistic output: an improvement from Lin et al.
private static void solveCSVC(SVMProblem prob, SVMParameter param, double[] alpha, Solver.SolutionInfo si, double Cp, double Cn)
          Solve the C-Support Vector classification problem.
private static void solveEpsilonSVR(SVMProblem prob, SVMParameter param, double[] alpha, Solver.SolutionInfo si)
          Solve the Epsilon-Support Vector Regression problem.
private static void solveNUSVC(SVMProblem prob, SVMParameter param, double[] alpha, Solver.SolutionInfo si)
          Solve the Nu-Support Vector Classification problem.
private static void solveNUSVR(SVMProblem prob, SVMParameter param, double[] alpha, Solver.SolutionInfo si)
          Solve the Nu-Support Vector Regression problem.
private static void solveOneClass(SVMProblem prob, SVMParameter param, double[] alpha, Solver.SolutionInfo si)
          Solve the SVM Distribution Estimation Problem.
private static void svmBinarySVCProbability(SVMProblem prob, SVMParameter param, double Cp, double Cn, double[] probAB)
          Cross-validation decision values for probability estimates.
static java.lang.String svmCheckParameter(SVMProblem prob, SVMParameter param)
          Check the parameters given to an SVM problem.
static int svmCheckProbabilityModel(SVMModel model)
           
static void svmCrossValidation(SVMProblem prob, SVMParameter param, int nr_fold, double[] target)
          Perform cross-validation.
static void svmGetLabels(SVMModel model, int[] label)
           
static double svmGetSVRProbability(SVMModel model)
           
static SVMModel svmLoadModel(java.lang.String model_file_name)
          Loads a trained SVM model from disk.
static double svmPredict(SVMModel model, SVMNode[] x)
           
static double svmPredictProbability(SVMModel model, SVMNode[] x, double[] prob_estimates)
           
static void svmPredictValues(SVMModel model, SVMNode[] x, double[] dec_values)
           
static void svmSaveModel(java.lang.String model_file_name, SVMModel model)
          Outputs a trained model to disk, for use in future classification tasks.
private static double svmSVRProbability(SVMProblem prob, SVMParameter param)
          Return the parameter of a Laplace distribution.
static SVMModel svmTrain(SVMProblem prob, SVMParameter param)
          Train the SVM.
(package private) static SVM.decisionFunction svmTrainOne(SVMProblem prob, SVMParameter param, double Cp, double Cn)
          Computation for one step of the training procedure.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

svmTypeTable

static final java.lang.String[] svmTypeTable
Table with all the SVM problem formulation names.


kernelTypeTable

static final java.lang.String[] kernelTypeTable
An array with all the kernel type names.

Constructor Detail

SVM

public SVM()
Method Detail

solveCSVC

private static void solveCSVC(SVMProblem prob,
                              SVMParameter param,
                              double[] alpha,
                              Solver.SolutionInfo si,
                              double Cp,
                              double Cn)
Solve the C-Support Vector classification problem.

Parameters:
prob - The SVM Problem.
param - The parameters.
alpha - The SVM alpha values (will be filled by learning).
si - Information about the solution (used for returning multiple values).
Cp - The soft-margin penalty for the positive examples.
Cn - The soft-margin penalty for the negative examples.

solveNUSVC

private static void solveNUSVC(SVMProblem prob,
                               SVMParameter param,
                               double[] alpha,
                               Solver.SolutionInfo si)
Solve the Nu-Support Vector Classification problem.

Parameters:
prob - The SVM Problem.
param - The parameters.
alpha - The SVM alpha values (will be filled by learning).
si - Information about the solution (used for returning multiple values).

solveOneClass

private static void solveOneClass(SVMProblem prob,
                                  SVMParameter param,
                                  double[] alpha,
                                  Solver.SolutionInfo si)
Solve the SVM Distribution Estimation Problem.

Parameters:
prob - The SVM Problem.
param - The parameters.
alpha - The SVM alpha values (will be filled by learning).
si - Information about the solution (used for returning multiple values).

solveEpsilonSVR

private static void solveEpsilonSVR(SVMProblem prob,
                                    SVMParameter param,
                                    double[] alpha,
                                    Solver.SolutionInfo si)
Solve the Epsilon-Support Vector Regression problem.

Parameters:
prob - The SVM Problem.
param - The parameters.
alpha - The SVM alpha values (will be filled by learning).
si - Information about the solution (used for returning multiple values).

solveNUSVR

private static void solveNUSVR(SVMProblem prob,
                               SVMParameter param,
                               double[] alpha,
                               Solver.SolutionInfo si)
Solve the Nu-Support Vector Regression problem.

Parameters:
prob - The SVM Problem.
param - The parameters.
alpha - The SVM alpha values (will be filled by learning).
si - Information about the solution (used for returning multiple values).

svmTrainOne

static SVM.decisionFunction svmTrainOne(SVMProblem prob,
                                        SVMParameter param,
                                        double Cp,
                                        double Cn)
Computation for one step of the training procedure.

Parameters:
prob - The SVM Problem
param - The Parameters
Cp - The soft-margin penalty for the positive examples.
Cn - The soft-margin penalty for the negative examples.
Returns:
The decision function used for classifying points after this step.

sigmoidTrain

private static void sigmoidTrain(int l,
                                 double[] dec_values,
                                 double[] labels,
                                 double[] probAB)
Platt's binary SVM probabilistic output: an improvement from Lin et al.

Parameters:
l - The number of decision values.
dec_values - The decision values output by the SVM for each example.
labels - The true labels of the corresponding examples.
probAB - Output array receiving the fitted sigmoid parameters A and B.

sigmoidPredict

private static double sigmoidPredict(double decision_value,
                                     double A,
                                     double B)
Parameters:
decision_value - The SVM decision value for the example.
A - The sigmoid slope parameter, as fitted by sigmoidTrain.
B - The sigmoid offset parameter, as fitted by sigmoidTrain.
Returns:
The probability estimate obtained by applying the fitted sigmoid to the decision value.
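
Platt's method maps a decision value f to a probability through a fitted sigmoid, P(y=1|f) = 1/(1+exp(A*f+B)). A minimal sketch of that mapping, assuming the usual numerically stable formulation:

    // Evaluate the fitted sigmoid without overflowing for large |A*f+B|.
    static double plattSigmoid(double decisionValue, double A, double B) {
        double fApB = decisionValue * A + B;
        return (fApB >= 0)
                ? Math.exp(-fApB) / (1.0 + Math.exp(-fApB))
                : 1.0 / (1.0 + Math.exp(fApB));
    }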

multiclassProbability

private static void multiclassProbability(int k,
                                          double[][] r,
                                          double[] p)
Method 2 from the multiclass_prob paper by Wu, Lin, and Weng.

Parameters:
k - The number of classes.
r - The matrix of pairwise class probability estimates.
p - Output array to be filled with the per-class probability estimates.

svmBinarySVCProbability

private static void svmBinarySVCProbability(SVMProblem prob,
                                            SVMParameter param,
                                            double Cp,
                                            double Cn,
                                            double[] probAB)
Cross-validation decision values for probability estimates.

Parameters:
prob - The SVM Problem.
param - The parameters.
Cp - The soft-margin penalty for the positive examples.
Cn - The soft-margin penalty for the negative examples.
probAB - Output array receiving the fitted sigmoid parameters A and B.

svmSVRProbability

private static double svmSVRProbability(SVMProblem prob,
                                        SVMParameter param)
Return the parameter of a Laplace distribution.

Parameters:
prob - The SVM Problem.
param - The parameters.
Returns:
The parameter of the Laplace distribution modeling the regression residuals.

svmTrain

public static SVMModel svmTrain(SVMProblem prob,
                                SVMParameter param)
Train the SVM.

Parameters:
prob - The SVM Problem.
param - The parameters.
Returns:
An SVM model for the given problem.
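
A hedged usage sketch follows; how SVMProblem and SVMParameter are populated is defined by those classes, so the no-argument constructors and the set-up shown here are assumptions:

    SVMProblem prob = new SVMProblem();      // assumption: fill in the examples and labels
    SVMParameter param = new SVMParameter(); // assumption: choose kernel type, penalty, etc.
    String error = SVM.svmCheckParameter(prob, param);
    if (error == null || error.length() == 0) {
        SVMModel model = SVM.svmTrain(prob, param);
        double prediction = SVM.svmPredict(model, x); // x: an SVMNode[] for a new example
    }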

svmCrossValidation

public static void svmCrossValidation(SVMProblem prob,
                                      SVMParameter param,
                                      int nr_fold,
                                      double[] target)
Perform cross-validation.

Parameters:
prob - The SVM problem.
param - The parameters.
nr_fold - Number of folds for the cross validation.
target - Array to be filled with the predicted value for each example.
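
A short sketch, assuming target holds one slot per training example:

    double[] target = new double[n]; // assumption: n is the number of training examples
    SVM.svmCrossValidation(prob, param, 10, target);
    // target[i] now holds the value predicted for example i while it was
    // held out, and can be compared against the true label.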

svmGetLabels

public static void svmGetLabels(SVMModel model,
                                int[] label)
Parameters:
model - A trained SVM model.
label - Array to be filled with the class labels known to the model.

svmGetSVRProbability

public static double svmGetSVRProbability(SVMModel model)
Parameters:
model - A trained regression model containing probability information.
Returns:
The parameter of the Laplace distribution used for SVR probability estimates.

svmPredictValues

public static void svmPredictValues(SVMModel model,
                                    SVMNode[] x,
                                    double[] dec_values)
Parameters:
model - A trained SVM model.
x - The example to classify, represented as an array of SVMNode instances.
dec_values - Array to be filled with the decision values for the example.

svmPredict

public static double svmPredict(SVMModel model,
                                SVMNode[] x)
Parameters:
model - A trained SVM model.
x - The example to classify, represented as an array of SVMNode instances.
Returns:
The predicted value: a class label for classification, or the estimated target value for regression.

svmPredictProbability

public static double svmPredictProbability(SVMModel model,
                                           SVMNode[] x,
                                           double[] prob_estimates)
Parameters:
model - A trained SVM model containing probability information.
x - The example to classify, represented as an array of SVMNode instances.
prob_estimates - Array to be filled with the estimated probability of each class.
Returns:
The class label with the highest estimated probability.

svmSaveModel

public static void svmSaveModel(java.lang.String model_file_name,
                                SVMModel model)
                         throws java.io.IOException
Outputs a trained model to disk, for use in future classification tasks.

Parameters:
model_file_name - The name of the file storing the model.
model - The SVM model.
Throws:
java.io.IOException - A problem occurred while storing the model.

svmLoadModel

public static SVMModel svmLoadModel(java.lang.String model_file_name)
                             throws java.io.IOException
Loads a trained SVM model from disk.

Parameters:
model_file_name - The path name for the file containing the model.
Returns:
An SVM model.
Throws:
java.io.IOException - A problem occurred while reading the model file.
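
A round-trip sketch (the file name is hypothetical; both calls may throw java.io.IOException):

    SVMModel model = SVM.svmTrain(prob, param);
    SVM.svmSaveModel("classifier.model", model);
    // ... later, possibly in another process:
    SVMModel restored = SVM.svmLoadModel("classifier.model");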

svmCheckParameter

public static java.lang.String svmCheckParameter(SVMProblem prob,
                                                 SVMParameter param)
Check the parameters given to an SVM problem.

Parameters:
prob - An SVM Problem.
param - The parameters.
Returns:
A String with an informative message about any problems found, or an empty String if the parameters are valid.

svmCheckProbabilityModel

public static int svmCheckProbabilityModel(SVMModel model)
Parameters:
model - An SVM model.
Returns:
1 if the model can produce probability estimates, 0 otherwise.