Using LibSVM in Java
For the past couple of months, I’ve been trying to get my feet wet with machine learning and started work on implementing a Behavioral Authentication mechanism for Android devices using Support Vector Machines (more on that later in another blog post). SVM is a relatively popular classifier which seemed appropriate for a beginner like me, and everything did go well until I had to implement the R prototype in Java.
I went with One-Class SVM for modelling purposes, and the obvious choice was the libsvm library by Chih-Jen Lin. But there's virtually no documentation for the Java version on either their homepage or GitHub; both simply point you to the C documentation. So after digging through all of their Java examples, I had a basic version of my port ready, but it gave wildly different results compared to the R version.
Turns out you need to scale and normalize all data values to between 0 and 1, at least for OCSVM. These are the double values from which you construct the svm_node 2D array x in the svm_problem object. Without doing this, the classifier just goes batshit crazy and spits out random values. I imagine the R version of the library does that automatically for the given data. Other than that, you also don't need an extra svm_node object with an index of -1 at the end of the x[] arrays to denote the end of the vector (like the C version does).
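The scaling step can be sketched as plain per-feature min-max normalization. This is only one way to do it, and the class name MinMaxScaler and its methods are hypothetical, not part of LibSVM:

```java
import java.util.Arrays;

public class MinMaxScaler {
    private final double[] min;
    private final double[] max;

    /** Learns the per-feature minimum and maximum from the training rows. */
    public MinMaxScaler(double[][] rows) {
        int features = rows[0].length;
        min = new double[features];
        max = new double[features];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (double[] row : rows) {
            for (int j = 0; j < features; j++) {
                min[j] = Math.min(min[j], row[j]);
                max[j] = Math.max(max[j], row[j]);
            }
        }
    }

    /** Maps one row into [0, 1] per feature using the learned ranges. */
    public double[] scale(double[] row) {
        double[] out = new double[row.length];
        for (int j = 0; j < row.length; j++) {
            double range = max[j] - min[j];
            // A constant feature has zero range; map it to 0 to avoid dividing by zero.
            out[j] = range == 0.0 ? 0.0 : (row[j] - min[j]) / range;
        }
        return out;
    }
}
```

The same scaler instance should be reused for the points you classify later, so they are mapped with the ranges learned from the training data. Points outside those ranges will land outside 0 and 1.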
For running the one-class classifier, everything else was pretty much the same as the C code or the available Java examples, but I would usually use some sort of helper function to build the node arrays. For example, for building 2D points on a plane, I used something like this:
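A minimal sketch of such a helper, assuming the libsvm jar is on the classpath. The class and method names (NodeBuilder, buildPoint) are hypothetical; only svm_node and its index and value fields come from the library:

```java
import libsvm.svm_node;

public class NodeBuilder {
    /** Builds the sparse vector for one 2D point; x and y should already be scaled to [0, 1]. */
    public static svm_node[] buildPoint(double x, double y) {
        svm_node[] nodes = new svm_node[2];

        nodes[0] = new svm_node();
        nodes[0].index = 1;   // LibSVM feature indices conventionally start at 1
        nodes[0].value = x;

        nodes[1] = new svm_node();
        nodes[1].index = 2;
        nodes[1].value = y;

        // No trailing node with index -1 here: the array length marks the end of the vector.
        return nodes;
    }
}
```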

Combine many of these together and you get the 2D array svm_node[][] we need for the SVM problem. Building the model is pretty straightforward (use your own gamma & nu values depending on your data):
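A minimal sketch of the training step, assuming the libsvm jar is on the classpath and the vectors are already scaled. The wrapper class OneClassTrainer is hypothetical, and the cache_size and eps values are just commonly used settings, not recommendations for your data:

```java
import java.util.Arrays;

import libsvm.svm;
import libsvm.svm_model;
import libsvm.svm_node;
import libsvm.svm_parameter;
import libsvm.svm_problem;

public class OneClassTrainer {
    /** Trains a one-class SVM with an RBF kernel on already-scaled vectors. */
    public static svm_model train(svm_node[][] vectors, double gamma, double nu) {
        svm_problem problem = new svm_problem();
        problem.l = vectors.length;               // number of training vectors
        problem.x = vectors;
        problem.y = new double[vectors.length];   // placeholder labels; there is only one class
        Arrays.fill(problem.y, 1.0);

        svm_parameter param = new svm_parameter();
        param.svm_type = svm_parameter.ONE_CLASS;
        param.kernel_type = svm_parameter.RBF;
        param.gamma = gamma;
        param.nu = nu;            // must be in (0, 1]
        param.cache_size = 100;   // kernel cache size in MB
        param.eps = 1e-3;         // stopping tolerance
        param.shrinking = 1;

        // svm_check_parameter returns null when the settings are valid.
        String error = svm.svm_check_parameter(problem, param);
        if (error != null) {
            throw new IllegalArgumentException(error);
        }
        return svm.svm_train(problem, param);
    }
}
```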

For classification, there's the svm.svm_predict(model, nodes) function that returns either -1 or +1 for one-class, but there's another method available, svm.svm_predict_values(m, n, v), that also gives you the prediction confidence score from which that -1 or +1 is derived. For RBF, this score roughly tells you how far the point lies from the decision boundary drawn during modelling: positive inside it, negative outside. Getting this "score" is a bit different, since the function itself also returns either -1 or +1. You have to pass a 2-element array as the third argument to this function. After calling it, the first value of the array will contain the score:
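A minimal sketch of reading the score, assuming the libsvm jar is on the classpath and a model trained as a one-class SVM. The wrapper class OneClassScorer is hypothetical:

```java
import libsvm.svm;
import libsvm.svm_model;
import libsvm.svm_node;

public class OneClassScorer {
    /** Returns the raw decision value for one vector; a positive value corresponds to the +1 label. */
    public static double score(svm_model model, svm_node[] nodes) {
        double[] values = new double[2];   // output buffer; the score ends up in values[0]
        svm.svm_predict_values(model, nodes, values);
        return values[0];
    }
}
```

Keeping the raw score around is handy if you want your own threshold instead of the hard -1 or +1 cut-off.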

I really hope someone writes a better version/wrapper of LibSVM in Java, or improves the documentation so beginners like me can avoid wasting hours over implementation issues.