SVM Scaling Issues with One-Class Novelty Detection

When I fed it some arbitrary input data, my embedded implementation of a novelty detection algorithm started returning 0.0 where I was expecting a negative number.

Digging into the maths, the kernel it uses, a Radial Basis Function (RBF), returns

exp(-gamma * result)

where gamma is some constant and result is the squared distance between the input datapoint and a support vector, which here works out to almost the input datapoint squared, a fairly big number. So if the kernel is returning 0.0, even in double-precision floating point, the true value must be vanishingly small, which makes sense since

    \[ e^{-\text{big number}} = \frac{1}{e^{\text{big number}}} \]
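To convince myself, I reproduced the underflow in isolation. This is just a sketch with made-up values for gamma and result, but it shows how quickly exp() hits the floor of what a double can represent:

#include <math.h>
#include <stdio.h>

int main(void)
{
    double gamma = 0.5;

    /* exp(-0.5 * 1500) is around 2e-326, which is below the smallest
     * subnormal double (~4.9e-324), so it underflows to exactly 0.0 */
    double result = 1500.0;

    printf("kernel value: %g\n", exp(-gamma * result));  /* prints 0 */
    return 0;
}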

Then I remembered the scaling algorithm that all data is put through before it goes into the SVM model. This scaling is recommended in many practical guides to SVMs (most notably in the guide for LIBSVM, the most used C SVM implementation, where scaling is discussed in section 2.2).

What this scaling algorithm does is take every training datapoint for a feature, find the maximum and the minimum, and scale the datapoints linearly so that the maximum becomes 1 and the minimum becomes -1. This happens for every feature, so that the scaled training dataset contains only values between -1 and 1. The point is that if one feature had a range from -1000 to 1000, say, and another feature a range from 0.1 to 0.25, the feature with the much larger range would have a much larger bearing on the classification outcome.
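For reference, here's a minimal sketch of that scaling step in C; the function names are mine, not from any particular library:

#include <stddef.h>

/* Find the min and max of one feature across the training data. */
void fit_scaler(const double *train, size_t n, double *min, double *max)
{
    *min = *max = train[0];
    for (size_t i = 1; i < n; i++) {
        if (train[i] < *min) *min = train[i];
        if (train[i] > *max) *max = train[i];
    }
}

/* Map min -> -1 and max -> 1 linearly. Assumes max > min; the
 * result only lands in [-1, 1] when x lies within [min, max]. */
double scale(double x, double min, double max)
{
    return 2.0 * (x - min) / (max - min) - 1.0;
}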

Anyway, the important part is this: when SVMs are used for novelty detection they are trained only on “in” data and are expected to classify a new datapoint as “in” or “out”. But our scaling algorithm has only ever seen “in” data; it was fitted, rather short-sightedly, to scale datapoints within the range previously seen.

Obviously the “out” datapoints may have a completely different range to our “in” datapoints, so any that fall outside the training range may be scaled to values far beyond -1 and 1.

That is where our big numbers come from, and the RBF kernel then turns them into numbers too small for a double to represent.
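Putting the two pieces together with some hypothetical numbers (the [0.1, 0.25] feature from earlier, hit with an out-of-range reading of 500):

#include <math.h>
#include <stdio.h>

int main(void)
{
    /* The training data for this feature only covered [0.1, 0.25]... */
    double min = 0.1, max = 0.25;

    /* ...but an "out" datapoint can land anywhere. */
    double out = 500.0;

    double scaled = 2.0 * (out - min) / (max - min) - 1.0;
    printf("scaled: %g\n", scaled);  /* ~6664, nowhere near [-1, 1] */

    /* With the support vectors all inside [-1, 1], the squared
     * distance is essentially scaled^2 (~4.4e7), and exp() underflows. */
    double gamma = 0.5;
    printf("kernel: %g\n", exp(-gamma * scaled * scaled));  /* prints 0 */
    return 0;
}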

Understanding the problem is one thing; solving it is another. I’ve put a question out on the machine learning Stack Exchange; we’ll see what comes back!
