Friday, March 27, 2009

AdaBoost

There are a couple of algorithms that float around that I always feel I need to know more about. During a number of data mining conferences the top 10 algorithms in data mining were collated.

One that seems to get a lot of attention in computer vision and robotics is Adaptive Boosting (aka AdaBoost). You can use this algorithm to create a strong classifier from a set of weak classifiers. (One way to think of this is that weak classifiers give you a very approximate answer to a classification problem, strong ones give a much much better approximation)

For example in the figures below, you can see how the algorithm builds up a strong classifier from combining multiple weak classifiers.
(from Jan Sochman's AdaBoost talk)











AdaBoost combines these classifiers through a set of weights that it adaptively adjusts.

I can hardly claim to be an expert on the subject, but since I find the mathsy description of it on Wikipedia confusing, here is my summary of it in CS-algorithmish talk:

Start with a training data set of features, labeled with the correct answer.
ie: data[i], correct_answer[i]
Run each weak classifier with the data set and store its results.
ie:results[x,i] = classifierX(data[i])
Initialize each of the weightings with 1 / (number of entries in the data set).
At this step we are starting of with each classifier having equal weighting.
ie: D[i] = 1/m

For N number of iterations:
For each classifier:
For each feature, i:
if the output of the classifier does not equal the correct answer, update the error.
error+=D[i]
(ie: sum of the weights of the misclassified samples)
if the error is less than the minimum error, mark this as the best classifier
End For
Now update the importance of the classifier:
alpha[best classifier] = 0.5 * log(1 - minimum error)/minimum error
And update all the classifier weightings:
For each feature,i
D[i] = *= exp(-alpha[best] * answer[i] * result(best,i)
z+=D[i]
End for
For each feature, update D[i] = D[i]/z to renormalize.

And.. were done!

In my experiments it seems not to work too well,.. it definitely improves on your weak classifiers a bit, but they have to be pretty close to spot on or able to solve the problem already. I think I'd still prefer using a neural net for classification..