Donny
March 30, 2018
1004
-

Tags in blue are handcrafted tags; Tags in green are generated using AutoTag.

Python 实现: AdaBoost - Donny-Hikari - Github

Introduction

AdaBoost 是 Adaptive Boosting 的简称。 Boosting 是一种 Ensemble Learning 方法。 其他的 Ensemble Learning 方法还有 Bagging, Stacking 等。 Bagging, Boosting, Stacking 的区别如下：

1. Bagging:
Equal weight voting. Trains each model with a random drawn subset of training set.

2. Boosting:
Trains each new model instance to emphasize the training instances that previous models mis-classified. Has better accuracy comparing to bagging, but also tends to overfit.

3. Stacking:
Trains a learning algorithm to combine the predictions of several other learning algorithms.

The Formulas

Given a N*M matrix X, and a N vector y, where N is the count of samples, and M is the features of samples. AdaBoost trains T weak classifiers with the following steps:

1. Initialize. / 初始化 \begin{align*} & W_{0,j} = 1 / N \\ & i = 0 \end{align*}

2. Train i-th weak classifier with training set {X, y} and distribution W_i. / 用样本权重和样本训练第i个弱分类器h_i：

3. Get the predict result $$h_i$$ on the weak classifier with input X. / 获得样本加权错误和epsilon \begin{align*} & h_i: X \rightarrow {-1, 1} \\ & \epsilon = Pr_{j ~ W_i} [ h_i(X_j) \neq y_j ] \\ & = \sum_{j=1}^{N} ( W_i * ( h_i(X) \neq y ) ) \end{align*}

4. Update alpha and normalized weights. / 更新alpha和归一化样本权重 $$W_i$$ \begin{align*} & \alpha_{i} = \frac{1}{2} \ln{ \frac{1 - \epsilon}{\epsilon} } \\ & W_{i+1,j} = \frac{W_{i}}{Z_{i+1}} e^{ - \alpha y_j h_{i,j} } \\ \text{where} \quad & Z_{i+1} = \sum_{j=1}^{N} W_{i+1,j} \end{align*}

Z is a normalization factor.

1. Repeat steps 2 ~ 4 until i reaches T. / 迭代2~3获得T个弱分类器

2. Output the final hypothesis: / 最终分类结果为: $$H(X) = sign( \sum_{i=1}^{T} ( \alpha_{i} * h_i(X) ) )$$

Analyzation of the Formulas

1. 权重的调整。

权重的调整实际上是 $$W_{i+1,j} = \frac{ W_{i,j} }{ Z_{i+1} } * ( \frac{1 - \epsilon}{\epsilon} )^{-(y_j == h_{i,j})}$$. $$h_{i,j}$$ 为第i个弱分类器对第j个样本的分类结果。

暂且假设分类错误率e小于0.5 (大于0.5参见[2]).

当且仅当分类正确时，权重调整为原来的 $$\frac{\epsilon}{1 - \epsilon}$$. 则该分数小于1，故权重变小。

当且仅当分类错误时，权重调整为原来的 $$\frac{1 - \epsilon}{\epsilon}$$. 则该分数大于1，故权重变大。

2. 若单个分类器错误率高于50%，会导致正确分类的样本权重增加。

此时，实际上将发生”逆转“。因为alpha为负值，在最后投票时，该分类器的分类结果将相反。因此正确分类的成为错误分类，错误分类的成为正确分类。故而权重的增加仍加在“错误的“分类上，只是该错误是对全局而言。

能够发生“逆转”的关键是 ln 函数。

3. 剩下的问题是alpha系数中为何要加 1/2.

思考无果，跟同学讨论了一下。据说是为了方便某些情况下的处理。

从原理入手。查了下维基，Rojas 2009 年的推导。颇为清晰。见[Derivation].

Derivation

E 对 $$\alpha_m$$ 求导 $$\frac{d E}{d \alpha_m} = \frac{ d ( \sum_{ h_m(X_j) = y_j } ( W_{m,j} * e^{ -\alpha_m } ) + \sum_{ h_m(X_j) \neq y_j } ( W_{m,j} * e^{ \alpha_m } ) ) }{d \alpha_m}$$

Test Result

Using a demo from sklearn AdaBoost, I got the following result.

Weak classifiers: 200; Iteration steps in each weak classifier: 200:

Compares with the AdaBoostClassifier from sklearn with 200 estimators (weak classifiers):

More comparison for my AdaBoostClassifier with different parameters:

estimators iteration steps time accuracy
30 30 0.0621 0.8540
30 60 0.1095 0.8760
30 200 0.3725 0.8620
30 400 0.7168 0.8720
60 30 0.1291 0.8620
60 60 0.2328 0.8600
60 200 0.7886 0.8780
60 400 1.4679 0.8840
200 30 0.3942 0.8600
200 60 0.7560 0.8700
200 200 2.4925 0.8900
200 400 4.7178 0.9020
400 30 0.8758 0.8640
400 60 1.6578 0.8720
400 200 5.0294 0.9040
400 400 10.0294 0.9260