Decision Tree
2019-03-16
Machine-Learning
1037
Information Content
$$I(X) = -\log_b P(X)$$

where,

I : the information content of X. An X with a greater I value contains more information.
P : the probability mass function.
b : the base of the logarithm. Common values of b are 2 (bits), Euler's number e (nats), and 10 (bans).
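As a quick illustration (a minimal sketch, not from the original post), the information content of an event can be computed directly from its probability; the function name `information_content` and the coin-flip examples are assumptions added here for illustration.

```python
import math

def information_content(p, base=2):
    """Information content I(x) = -log_b P(x) of an event with probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -math.log(p, base)

# A fair coin flip: P(heads) = 0.5 -> 1 bit of information.
print(information_content(0.5))    # 1.0
# A rarer event (P = 0.125) carries more information.
print(information_content(0.125))  # 3.0
```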
Entropy (Information Theory)[1]
$$H(X) = E[I(X)] = E[-\log_b P(X)]$$

where,

H : the entropy of X. The name comes from Boltzmann's H-theorem, but this definition was proposed by Shannon. H indicates the uncertainty of X.
P : the probability mass function.
I : the information content of X.
E : the expected value operator.
The entropy can be written explicitly as:

$$H(X) = -\sum_{i=1}^{n} P(x_i) \log_b P(x_i)$$
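To make the formula concrete, here is a minimal sketch (not from the original post) that evaluates the explicit sum for a discrete distribution; the helper name `entropy` and the example distributions are assumed for illustration.

```python
import math

def entropy(probs, base=2):
    """Shannon entropy H(X) = -sum_i P(x_i) * log_b P(x_i)."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A fair coin has the maximum uncertainty for two outcomes: 1 bit.
print(entropy([0.5, 0.5]))   # 1.0
# A biased coin is more predictable, so its entropy is lower.
print(entropy([0.9, 0.1]))   # ~0.469
```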
ID3[2]
Use ID3 to build a decision tree:
Calculate the entropy of the samples under the current node.
Find a feature F that maximizes the information gain (a sketch of this computation follows below). The information gain is calculated by:

$$IG(S, F) = H(S) - \sum_{v \in \operatorname{values}(F)} \frac{|S_v|}{|S|} H(S_v)$$

where H(S) is the entropy of the sample set S at the current node, and H(S_v) is the entropy of the subset S_v of samples whose feature F takes the value v.
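As a rough sketch of this step (not the original post's code), the snippet below computes the information gain of splitting a labeled sample set on one feature; the names `information_gain`, the dictionary-based samples, and the toy data are assumptions for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(samples, labels, feature):
    """IG(S, F) = H(S) - sum_v |S_v|/|S| * H(S_v), splitting on `feature`."""
    base_entropy = entropy(labels)
    total = len(samples)
    remainder = 0.0
    for value in {s[feature] for s in samples}:
        subset = [lbl for s, lbl in zip(samples, labels) if s[feature] == value]
        remainder += len(subset) / total * entropy(subset)
    return base_entropy - remainder

# Toy data: does the outlook feature separate the play/no-play labels?
samples = [{"outlook": "sunny"}, {"outlook": "sunny"},
           {"outlook": "rain"}, {"outlook": "rain"}]
labels = ["no", "no", "yes", "yes"]
print(information_gain(samples, labels, "outlook"))  # 1.0: a perfect split
```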