Pattern Recognition
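
The information content and entropy defined in this section can be computed directly from a probability mass function. The following is a minimal Python sketch, not a reference implementation: the function names `information_content` and `entropy`, the list-of-probabilities representation `pmf`, and the default base of 2 are all assumptions made for illustration.

```python
import math


def information_content(p, base=2.0):
    """Information content I = -log_b(p) of an outcome with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log(p, base)


def entropy(pmf, base=2.0):
    """Entropy H = E[I(X)] = sum of p * I(p) over all outcomes.

    `pmf` is assumed to be an iterable of probabilities that sum to 1.
    Outcomes with probability 0 are skipped, following the usual
    convention that 0 * log(0) is taken to be 0.
    """
    return sum(p * information_content(p, base) for p in pmf if p > 0)


if __name__ == "__main__":
    fair_coin = [0.5, 0.5]
    biased_coin = [0.9, 0.1]
    print("fair coin entropy (bits):  ", entropy(fair_coin))
    print("biased coin entropy (bits):", entropy(biased_coin))
    print("biased coin entropy (nats):", entropy(biased_coin, base=math.e))
```

Under these assumptions, the fair coin is the more uncertain of the two distributions, so its entropy is higher than that of the biased coin. Changing `base` only rescales the result into a different unit (bits, nats, or bans).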



Information Content

$$I(X) = -\log_b P(X)$$

where,
I : the information content of X. An X with a greater I value contains more information.
P : the probability mass function.
b : the base of the logarithm used. Common values of b are 2 (bits), Euler's number e (nats), and 10 (bans).

Entropy (Information Theory)[1]

$$H(X) = E[I(X)] = E[-\log_b P(X)]$$

where,
H : the entropy. Named after Boltzmann's H-theorem (but the definition was proposed by Shannon). H indicates the uncertainty of X.
P : the probability mass function.
I : the information content of X.
E : the expected value operator.

The entropy can be written explicitly as:

$$H(X) = \sum_i P(x_i)\, I(x_i) = -\sum_i P(x_i) \log_b P(x_i)$$

ID3[2]

Use ID3 to build a decision tree:
1. Calculate the entropy of the samples under the current node.
2. Find a feature F that maximizes the information gain. The information gain is calculated by: where E is the entropy of