# NLP with Deep Learning

2021-09-09

Machine-Learning

Transformer
Attention Is All You Need - arXiv
Feed Forward: Two fully connected layers (Linear) with a ReLU activation in between.
Multi-head Attention:
Attention
Attention - Qiita
Self-Attention
GPT (Generative Pre-Training)
GPT - paper
BERT (Bidirectional Encoder Representations from Transformers)
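The attention and feed-forward components listed above can be sketched in NumPy. This is a minimal illustration with toy dimensions, not the paper's implementation (the paper uses d_model = 512 and d_ff = 2048):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax (shifted by the max for numerical stability).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def feed_forward(x, W1, b1, W2, b2):
    """Position-wise feed-forward: two linear layers with a ReLU in between."""
    return np.maximum(0, x @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
seq, d_model, d_ff = 5, 8, 32      # toy sizes; the paper uses 512 and 2048
X = rng.standard_normal((seq, d_model))

# Self-attention: queries, keys, and values all come from the same sequence.
attn_out = scaled_dot_product_attention(X, X, X)
ffn_out = feed_forward(attn_out,
                       rng.standard_normal((d_model, d_ff)), np.zeros(d_ff),
                       rng.standard_normal((d_ff, d_model)), np.zeros(d_model))
print(attn_out.shape, ffn_out.shape)  # (5, 8) (5, 8)
```

Multi-head attention simply runs this attention several times on learned projections of Q, K, V and concatenates the results.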

# Decision Tree

2019-03-16

Machine-Learning

Information Content
$$ I(X) = -\log_b P(X) $$
where:
I : the information content of X. An X with a greater I value contains more information.
P : the probability mass function.
b : the base of the logarithm. Common values of b are 2 (bits), e (nats), and 10 (hartleys).
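As a quick check of the self-information formula I(x) = -log_b P(x), a stdlib-only sketch (the function name is mine, chosen for illustration):

```python
import math

def information_content(p, base=2):
    """Self-information I(x) = -log_b P(x); rarer events carry more information."""
    return -math.log(p, base)

# A fair coin flip (p = 0.5) carries exactly 1 bit of information.
print(information_content(0.5))    # 1.0
# A rarer event (p = 1/8) carries more: 3 bits.
print(information_content(0.125))
```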

# Convolutional Neural Network

2018-08-19

Machine-Learning

This article is my reflection on my previous work FaceLock, a project that recognizes the user's face and locks the computer if the user is not present for a certain time. A CNN is used to recognize different faces. I watched the Coursera course Convolutional Neural Networks by

# Mathematical Basis - Squashing Function

2018-08-11

Machine-Learning

This article covers some squashing functions used in deep learning, including the Softmax Function, the Sigmoid Function, and the Hyperbolic Functions. All three squash values into a certain range.
Softmax Function
Softmax Function: A generalization of the logistic function that "squashes" a
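The three squashing functions mentioned above can be sketched in a few lines of NumPy (a minimal illustration, not a library implementation):

```python
import numpy as np

def softmax(z):
    """Squash a vector into a probability distribution (entries sum to 1)."""
    e = np.exp(z - np.max(z))  # shift by the max for numerical stability
    return e / e.sum()

def sigmoid(z):
    """Squash any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# tanh, a hyperbolic function, squashes into (-1, 1); NumPy provides it directly.
probs = softmax(np.array([1.0, 2.0, 3.0]))
print(probs.sum())      # 1.0
print(sigmoid(0.0))     # 0.5
print(np.tanh(0.0))     # 0.0
```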

# Recurrent Neural Network

2018-07-30

Machine-Learning

This article is my learning notes on the Coursera course Sequence Models by Andrew Yan-Tak Ng.
According to Andrew Ng, there are two typical RNN units for the hidden layers of an RNN. One is the GRU (Gated Recurrent Unit); the other is the LSTM (Long Short
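A single GRU step can be sketched in NumPy. This is a simplified illustration (biases omitted, variable names mine; the course writes the gates as Γ_u and Γ_r):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_cell(x, h_prev, Wz, Wr, Wh):
    """One GRU step; each weight matrix acts on the concatenated [h_prev, x]."""
    hx = np.concatenate([h_prev, x])
    z = sigmoid(Wz @ hx)   # update gate: how much of the state to replace
    r = sigmoid(Wr @ hx)   # reset gate: how much past state feeds the candidate
    h_tilde = np.tanh(Wh @ np.concatenate([r * h_prev, x]))  # candidate state
    return (1 - z) * h_prev + z * h_tilde  # interpolate old and candidate state

hidden, inp = 4, 3
rng = np.random.default_rng(0)
Wz = rng.standard_normal((hidden, hidden + inp))
Wr = rng.standard_normal((hidden, hidden + inp))
Wh = rng.standard_normal((hidden, hidden + inp))
h = gru_cell(rng.standard_normal(inp), np.zeros(hidden), Wz, Wr, Wh)
print(h.shape)  # (4,)
```

The LSTM adds a separate cell state and a third (output) gate on top of this structure.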

# AdaBoost

2018-03-30

Machine-Learning

Python implementation: AdaBoost - Donny-Hikari - Github
Introduction
AdaBoost is short for Adaptive Boosting. Boosting is an Ensemble Learning method; other Ensemble Learning methods include Bagging, Stacking, etc. The differences between Bagging, Boosting, and Stacking are as follows:
Bagging:
Equal weight voting. Trains
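AdaBoost's core reweighting step can be sketched as follows. This is a minimal illustration of one boosting round (labels in {-1, +1}; names are mine, not from the linked implementation):

```python
import numpy as np

def adaboost_round(sample_weights, y_true, y_pred):
    """One AdaBoost round: compute the learner's vote weight alpha,
    then up-weight the samples it misclassified."""
    w = sample_weights
    err = np.sum(w[y_true != y_pred]) / np.sum(w)  # weighted error rate
    alpha = 0.5 * np.log((1 - err) / err)          # vote weight of this learner
    w = w * np.exp(-alpha * y_true * y_pred)       # mistakes get larger weights
    return alpha, w / w.sum()                      # renormalize to sum to 1

y = np.array([1, 1, -1, -1])
pred = np.array([1, -1, -1, -1])  # a weak learner with one mistake
alpha, w = adaboost_round(np.ones(4) / 4, y, pred)
print(alpha > 0)  # True: error < 0.5, so this learner gets a positive vote
```

Unlike Bagging's equal-weight voting, the final AdaBoost classifier is sign(Σ alpha_t * h_t(x)): better learners get bigger votes.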

# Classification And Overfitting

2017-11-20

Machine-Learning

This is a learning note on Logistic Regression from Machine Learning by Andrew Ng on Coursera.
Hypothesis Representation
Uses the "Sigmoid Function," also called the "Logistic Function":
$$ h_\theta(x) = g(\theta^T x), \qquad g(z) = \frac{1}{1 + e^{-z}} $$
which turns linear regression into classification.
The sigmoid function looks like this:
[figure: the sigmoid curve, mapping all real inputs into (0, 1)]
The hypothesis gives us the probability that
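The logistic-regression hypothesis can be evaluated in a few lines of NumPy (the parameter values here are arbitrary, chosen only for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x): the estimated probability that y = 1."""
    return sigmoid(theta @ x)

theta = np.array([-1.0, 2.0])
x = np.array([1.0, 0.5])  # x[0] = 1 is the bias/intercept term
p = hypothesis(theta, x)
print(p)  # 0.5 here, since theta^T x = 0; always strictly between 0 and 1
```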