# Mathematical Basis - Squashing Function

August 11, 2018
August 26, 2018
Donny

## Softmax Function

Softmax Function: A generalization of the logistic function that "squashes" a K-dimensional vector $$\mathbf z$$ of arbitrary real values into a K-dimensional vector $$\sigma ( \mathbf z )$$ of real values, where each entry lies in the range (0, 1) (for K > 1) and all the entries add up to 1.

$${\displaystyle \sigma : \mathbb{R}^{K}\to \left\{ \sigma \in \mathbb{R}^{K} \mid \sigma_{i} > 0, \sum_{i=1}^{K} \sigma_{i} = 1 \right\}}$$

$$\sigma ( \mathbf{z} )_{j}={ \frac{e^{z_{j}}}{ \sum_{k=1}^{K} e^{z_{k}} } } \quad \text{for } j = 1, \dots, K.$$

In probability theory, the output of the softmax function can be used to represent a categorical distribution - that is, a probability distribution over K different possible outcomes.
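As a minimal sketch of the definition above, the softmax can be written in plain Python. The function name `softmax` and the sample input are illustrative choices; subtracting the maximum before exponentiating is a common numerical safeguard rather than part of the definition, and it does not change the result because the common factor cancels in the ratio.

```python
import math

def softmax(z):
    """Return the softmax of a sequence of real numbers as a list of floats."""
    # Shifting by max(z) multiplies numerator and denominator by the same
    # factor e^{-max(z)}, so the ratio is unchanged but exp() cannot overflow.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative input: larger entries receive larger probabilities.
probs = softmax([1.0, 2.0, 3.0])
print(probs)
print(sum(probs))  # adds up to 1, up to floating-point rounding
```

The returned list can be read as a categorical distribution over the K outcomes: every entry is positive and the entries sum to 1.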

The softmax function is the gradient of the LogSumExp function.
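This can be checked by differentiating the LogSumExp function (defined in the next section) with respect to one of its arguments. A short derivation, using the chain rule and writing the K inputs as $$x_1, ..., x_K$$:

$$\frac{\partial}{\partial x_j} \log\left( \sum_{k=1}^{K} e^{x_k} \right) = \frac{1}{\sum_{k=1}^{K} e^{x_k}} \cdot \frac{\partial}{\partial x_j} \sum_{k=1}^{K} e^{x_k} = \frac{e^{x_j}}{\sum_{k=1}^{K} e^{x_k}} = \sigma ( \mathbf{x} )_{j}$$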

## LogSumExp Function

LogSumExp Function: The LogSumExp (LSE) function is a smooth approximation to the maximum function.

$$LSE(x_1, ..., x_n) = \log\left( \sum_{i=1}^{n} e^{x_i} \right)$$

($$\log$$ stands for the natural logarithm function, i.e. the logarithm to the base e.)

LSE is never smaller than $$\max \{ x_1, ..., x_n \}$$ and exceeds it by at most $$\log(n)$$, so it is well approximated by the maximum, especially when one $$x_i$$ is much larger than the others:

$$\max \{ x_1, ..., x_n \} \leq LSE(x_1, ..., x_n) \leq \max \{ x_1, ..., x_n \} + \log(n)$$
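The bounds above, and the statement that softmax is the gradient of LSE, can be checked numerically. The sketch below is only an illustration: the helper names `logsumexp`, `softmax` and `numerical_gradient`, the sample vector, and the step size `h` are all assumptions made for this example, and the gradient is estimated with central finite differences rather than computed exactly.

```python
import math

def logsumexp(xs):
    """LSE(x_1, ..., x_n) = log(sum(exp(x_i))), computed in a shifted form."""
    m = max(xs)
    # log(sum(exp(x_i))) = m + log(sum(exp(x_i - m))); the shift avoids overflow.
    return m + math.log(sum(math.exp(x - m) for x in xs))

def softmax(xs):
    """Softmax of a sequence of reals, using the same shift by the maximum."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def numerical_gradient(f, xs, h=1e-6):
    """Central finite-difference estimate of the gradient of f at xs."""
    grad = []
    for i in range(len(xs)):
        up = list(xs)
        down = list(xs)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

xs = [0.5, -1.0, 3.0, 2.0]  # illustrative input
lse = logsumexp(xs)

# Lower bound, LSE value, upper bound: the middle number lies between the others.
print(max(xs), lse, max(xs) + math.log(len(xs)))

# The two lists below should agree up to the finite-difference error.
print(numerical_gradient(logsumexp, xs))
print(softmax(xs))
```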

## Sigmoid Function

Sigmoid Function: A mathematical function with an "S"-shaped curve.

One frequently used sigmoid function in ML is the logistic function.

Logistic Function:

$$f(x) = \frac{1}{1+e^{-x}}$$

$$\tikz \node [scale=1.1] { \begin{tikzpicture}[] \begin{axis}[ axis line style=gray, ymin=-1, ymax=2, axis x line=center, axis y line=center, xlabel=x, ylabel=y ] \addplot[blue]{1/(1+exp(-x))}; \addplot[blue] coordinates{(3,1.2)} node{f(x)=\frac{1}{1+e^{-x}}}; \end{axis} \end{tikzpicture} };$$

To understand how it works, we first differentiate it:

$$f'(x) = \frac{e^{-x}}{(1+e^{-x})^{2}} = \frac{1}{2+e^{x}+e^{-x}}$$

$$\tikz \node [scale=1.1] { \begin{tikzpicture}[] \begin{axis}[ axis line style=gray, ymin=-1, ymax=1, axis x line=center, axis y line=center, xlabel=x, ylabel=y ] \addplot[blue]{1/(2+exp(x)+exp(-x))}; \addplot[blue] coordinates{(3,0.3)} node{f'(x)=\frac{1}{2+e^{x}+e^{-x}}}; \end{axis} \end{tikzpicture} };$$

The function $$g(x) = e^{x}+e^{-x} = 2\cosh(x)$$ is a U-shaped curve that resembles a quadratic function, with its minimum value 2 at x = 0. It follows that $$f'(x)$$ rises from 0 to its maximum of 0.25 as x goes from negative infinity to 0, and then falls back toward 0 as x goes from 0 to positive infinity.

So from the derivative, we can see that the logistic function is "S"-shaped: its value is very close to 0 as x approaches negative infinity and very close to 1 as x approaches positive infinity. The inflection point is at x = 0, where the second derivative is 0 and the value of the logistic function is 0.5.
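The behaviour described above can be checked with a few lines of Python. This is a minimal sketch: the names `logistic`, `logistic_derivative` and `derivative_closed_form` are illustrative, and the branch on the sign of x is only a common trick to avoid overflow in `exp`, not part of the definition. The first derivative helper uses the identity $$f'(x) = f(x)(1 - f(x))$$, which follows from the derivative formula above; the second evaluates the closed form directly.

```python
import math

def logistic(x):
    """f(x) = 1 / (1 + e^{-x}), evaluated without overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, multiply numerator and denominator by e^{x}.
    e = math.exp(x)
    return e / (1.0 + e)

def logistic_derivative(x):
    """f'(x) written as f(x) * (1 - f(x))."""
    s = logistic(x)
    return s * (1.0 - s)

def derivative_closed_form(x):
    """f'(x) = 1 / (2 + e^{x} + e^{-x}); only safe for moderate |x|."""
    return 1.0 / (2.0 + math.exp(x) + math.exp(-x))

# Sample points (illustrative): the value climbs from near 0 to near 1,
# while the derivative peaks at x = 0 and is symmetric around it.
for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(x, logistic(x), logistic_derivative(x), derivative_closed_form(x))
```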

## Hyperbolic Function

Inspired by Euler's formula, $$e^{i\theta} = \cos(\theta) + i \sin(\theta)$$, hyperbolic functions extend the notion of the parametric equations for a unit circle to the parametric equations for a hyperbola. (From Hyperbolic Trigonometric Function | Brilliant Math & Science Wiki)

Note: The hyperbolic tangent is also a sigmoid function, according to the Wikipedia article Sigmoid Function.

| Function | Hyperbolic | Connection with trigonometric |
| --- | --- | --- |
| sine | $$\sinh(x) = \frac{e^x - e^{-x}}{2}$$ | $$\sinh(x) = -i \sin(ix)$$ |
| cosine | $$\cosh(x) = \frac{e^x + e^{-x}}{2}$$ | $$\cosh(x) = \cos(ix)$$ |
| tangent | $$\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} = \frac{1-e^{-2x}}{1+e^{-2x}}$$ | $$\tanh(x) = -i \tan(ix)$$ |
| cotangent ($$= \frac{1}{\tan(x)}$$) | $$\coth(x) = \frac{e^x + e^{-x}}{e^x - e^{-x}} , \quad {x \ne 0}$$ | $$\coth(x) = i \cot(ix)$$ |
| secant ($$= \frac{1}{\cos(x)}$$) | $$\operatorname{sech}(x) = \frac{2}{e^x + e^{-x}}$$ | $$\operatorname{sech}(x) = \sec(ix)$$ |
| cosecant ($$= \frac{1}{\sin(x)}$$) | $$\operatorname{csch}(x) = \frac{2}{e^x - e^{-x}} , \quad {x \ne 0}$$ | $$\operatorname{csch}(x) = i \csc(ix)$$ |
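The identities listed above can be spot-checked numerically with Python's `cmath` module, which evaluates trigonometric functions at complex arguments. This is a sketch under stated assumptions: the helper names `hyperbolic_from_exp`, `hyperbolic_from_trig` and `logistic` and the sample point x = 0.7 are illustrative, and a numerical match at a few points is a sanity check rather than a proof. The last line also checks the relation $$\tanh(x) = 2 f(2x) - 1$$ between the hyperbolic tangent and the logistic function f, which follows from the second form of tanh given above.

```python
import cmath
import math

def hyperbolic_from_exp(x):
    """Hyperbolic functions from their exponential definitions (x != 0)."""
    ep, en = math.exp(x), math.exp(-x)
    return {
        "sinh": (ep - en) / 2,
        "cosh": (ep + en) / 2,
        "tanh": (ep - en) / (ep + en),
        "coth": (ep + en) / (ep - en),
        "sech": 2 / (ep + en),
        "csch": 2 / (ep - en),
    }

def hyperbolic_from_trig(x):
    """The same functions via trigonometric functions of the imaginary argument ix."""
    ix = 1j * x
    return {
        "sinh": -1j * cmath.sin(ix),
        "cosh": cmath.cos(ix),
        "tanh": -1j * cmath.tan(ix),
        "coth": 1j / cmath.tan(ix),   # i * cot(ix)
        "sech": 1 / cmath.cos(ix),    # sec(ix)
        "csch": 1j / cmath.sin(ix),   # i * csc(ix)
    }

def logistic(x):
    """f(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))

x = 0.7  # illustrative sample point, must be nonzero for coth and csch
direct = hyperbolic_from_exp(x)
via_trig = hyperbolic_from_trig(x)
for name in direct:
    # Each row prints the two values and whether they agree numerically.
    print(name, direct[name], via_trig[name],
          cmath.isclose(direct[name], via_trig[name]))

# tanh is a shifted and rescaled logistic function.
print(math.tanh(x), 2 * logistic(2 * x) - 1)
```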

Graphs of these functions:

$$\tikz \node [scale=1.1] { \begin{tikzpicture}[] \begin{axis}[ samples=120, axis line style=gray, ymin=-5, ymax=5, axis equal, axis x line=center, axis y line=center, xlabel=x, ylabel=y, ] \addplot[green]{(exp(x)-exp(-x))/2}; \addlegendentry{sinh} \addplot[red]{(exp(x)+exp(-x))/2}; \addlegendentry{cosh} \addplot[blue]{(exp(x)-exp(-x))/(exp(x)+exp(-x))}; \addlegendentry{tanh} \end{axis} \end{tikzpicture} };$$

$$\tikz \node [scale=1.1] { \begin{tikzpicture}[] \begin{axis}[ samples=120, axis line style=gray, ymin=-5, ymax=5, axis equal, axis x line=center, axis y line=center, xlabel=x, ylabel=y, ] \addplot[green, restrict expr to domain={(x<-0.1)+(x>0.1)}{0.1:+inf}]{(exp(x)+exp(-x))/(exp(x)-exp(-x))}; \addlegendentry{coth} \addplot[red]{2/(exp(x)+exp(-x))}; \addlegendentry{sech} \addplot[blue, restrict expr to domain={(x<-0.1)+(x>0.1)}{0.1:+inf}]{2/(exp(x)-exp(-x))}; \addlegendentry{csch} \end{axis} \end{tikzpicture} };$$