Useful Materials

Distinctive Image Features from Scale-Invariant Keypoints^{[1]} by David G. Lowe.

SIFT(Scale-Invariant Feature Transform)^{[2]} on Towards Data Science.

The SIFT (Scale Invariant Feature Transform) Detector and Descriptor^{[3]}.
Notes

Uses DoG (Difference of Gaussian) to approximate the scale-normalized LoG (Laplacian of Gaussian)^{[4]}.
$$ G(x,y,\sigma) = \frac{1}{2 \pi \sigma^2} \exp\left( -\frac{x^2 + y^2}{2 \sigma^2} \right) \\ D(x,y,\sigma) = (G(x,y,k \sigma) - G(x,y,\sigma)) * I(x,y) \\ G(x,y,k \sigma) - G(x,y,\sigma) \approx (k-1) \sigma^2 \nabla^2 G $$
where \( G(x,y,\sigma) \) is the two-dimensional Gaussian function, \( * \) denotes convolution, and \( I(x,y) \) is the input image.
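The approximation above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper names (`gaussian`, `laplacian_of_gaussian`, `dog_log_relative_error`), the grid size, and the value \( \sigma = 1.6 \) are illustrative choices, not part of the notes. It uses the analytic form \( \nabla^2 G = \frac{x^2 + y^2 - 2\sigma^2}{\sigma^4} G \) and compares both sides of the approximation on a grid.

```python
import numpy as np


def gaussian(x, y, sigma):
    """Two-dimensional Gaussian G(x, y, sigma)."""
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)


def laplacian_of_gaussian(x, y, sigma):
    """Analytic Laplacian of G: ((x^2 + y^2 - 2 sigma^2) / sigma^4) * G."""
    r2 = x**2 + y**2
    return (r2 - 2 * sigma**2) / sigma**4 * gaussian(x, y, sigma)


def dog_log_relative_error(sigma, k, half_width=10.0, n=201):
    """Max |DoG - (k-1) sigma^2 LoG| on a grid, relative to max |(k-1) sigma^2 LoG|."""
    axis = np.linspace(-half_width, half_width, n)
    x, y = np.meshgrid(axis, axis)
    dog = gaussian(x, y, k * sigma) - gaussian(x, y, sigma)
    scaled_log = (k - 1) * sigma**2 * laplacian_of_gaussian(x, y, sigma)
    return float(np.max(np.abs(dog - scaled_log)) / np.max(np.abs(scaled_log)))


# Illustrative values of k; sigma = 1.6 is an arbitrary choice for this check.
for k in (1.01, 2 ** (1 / 3), 2 ** 0.5):
    print(f"k = {k:.4f}: relative error = {dog_log_relative_error(1.6, k):.4f}")
```

The relative error shrinks as \( k \) approaches 1, which is what one would expect from a first-order approximation in \( k - 1 \).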

[need more consideration] After each octave, the Gaussian image is downsampled by a factor of 2: the Gaussian image that has twice the initial value of \( \sigma \) is resampled by taking every second pixel in each row and column. The new octave then starts at \( k^2 \sigma \).
Since the image area is reduced to 1/4, the sigma for the next octave becomes \( \frac{1}{2} k^4 \sigma \), which equals \( k^2 \sigma \) when \( k^2 = 2 \).
To understand it, first consider this question: if the image area is reduced to 1/4 but the kernel size of the Gaussian function remains the same, what should the new \( \sigma \) be in order to produce the same effect as before?
Let \( D_1 \) be a circular region in the original image, \( D_2 \) the corresponding region in the new image, and \( r_1 \) and \( r_2 \) the radii of \( D_1 \) and \( D_2 \) respectively. Then we have:
$$ Area\_of(D_2) = \frac{1}{4} Area\_of(D_1) \\ r_2 = \frac{1}{2} r_1 $$
Let the new \( \sigma \) be \( a \) times the original one. The question is now to solve for \( a \) in:
$$ \iint_{D_1} G(x,y,\sigma) \,dx\,dy = \iint_{D_2} G(x,y,a\sigma) \,dx\,dy $$
With the help of the Gaussian integral^{[5]}, we can solve for \( a \):
$$ \begin{align*} \Rightarrow & 2 \pi \frac{1}{2 \pi \sigma^2} \int^{r_1}_{0} \exp\left( -\frac{r^2}{2 \sigma^2} \right) r \,dr = 2 \pi \frac{1}{2 \pi a^2 \sigma^2} \int^{r_2}_{0} \exp\left( -\frac{r^2}{2 a^2 \sigma^2} \right) r \,dr \\ \Rightarrow & -\sigma^2 \left( \exp\left( -\frac{r_1^2}{2 \sigma^2} \right) - \exp(0) \right) = \frac{1}{a^2} (-a^2 \sigma^2) \left( \exp\left( -\frac{r_2^2}{2 a^2 \sigma^2} \right) - \exp(0) \right) \\ \Rightarrow & a^2 = \frac{r_2^2}{r_1^2} \\ \Rightarrow & a = \frac{1}{2} \end{align*} $$
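The result \( a = \frac{1}{2} \) can also be checked numerically. The sketch below assumes NumPy; the helper `disk_mass`, the values \( \sigma = 1.6 \) and \( r_1 = 3 \), and the 17-by-17 pixel grid are hypothetical choices made only for this check. It integrates the Gaussian over a disk with a midpoint rule in polar coordinates, then confirms at the pixel level that taking every second sample of a Gaussian with \( \sigma \) gives, up to the constant factor 4, a Gaussian with \( \sigma / 2 \) on the new grid.

```python
import numpy as np


def gaussian(x, y, sigma):
    """Two-dimensional Gaussian G(x, y, sigma)."""
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)


def disk_mass(radius, sigma, steps=100_000):
    """Integrate G over a disk centred at the origin (polar midpoint rule)."""
    dr = radius / steps
    r = (np.arange(steps) + 0.5) * dr  # midpoints of the radial intervals
    return float(np.sum(gaussian(r, 0.0, sigma) * 2 * np.pi * r) * dr)


sigma, r1 = 1.6, 3.0  # arbitrary values for the check
print("D1 with sigma:     ", disk_mass(r1, sigma))
print("D2 with sigma / 2: ", disk_mass(r1 / 2, sigma / 2))  # a = 1/2
print("D2 with sigma:     ", disk_mass(r1 / 2, sigma))      # a = 1, for contrast

# Pixel-level view: sample G at every second pixel and compare it with a
# Gaussian of sigma / 2 evaluated on the downsampled grid.
i, j = np.meshgrid(np.arange(-8, 9), np.arange(-8, 9))
subsampled = gaussian(2 * i, 2 * j, sigma)
halved = gaussian(i, j, sigma / 2)
print("4 * subsampled matches halved:", np.allclose(4 * subsampled, halved))
```

The factor 4 is only the normalization \( \frac{1}{2 \pi \sigma^2} \) changing with \( \sigma \); the shape of the kernel in the new pixel units is the one with half the original \( \sigma \).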
Puzzles
- How the 3D interpolation (using Taylor expansion) works.
TODO
- Continue at section 4.1, Eliminating edge responses.
References
David G. Lowe. 2004. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vision 60, 2 (November 2004), 91-110. DOI: https://doi.org/10.1023/B:VISI.0000029664.99615.94
SIFT(Scale-Invariant Feature Transform) - Towards Data Science
The SIFT (Scale Invariant Feature Transform) Detector and Descriptor