Transformer Attention is all you need - Arxiv Feed Forward: Two fully connected layers (Linear) with a ReLU activation in between. Multi-head Attention: Attention Attention - Qiita Self-Attention GPT (Generative Pre-Training) GPT - paper BERT (Bidirectional Encoder Representations from Transformers) BERT - Arxiv BERT Explained Different from the Transformer (GPT) which is trained using only the left context, BERT uses bidirectional encoder which makes use of both the left and the right context. [MASK] is used to mask some of the words so that the model will not see the word itself indirectly. Pre-training of BERT makes use of two strategies: MLM (Masked Language Model) and NSP (Next Sentence Prediction). The model is trained with both the strategies together. As shown below, the input embeddings of BERT consists of the token embeddings, the segment embeddings, and the position embeddings. Note that a segment may consists of multiple sentences. In MLM task,



Tic-Tac-Toe Online Server Base on the Tic-Tac-Toe Game of CS188, Berkeley, I develop an online version of Tic-Tac-Toe. Now your agent can play with my agent online! I think it is a good way to check whether our agents are optimal or not. My agent can beat random agents most of the time even if my agent is the second player. Online Server Website: Tic-Tac-Toe Online Download the attached client file from the moodle form and place it in the same directory of your ( depends on , so is also needed). Run this command: $ python3 -u demo -n 3 And Enjoy! Notice: You need to specify a username with "-u USERNAME". Don't use "demo" as your username cause it is forbidden. Usage: [options] Options Variable Description -u USERNAME Username, must not be empty nor
The Question Fish eating fruit on jisuanke Given an undirected acyclic graph G, all possible path P in the graph, calculate: The first taste In the contest, a handsome foriegn teammate conviced me that this problem can be solve using LCA. I tried. And it did work, with the help of dijkstra. My solution is to, first of all, run dijkstra, and get the distance between root node and every other nodes. Then, calculate the LCA for every two nodes. The desired result is: It worked, but we got TLE for looping though all the nodes, which is . The second trial After the contest, I was told that this is a DP problem. You calculate the times an edge is accessed, times it with the weight, sum them up by the modulus of 3, you got the result. This one, however, also got TLE. Oh, FISH! The final solution The reason why the second solution still can
Overview Regions with CNN features: Efficient Graph Based Image Segmentation use disjoint set to speed up merge operation Selective Search HOG (Histogram of Oriented Gradient) Multiple criterions (color, texture, size, shape) to merge regions AlexNet/VGG16 R-CNN Notice that many descriptions are replicated from the orignal sources directly. Some Fundermental Conceptions Batch Size Stochastic Gradient Descent. Batch Size = 1 Batch Gradient Descent. Batch Size = Size of Training Set Mini-Batch Gradient Descent. 1 < Batch Size < Size of Training Set Regularization A regression model that uses L1 Regularization technique is called Lasso Regression and model which uses L2 is called Ridge Regression. Ridge Regularization Ridge regression adds "squared magnitude" of coefficient as penalty term to the loss function. The first sum is an example of loss function. Lasso Regularization Lasso Regression (Least Absolute Shrinkage and Selection Operator) adds "absolute value of magnitude" of coefficient as penalty term to the loss function.
Useful Materials Distinctive Image Features from Scale-Invariant Keypoints[1] by David G. Lowe. SIFT(Scale-Invariant Feature Transform)[2] on Towards Data Science. The SIFT (Scale Invariant Feature Transform) Detector and Descriptor[3]. Notes Uses DoG (Difference of Gaussian) to approximate Scale-normalized LoG (Laplacian of Gaussian)[4]. where is the two dimensions Gaussian function, and is the input image. [need more consideration] After each octave, the Gaussian image is down-sampled by a factor of 2, by resampling the Gaussian image that has twice the initial value of by taking every second pixel in each row and column. And we start on the new octave with . Since the image size is reduced to 1/4, the sigma for the next octave becomes , which is equal to . To understand it, frist consider this question: If the image size is reduced to 1\4, but the kernel size of
Types of Noise Additive noise Additive noise is independent from image signal. The image g with nosie can be considered as the sum of ideal image f and noise n.[1] Multiplicative noise Multifplicative noise is often dependent on image signal. The relation of image and noise is[1]: Gaussian noise Gaussian noise, named after Carl Friedrich Gauss, is statistical noise having a probability density function (PDF) equal to that of the normal distribution, aka. the Gaussian distribution. i.e. the values that the noise can take on are Gaussian-distributed. The PDF of a Gaussian random variable is given by[2]: Salt-and-pepper noise Fat-tail distributed or "impulsive" noise is sometimes called salt-and-pepper nosie or spike noise. An image containing salt-and-pepper noise will have dark pixels in bright regions and bright pixels in dark regions.[2] The PDF of (Bipolar) Impulse noise is given by: if b > a, gray-level

Greek Alphabet

Greek Alphabet Letter (Capital Case) Letter (Lower Case) Name English Equivalent alpha a beta b gamma g delta d epsilon e zeta z eta h theta th iota i kappa k lambda l mu m nu n xi x omicron o pi p rho r sigma s tau t upsilon u phi ph chi ch psi ps omega o References [1]. Greek Alphabet - Wikipedia [2]. Greek alphabet letters & symbols
Information Content where, I : the information content of X. An X with greater I value contains more information. P : the probability mass function. : b is the base of the logarithm used. Common values of b are 2 (bits), Euler's number e (nats), and 10 (bans). Entropy (Information Theory)[1] where, H : the entropy H. Named after Boltzmann's H-theorem (but the definition is proposed by Shannon). H indicates the uncertainty of X. P : probability mass function. I : the information content of X. E : the expected value operator. The entropy can explicitly be written as: ID3[2] Use ID3 to build a decision tree: Calculate the entropy of the samples under the current node. Find a feature F that can maximize the information gain. The information gain is calculatd by: where E is the entropy of
AutoTag AutoTag is a program that generate tags for documents automatically. The main process includes: Participle (N-gram + lookup in dictionary) Generate bag-of-words for each document. Calculate term frequency and inverse document frequency. Pick top x words with greater tf-idf values as tags. N-gram N-gram generate a sequence of n words in every position of a sentence.[1] sentences = 'Lucy like to listen to music. Luna like music too.' items = ngram(sentences, 2) print(items) # output: [ 'Lucy like', 'like to', 'to listen', 'listen to', 'to music', 'Luna like', 'like music', 'music too', ] Bag of words The bag-of-words model is a simplifying representation in NLP and IR.[1] N-gram Count the times that each word appears


w3m: WWW wo Miru (c) Copyright Akinori ITO w3m is a pager with WWW capability. It IS a pager, but it can be used as a text-mode WWW browser. Keyboard Shortcuts Shortcut Action Level H Help Brower q Quit w3m Brower C-h History Brower T New tab Tabs C-j Open link on current tab Tabs C-t Open link on new tab Tabs C-q Close tab Tabs U Go to url Page R Reload Page B Back Page Configuration File Location Keymap file ~/.w3m/keymap

Remember Minori

十年时间转瞬即逝,许多曾经深爱的事物都一一离去了,不免让人生出逝者如斯的感慨。 说来有些讽刺,作为一个计算机的学生,理应走在前沿,我却总怀念老式的交互和十年前的网络。 人生得意须尽欢,莫使金樽空对月。 如同那句台词所说,我们总要不断地告别。和老去的事物告别,和过去告别。至少在最后道一声感谢,笑着告别。

More Blogs

Friends' Blogs EquimA Kernel Implementation 長門有希An Outstanding ACMer Bug ZhangA Great Web Master Mr. ZhuA Dreamer Favorite Blogs CHARON(カロン)同人サークル「CHARON」の公式サイト 海底囚人海底囚人の趣味の個人サイトです
Permission Control for NTFS We often encounter the problem that to mount NTFS under Linux means no permission control. But that is not true. According to JanC's Answer on AskUbuntu: Contrary to what most people believe, NTFS is a POSIX-compatible filesystem, and it is possible to use permission on NTFS. The First Trial First let's just open /etc/fstab and see how partitions are mounted. $ sudo nano /etc/fstab In my situation, the NTFS partitions are mounted as followed: /dev/sda1 /mnt/NTFS1 auto nosuid,nodev,nofail,x-gvfs-show 0 0 /dev/sda2 /mnt/NTFS2 auto nosuid,nodev,nofail,x-gvfs-show 0 0 The nosuid keyword prevents setting uid on filesystem. So the First step is to remove this keyword. However, once this keyword is removed, an uid and a gid must be given to setup the permission control. By default, the current uid and the current gid will be used. We can also specifiy



List # list filter table sudo iptables -L # list nat table sudo iptables -L -t nat Redirect # Redirect locally sudo iptables -A OUTPUT -t nat -p tcp --src --dport 80 -j REDIRECT --to-port 8080 # Redirect in LAN sudo iptables -A PREROUTING -t nat -p tcp --src --dst --dport 80 -j REDIRECT --to-port 8080 Filter # Reject with ICMP-port-unreachable. sudo iptables -A OUTPUT --dst -j REJECT # Drop and hang up the connection. sudo iptables -A OUTPUT --dst -j DROP Package flow paths Package flow paths (from iptables - Wikipedia):
Major and Minor Step The step between each note below is a half: C C#(Db) D D#(Eb) E F F#(Gb) G G#(Ab) A A#(Bb) B C Steps in Major and Minor S: a half step T: a whole step E: one and a half step Steps in Major: TTSTTTS Steps in (Natural) Minor: TSTTSTT Steps in Harmonic Minor: TSTTSES (introducing the tension resolution sound. Create a leading tone.) Steps in Melodic Minor: TSTTTTS (eliminates the gap while keeping the leading tone sound.) Majors E Major: E F# G# A B C# D# C Major: C D E F G A B Piano Piece / Sheet Notations: The Treble Clef: The Bass Clef: C scale: Circle of fifths: (source: C major - Wikipedia) More at How To Read Sheet Music: Step-by-Step Instructions More Major scale - Wikipedia Minor Scales - Natural, Harmonic
Substitution, dirname, basename and suffix Substitution can be used to get path and short filename. filename=a/b/c/name.file echo ${filename#*/} # b/c/name.file echo ${filename##*/} # name.file echo ${filename%/*} # a/b/c echo ${filename%%/*} # a But in fact, there is better way: filename=a/b/c/name.file echo $(dirname $filename) echo $(basename $filename) Still, substitution is useful for getting filename without suffix or getting suffix. echo ${filename%.*} # echo ${filename##*.} # type Parameter Expansion String selection: string=1234567890abcdefg echo ${string: 10} # abcdefg echo ${string: 10:5} # abcde echo ${string: 7:-4} # 890abc Array selection: arr=(0 1 2) arr=($
Input and Output $ ffmpeg -i input.mp4 output.mp4 Cutting Video or Audio Begin and length: $ ffmpeg -i input.mp4 -ss 00:00:10 -t 01:00:10 output.mp4 Beginning time and ending time: $ ffmpeg -i input.mp4 -ss 01:00 -to 10:00 -c copy output.mp4 "-c" is short for "-codec". Use "-codec:a" or "-codec:v" to specify audio or viedo respectively. "-c copy" suggests that the data will be copy directly to output (rather than being converted in some cases). Video Converting $ ffmpeg -i input1.flv input2.flv -framerate 30 -f mp4 -vf "scale=1280x720" output.mp4 $ ffmpeg -i input1.flv input2.flv -framerate 30 -f mp4 -vf "crop=${w}:${h}:${x}:${y}" output.mp4 "-framerate 30" fixes the framerate to 30 fps. "-f mp4" before


“另外”司机对着后视镜说,“有一件事想請您记住: 事物往往和外表不一样。" 事物往往和外表不一样。青豆在脑中重复了一遍, 微颦眉尖。“什么意思?" 司机字斟句酌地说:“就是说,现在您要去做一件非同一般的事, 不是吗? 大白天从首都高速公路的避难阶梯爬下去, 这样的事普通人一般不会做。女人尤其不会。” “大概吧。”青豆说。 “那么,一旦做了这样的事,往后的日常风景,该怎么说呢,看上去也许会和平常有点不一样,我也有过这样的经验。但是,不要被外表迷惑。现实永远只有一个。" “你知道吗,才华和直觉最大的区别是什么?” “不知道。” “区别就在于,你再怎么才华横溢,也未必能填饱肚皮;但只要你拥有敏锐的直觉,就不必担心混不上饭吃。” 小松作为编辑,直觉中的确有种特别的东西。他永远不会迷
0. install GPU drivers sudo add-apt-repository ppa:graphics-drivers/ppa sudo apt update sudo apt install nvidia-390 1. install cuda tookit and cudnn SDK (and CUPTI) Following the instructions at Installing Tensorflow on Ubuntu. # Adds NVIDIA package repository. sudo apt-key adv --fetch-keys wget wget sudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb sudo dpkg -i nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64.deb sudo apt-get update # Includes optional NCCL 2.x. sudo apt-get install cuda9.0 cuda-cublas-9-0 cuda-cufft-9-0 cuda-curand-9-0 \ cuda-cusolver-9-0 cuda-cusparse-9-0 libcudnn7=
This article is my reflection on my previous work FaceLock, a project to recognize user's face and lock the computer if the user doesn't present in a certain time. CNN is used to recognize different faces. I watch the Coursera course Convolutional Neural Networks by Andrew Ng to understand more about CNN, so it's also a learning note about it. One Layer of a Convolutional Network In a non-convolutional network, we have the following formula: Similarly, in the convolutional network, we can have: @ is a convolution operation. @ is the input matrix. @ is the filter. Different filter can detect different feature, e.g. vertical edge, diagonal edge, etc. @ is the bias. @ is a activation function. @ is the output matrix, and can be fed to the next layer. Calculating the Number The Number of the Parameters Suppose we have 10 filters which are in one layer of a neural
This article is about some squashing functions of deep learning, including Softmax Function, Sigmoid Function, and Hyperbolic Functions. All of these three functions are used to squash value to a certain range. Softmax Function Softmax Function: A generalization of the logistic function that "squashes" a K-dimensional vector z of arbitrary real values to a K-dimensional vector of real values, where each entry is in the range (0, 1], and all the entries add up to 1. In probability theory, the output of the softmax function can be used to represent a categorical distribution - that is, a probability distribution over K different possible outcomes. The softmax function is the gradient of the LogSumExp function. LogSumExp Function LogSumExp Function: The LogSumExp(LSE) function is a smooth approximation to the maximum function. ( stands for the natural logarithm function, i.e. the logarithm to the base e.) When directly encountered, LSE can be well-approximated by : Sigmoid
This article is my learning note of the coursera course Sequence Models by Andrew Yan-Tak Ng. There are two typical RNN units of the hidden layers of the RNN according to Andrew Ng. One is GRN (Gated Recurrent Unit), the other is LSTM (Long Short-Term Memory). Notice: Please refer to Mathematical Basis - Squashing Function for some basic math knowledge about the squashing functions. GRN - Gated Recurrent Unit The GRN is a gating mechanism in recurrent neural networks, introduced in 2014 by Kyunghyun Cho et al. The fully gated version : The formulas : @ : The memory cell. @ : The input sequence. @ : The output sequence. @ : Gate gamma r. It tells us how relevance is to computing the next candidate for . @ : Gate gamma u. The update gate vector. Decide whether or not we actually update , the memory cell. @ : The candidate value for the memory
Transformation Defines a struct ObjectProperty as follow: property type comment rotation vec3 The object's rotation in its object center. scale vec4 / vec3 The object's scale level in its object center. translation vec3 The object's relative position to the origin (absolute position). face vec3 Where the object look at. The code: struct ObjectProperty { vec3 rotation; vec4 scale; vec3 translation; vec3 face; }; Object Transformation Object Transformation Order: -> Rotate -> Scale -> Translate Rotate and scale around the center of the object, so that the order of these transformations will not influence the result. The transformation matrix is: struct ObjectProperty objProp; mat4 m4ModelMatrix = translate(objProp.translation) * scale(objProp.scale) * rotate(objProp.rotation); The transformation operations are: vec4 v4PosInModel = m4ModelMatrix * vertex.position; Where vertex.position is a vec4 object containing the position of the


Time Complexity With partition, the elements compared each time are as follow: n, n/2, n/2, n/4, n/4, n/4, n/4, ..., 1, 1, ..., 1 (n ones) And their sum is: n, n, n, ..., n But how much n? Since n is divided to 1 by 2, there sure is log2(n) number of n. So the time complexity is: n*log2(n) Unique-elements sorting implement From Quicksort - wikipedia: algorithm quicksort(A, lo, hi) is if lo < hi then p := partition(A, lo, hi) quicksort(A, lo, p) quicksort(A, p + 1, hi) algorithm partition(A, lo, hi) is pivot := A[lo] i := lo - 1 j := hi + 1 loop
以下皆为在开发我的 C++ 库 donnylib 的时候遇到的问题。 (一天遇到这么多 weird problems 也是很走运了) donnylib : Weird Template Subclass 如下的程序会导致无法自动推断模板类型。(编译环境: g++ 5.4.0, -std=c++17) template<class T> struct FOO { struct NOT { NOT() { } }; FOO() { sizeof(T); } }; template<class T> typename FOO<T>::NOT foo(typename FOO<T>::NOT a) { return a; } void test1() { FOO<int>::NOT f; foo(f); // template argument deduction/substitution failed. } 加上下面的代码也是不行的。 template<class T> using NOT = typename FOO<T>::NOT; template<class T> NOT<T> foo(NOT<T
Python 实现: AdaBoost - Donny-Hikari - Github Introduction AdaBoost 是 Adaptive Boosting 的简称。 Boosting 是一种 Ensemble Learning 方法。 其他的 Ensemble Learning 方法还有 Bagging, Stacking 等。 Bagging, Boosting, Stacking 的区别如下: Bagging: Equal weight voting. Trains each model with a random drawn subset of training set. Boosting: Trains each new model instance to emphasize the training instances that previous models mis-classified. Has better accuracy comparing to bagging, but also tends to overfit. Stacking: Trains a learning algorithm to combine the predictions of several other learning algorithms. The Formulas Given a N*M matrix X, and a N vector y, where N is the count of samples, and M is the features of samples. AdaBoost trains T weak classifiers with the following steps: 给定一个N*M的矩阵X(特征),和一个N维向量y(标签),N为样本数,M为特征维度。AdaBoost以一下步骤训练T个弱分类器:
Android Library To build a android library project, simply change the following line in build.gradle (Module): apply plugin: '' to: apply plugin: '' To pack the library into a jar file, add these lines to build.gradle (Module): def JarLibName="YourLibName" task clearJar(type: Delete) { delete 'build/generated/' + JarLibName + '.jar' } task makeJar(type: Jar) { from zipTree(file('build/intermediates/bundles/release/classes.jar')) baseName = JarLibName destinationDir = file("build/generated") } makeJar.dependsOn(clearJar, build) And run the following command in the Terminal: ./gradlew makeJar If failed in lint process, add these to build.gradle (Module): android { // ... lintOptions { abortOnError false } // ... } Remember to copy the jar library file



This is a learning note of Logistic Regression of Machine Learning by Andrew Ng on Coursera. Hypothesis Representation Uses the "Sigmoid Function," also called the "Logistic Function": Which turn linear regression into classification. Sigmoid function looks like this: give us the probability that the output is 1. In fact, is simplified as for logistic regression, and is for linear regression. In some complicated case, z might be something like: Decision Boundary Decision boundary is the line (or hyperplane) that separates the area where y = 0 and where y = 1 (or separates different classes). It's created by our hypothesis function. The input to the sigmoid function is not necessary to be linear, and could be a function that describes a circle (e.g. ) or any shape to fit the data. Cost Function Using the cost function for linear regression in classification will cause the output to be wavy, resulting in many local optima.
Concept 上下界网络流 顾名思义,比普通的网络流多出了下界。 考虑将下界的流量限制进行转化。将原来的边(e, v1->v2)拆边,拆出一条上界为 max-min 的边(e1),和一条上界为 min 的边(e2, 虚拟存在)。 e2 需要满足满流方才有解,但满流的同时还需要保证流量守恒(入流==出流)。故而构造虚拟源点 SS 和虚拟汇点 TT . 对于点v, 若: 下限入流等于下限出流,则两者平衡,可以认为这两个流出流入的流量都不存在。 下限入流(flowin)大于下限出流(flowout),则 SS 向 v 连一条容量为 flowin-flowout 的边,弥补不足流量。 下限入流(flowin)小于下限出流(flowout),则 v 向 TT 连一条容量为 flowout-flowin 的边,弥补过剩流量。 当且仅当 SS 的所有边都满流时有解。跑一遍 SS 到 TT 的最大流,然后验证即可。 有源汇上下界可行
Problem 合怪升级,每只怪有他的等级和攻击力,以及是否为 tuner monster. 合怪必须一只 tuner monster 与一只 non-tuner monster 合,并且他们的等级之和必须等于合成的怪。只有特定等级和攻击力的怪可以被合成。某些合成怪必须用特定的某一只或某两只怪才能合成。 Solution 看懂题目后建图跑网络流。主要难点在建图。 将 tuner monster(t1) 连到源点,将 non-tuner monster(t2) 连到汇点, capacity 均为 1, cost 均为0。并且列举将该怪与所有另一种怪的组合,以合成等级为索引保存。注意最大等级上限为12,超过12的组合直接 pass 掉。 然后处理合成怪。在合成怪对应等级中搜索对应满足条件的组合,加入到 t1, t2 矩阵中。这里有一个必须搞的优化,就是只选攻击力最大且大于两只被合成怪的合成怪,对满足的 t1,t2 连 capacity 为 1, cost 为 t1 攻击力 加 t2 攻
Introduction 最精彩的证明是自动证明。今天在知乎上看到的一位大佬对莫比乌斯反演的证明让我体会到了数论的神奇。坚定了我投靠数学的决心。 Mobius Inversion Formula 若存在 则有 反之亦然。 Basic Concept 1. Convolution / 卷积 对两个数论函数, , 卷积运算定义为 卷积运算满足: 交换律: 结合律: 存在单位元. . 不难知道定义应为: 2. 乘法单位元 普通乘法意义上的单位元,将所有自然数映射到1的函数,记为。 3. 莫比乌斯函数 在卷积意义下的逆元,称为莫比乌斯函数。即 展开写作: 定义为: 在知乎上大神引用的定义是: 这两种定义是等价的。前者是英文 Wikipedia 上给出的定义。后者可能比较好理解一些。 Prove 莫比乌斯反演写作卷积形式: 证明: 反之 Reference 参考资料: [1] 知乎 - 如何证明莫
Problem 给定数字串 S,求如下表达式的值: Solution 用前缀和转化该式。令 , 则 . 原式转化为: 二项展开,得: 将内外求和对调,使得可以预处理前缀和的和的p次方。式子化为: 预处理 , 复杂度 O(N*K). 对 , 处理每个答案,复杂度也是 O(N*K). 坑点: 每个case末尾输出空格PE. 最后不输出换行会WA... 于是我从PE改到了WA... Code #include <cstdio> #include <cstring> typedef long long LL; const LL MOD=1e9+7; const int MAXN=5e4+10; char S[MAXN]; LL pre[MAXN]; LL spre[MAXN][110]; LL ppre[MAXN][110]; LL C[110][110]; void init() { C[0][0]=1; for (int i=1;i<=100;++i) { C[0
最近(上个月)想研究一下人脸识别。人脸检测。找了下网上的资料,有不少都是介绍怎么用现成的模块识别的。但是我想了解的是用神经网络进行人脸识别,并且希望能够更多地接触神经网络。于是往基于TensorFlow框架的人脸识别方面查找了下,幸运地找到了 Hironsan 的 BossSensor 项目。学习他的代码,我了解到了 Keras 框架搭建神经网络的步骤。在此基础上进行了一系列改进,获得了更高的精度。 Main Produce 预处理 -> 数据集划分 -> 人脸图像输入 -> 卷积神经网络 -> 分类输出 -> 决策 Preprocess 首先,获取人脸图像数据。我是通过 OpenCV 采集照片中的人脸。照片来源:现场取材,身边人下手,手机照片,网络收集。(直到我做完才想起来有个 ImageNet 这种方便的东西。) 然后是读取图像数据,将人脸图像预处理,将图片缩放和增加 Padding, 使得图片的长宽像素一致