Hello, welcome to... Wait, stop clicking my avatar!



Transformer
Attention is all you need - Arxiv
Feed Forward: Two fully connected layers (Linear) with a ReLU activation in between.
Multi-head Attention:
Attention
Attention - Qiita
Self-Attention
GPT (Generative Pre-Training)
GPT - paper
BERT (Bidirectional Encoder Representations from Transformers)
BERT - Arxiv
BERT Explained
Unlike the Transformer-based GPT, which is trained using only the left context, BERT uses a bidirectional encoder that makes use of both the left and the right context. [MASK] is used to mask some of the words so that the model cannot see the word itself indirectly. Pre-training of BERT uses two strategies: MLM (Masked Language Model) and NSP (Next Sentence Prediction). The model is trained with both strategies together. As shown below, the input embeddings of BERT consist of the token embeddings, the segment embeddings, and the position embeddings. Note that a segment may consist of multiple sentences. In MLM task,
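The feed-forward block from the Transformer notes above (two linear layers with a ReLU in between) can be sketched in plain Python. This is only an illustrative sketch: the helper names `linear`, `relu`, and `feed_forward`, and the tiny hand-picked weights, are assumptions and not taken from the paper or from any library.

```python
def linear(x, weight, bias):
    """Fully connected layer: one output per weight row, y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def relu(x):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in x]


def feed_forward(x, w1, b1, w2, b2):
    """Linear -> ReLU -> Linear, applied to a single position vector x."""
    return linear(relu(linear(x, w1, b1)), w2, b2)


# Hypothetical weights: input size 2, hidden size 3, output size 2.
w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b1 = [0.0, 0.0, 0.5]
w2 = [[1.0, 1.0, 1.0], [2.0, 0.0, 0.0]]
b2 = [0.0, 1.0]

print(feed_forward([1.0, -2.0], w1, b1, w2, b2))
```

In a real Transformer the same block is applied to every position independently, with a hidden size larger than the model size.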
Tic-Tac-Toe Online Server
Based on the Tic-Tac-Toe game from CS188, Berkeley, I developed an online version of Tic-Tac-Toe. Now your agent can play against my agent online! I think it is a good way to check whether our agents are optimal or not. My agent can beat random agents most of the time, even when it is the second player.
Online Server
Website: Tic-Tac-Toe Online
Download the attached client file from the moodle form and place it in the same directory of your ( depends on , so is also needed). Run this command:
$ python3 -u demo -n 3
And enjoy!
Notice: You need to specify a username with "-u USERNAME". Don't use "demo" as your username, because it is forbidden.
Usage: [options]
Options
Variable Description
-u USERNAME Username, must not be empty nor
The Question
Fish eating fruit on jisuanke
Given an undirected acyclic graph G and all possible paths P in the graph, calculate:
The first taste
In the contest, a handsome foreign teammate convinced me that this problem could be solved using LCA. I tried, and it did work, with the help of Dijkstra. My solution was to first run Dijkstra to get the distance between the root node and every other node, and then calculate the LCA for every pair of nodes. The desired result is: It worked, but we got TLE for looping through all the nodes, which is .
The second trial
After the contest, I was told that this is a DP problem. You count the number of times each edge is accessed, multiply it by the edge weight, and sum them up according to the modulus of 3 to get the result. This one, however, also got TLE. Oh, FISH!
The final solution
The reason why the second solution still can
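The edge-counting idea from the second trial can be sketched as follows. This is a simplified illustration, not the contest solution: it only sums the lengths of all unordered paths in a tree and ignores the modulus-3 grouping, and the function name and the 0-based node numbering are assumptions. An edge that separates a subtree of `size` nodes from the other `n - size` nodes lies on `size * (n - size)` paths.

```python
def sum_of_all_path_lengths(n, edges):
    """Sum of path lengths over all unordered node pairs of a tree.

    n     -- number of nodes, labelled 0..n-1 (assumed labelling)
    edges -- list of (u, v, weight) tuples describing the tree
    """
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    # Iterative DFS from node 0, recording the parent of each node.
    parent = [-1] * n
    parent_weight = [0] * n
    seen = [False] * n
    seen[0] = True
    order = []
    stack = [0]
    while stack:
        u = stack.pop()
        order.append(u)
        for v, w in adj[u]:
            if not seen[v]:
                seen[v] = True
                parent[v] = u
                parent_weight[v] = w
                stack.append(v)

    # Visit children before parents to accumulate subtree sizes.
    size = [1] * n
    total = 0
    for u in reversed(order):
        if parent[u] != -1:
            size[parent[u]] += size[u]
            # The edge (parent[u], u) is used by size * (n - size) paths.
            total += parent_weight[u] * size[u] * (n - size[u])
    return total


print(sum_of_all_path_lengths(3, [(0, 1, 1), (1, 2, 2)]))
```

Each edge is handled once, so the whole computation is linear in the number of nodes, compared with enumerating every pair.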
Overview
Regions with CNN features:
- Efficient Graph Based Image Segmentation
  - Uses a disjoint set to speed up the merge operation
- Selective Search
  - HOG (Histogram of Oriented Gradient)
  - Multiple criteria (color, texture, size, shape) to merge regions
- AlexNet/VGG16
- R-CNN
Notice that many descriptions are reproduced directly from the original sources.
Some Fundamental Concepts
Batch Size
- Stochastic Gradient Descent: Batch Size = 1
- Batch Gradient Descent: Batch Size = Size of Training Set
- Mini-Batch Gradient Descent: 1 < Batch Size < Size of Training Set
Regularization
A regression model that uses the L1 regularization technique is called Lasso Regression, and a model that uses L2 is called Ridge Regression.
Ridge Regularization
Ridge regression adds the "squared magnitude" of the coefficients as a penalty term to the loss function.
$$ \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} x_{ij} \beta_j \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 $$
The first sum is an example of a loss function.
Lasso Regularization
Lasso Regression (Least Absolute Shrinkage and Selection Operator) adds the "absolute value of magnitude" of the coefficients as a penalty term to the loss function.
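To make the two penalty terms concrete, here is a minimal sketch that evaluates a squared-error loss with a Ridge (L2) and a Lasso (L1) penalty. The function names, the symbol `lam` for the regularization strength, and the toy data are assumptions chosen for illustration.

```python
def squared_error(X, y, beta):
    """Sum of squared residuals of a linear model without intercept."""
    return sum(
        (yi - sum(b * xij for b, xij in zip(beta, xi))) ** 2
        for xi, yi in zip(X, y)
    )


def ridge_loss(X, y, beta, lam):
    """Squared error plus lam times the sum of squared coefficients (L2)."""
    return squared_error(X, y, beta) + lam * sum(b ** 2 for b in beta)


def lasso_loss(X, y, beta, lam):
    """Squared error plus lam times the sum of absolute coefficients (L1)."""
    return squared_error(X, y, beta) + lam * sum(abs(b) for b in beta)


# Toy data: two samples, two features.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 2.0]
beta = [1.0, -2.0]

print(ridge_loss(X, y, beta, lam=0.5))
print(lasso_loss(X, y, beta, lam=0.5))
```

With `lam = 0` both functions reduce to the plain squared error, which is a quick way to check the penalty terms in isolation.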
Useful Materials
- Distinctive Image Features from Scale-Invariant Keypoints[1] by David G. Lowe.
- SIFT (Scale-Invariant Feature Transform)[2] on Towards Data Science.
- The SIFT (Scale Invariant Feature Transform) Detector and Descriptor[3].
Notes
Uses DoG (Difference of Gaussian) to approximate the scale-normalized LoG (Laplacian of Gaussian)[4]:
$$ D(x, y, \sigma) = \big( G(x, y, k\sigma) - G(x, y, \sigma) \big) * I(x, y) $$
where $G$ is the two-dimensional Gaussian function, and $I$ is the input image.
[need more consideration] After each octave, the Gaussian image is down-sampled by a factor of 2, by resampling the Gaussian image that has twice the initial value of $\sigma$, taking every second pixel in each row and column. And we start on the new octave with . Since the image size is reduced to 1/4, the sigma for the next octave becomes , which is equal to . To understand it, first consider this question: If the image size is reduced to 1/4, but the kernel size of
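The two operations in these notes, taking a Difference of Gaussian and down-sampling by keeping every second pixel, can be sketched in plain Python. This is a rough illustration under stated assumptions: the helper names, the 3-sigma kernel radius, the clamped borders, and the example values of `sigma` and `k` are all choices made here, not details from Lowe's paper.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel with an assumed radius of 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [sum(k * row[min(width - 1, max(0, x + i - radius))]
             for i, k in enumerate(kernel))
         for x in range(width)]
        for row in image
    ]


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, sigma):
    """Separable 2D Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))


def difference_of_gaussian(image, sigma, k):
    """Blur at k * sigma minus blur at sigma, pixel by pixel."""
    low = gaussian_blur(image, sigma)
    high = gaussian_blur(image, k * sigma)
    return [[h - l for h, l in zip(hr, lr)] for hr, lr in zip(high, low)]


def downsample(image):
    """Keep every second pixel in each row and column."""
    return [row[::2] for row in image[::2]]


image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
dog = difference_of_gaussian(image, sigma=1.6, k=2 ** 0.5)
print(downsample(image))
```

On a constant image the two blurred copies are identical, so the DoG response is zero everywhere, which is a convenient sanity check.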
Types of Noise
Additive noise
Additive noise is independent of the image signal. The noisy image g can be considered as the sum of the ideal image f and the noise n.[1]
Multiplicative noise
Multiplicative noise is often dependent on the image signal. The relation of image and noise is[1]:
Gaussian noise
Gaussian noise, named after Carl Friedrich Gauss, is statistical noise having a probability density function (PDF) equal to that of the normal distribution, also known as the Gaussian distribution; that is, the values that the noise can take on are Gaussian-distributed. The PDF of a Gaussian random variable is given by[2]:
$$ p(z) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(z - \mu)^2}{2\sigma^2}} $$
where $z$ is the gray level, $\mu$ is the mean, and $\sigma$ is the standard deviation.
Salt-and-pepper noise
Fat-tail distributed or "impulsive" noise is sometimes called salt-and-pepper noise or spike noise. An image containing salt-and-pepper noise will have dark pixels in bright regions and bright pixels in dark regions.[2] The PDF of (bipolar) impulse noise is given by:
$$ p(z) = \begin{cases} P_a & \text{for } z = a \\ P_b & \text{for } z = b \\ 0 & \text{otherwise} \end{cases} $$
if b > a, gray-level
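A minimal sketch of how additive Gaussian noise and salt-and-pepper noise might be applied to a small grayscale image stored as nested lists. The function names, the 8-bit range with clipping to [0, 255], and the default impulse levels a = 0 and b = 255 are assumptions made for this example.

```python
import random


def add_gaussian_noise(image, sigma, rng):
    """Additive noise g = f + n with n ~ N(0, sigma^2), clipped to 8 bits."""
    return [
        [min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
        for row in image
    ]


def add_salt_and_pepper(image, p_a, p_b, rng, a=0, b=255):
    """Replace a pixel by level a with probability p_a, by b with p_b."""
    noisy = []
    for row in image:
        new_row = []
        for p in row:
            u = rng.random()
            if u < p_a:
                new_row.append(a)      # "pepper" (dark) impulse
            elif u < p_a + p_b:
                new_row.append(b)      # "salt" (bright) impulse
            else:
                new_row.append(p)      # pixel left untouched
        noisy.append(new_row)
    return noisy


rng = random.Random(0)                 # fixed seed for repeatable runs
image = [[100, 120, 140], [160, 180, 200]]
print(add_gaussian_noise(image, sigma=10, rng=rng))
print(add_salt_and_pepper(image, p_a=0.1, p_b=0.1, rng=rng))
```

Passing the random generator explicitly keeps the functions deterministic under a fixed seed, which helps when comparing denoising filters.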

Greek Alphabet

Letter (Capital Case)  Letter (Lower Case)  Name     English Equivalent
Α                      α                    alpha    a
Β                      β                    beta     b
Γ                      γ                    gamma    g
Δ                      δ                    delta    d
Ε                      ε                    epsilon  e
Ζ                      ζ                    zeta     z
Η                      η                    eta      h
Θ                      θ                    theta    th
Ι                      ι                    iota     i
Κ                      κ                    kappa    k
Λ                      λ                    lambda   l
Μ                      μ                    mu       m
Ν                      ν                    nu       n
Ξ                      ξ                    xi       x
Ο                      ο                    omicron  o
Π                      π                    pi       p
Ρ                      ρ                    rho      r
Σ                      σ                    sigma    s
Τ                      τ                    tau      t
Υ                      υ                    upsilon  u
Φ                      φ                    phi      ph
Χ                      χ                    chi      ch
Ψ                      ψ                    psi      ps
Ω                      ω                    omega    o

References
[1]. Greek Alphabet - Wikipedia
[2]. Greek alphabet letters & symbols
Information Content
$$ I(X) = -\log_b P(X) $$
where,
I : the information content of X. An X with a greater I value contains more information.
P : the probability mass function.
b : the base of the logarithm used. Common values of b are 2 (bits), Euler's number e (nats), and 10 (bans).
Entropy (Information Theory)[1]
$$ H(X) = E[I(X)] = E[-\log_b P(X)] $$
where,
H : the entropy. Named after Boltzmann's H-theorem (but the definition was proposed by Shannon). H indicates the uncertainty of X.
P : the probability mass function.
I : the information content of X.
E : the expected value operator.
The entropy can explicitly be written as:
$$ H(X) = -\sum_{i} P(x_i) \log_b P(x_i) $$
ID3[2]
Use ID3 to build a decision tree:
1. Calculate the entropy of the samples under the current node.
2. Find a feature F that can maximize the information gain. The information gain is calculated by: where E is the entropy of
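The entropy and information-gain steps can be sketched as below. This is an illustrative sketch, not the post's own code: the function names and the representation of samples as dictionaries are assumptions, and the gain is computed with the commonly used form (entropy of the node minus the size-weighted entropy of each subset after splitting on the feature).

```python
import math
from collections import Counter


def entropy(labels, base=2):
    """H = -sum(p * log_b(p)) over the empirical label distribution."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log(c / total, base)
                for c in counts.values())


def information_gain(rows, labels, feature):
    """Entropy before the split minus the weighted entropy after it."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_feature(rows, labels, features):
    """Pick the feature with the largest information gain."""
    return max(features, key=lambda f: information_gain(rows, labels, f))


# Hypothetical toy samples.
rows = [
    {"weather": "sunny", "windy": "no"},
    {"weather": "sunny", "windy": "yes"},
    {"weather": "rain", "windy": "no"},
    {"weather": "rain", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

print(best_feature(rows, labels, ["weather", "windy"]))
```

ID3 would then split the node on the chosen feature and repeat the same two steps on each child until the labels under a node are pure or no features remain.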
AutoTag
AutoTag is a program that generates tags for documents automatically. The main process includes:
1. Word segmentation (N-gram + dictionary lookup).
2. Generate a bag-of-words for each document.
3. Calculate term frequency and inverse document frequency.
4. Pick the top x words with the greatest tf-idf values as tags.
N-gram
N-gram generates a sequence of n words at every position of a sentence.[1]
    sentences = 'Lucy like to listen to music. Luna like music too.'
    items = ngram(sentences, 2)
    print(items)
    # output:
    # [
    #     'Lucy like', 'like to', 'to listen', 'listen to', 'to music',
    #     'Luna like', 'like music', 'music too',
    # ]
Bag of words
The bag-of-words model is a simplifying representation in NLP and IR.[1] N-gram Count the times that each word appears
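The excerpt calls `ngram` without showing its body, so the following is a hypothetical implementation that reproduces the sample output above, together with a small tf-idf ranking for the tagging step. The sentence splitting on `.`, `!`, and `?`, the helper name `top_tags`, and the plain tf-idf formula (term frequency times the logarithm of the inverse document frequency) are assumptions, not AutoTag's actual code.

```python
import math
import re
from collections import Counter


def ngram(text, n):
    """Word n-grams of each sentence; n-grams never cross a sentence end."""
    items = []
    for sentence in re.split(r"[.!?]+", text):
        words = sentence.split()
        for i in range(len(words) - n + 1):
            items.append(" ".join(words[i:i + n]))
    return items


def top_tags(docs, x):
    """Return the x terms with the highest tf-idf for each document.

    docs -- list of non-empty token lists (one bag of words per document)
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))   # count each term once per document

    tags = []
    for tokens in docs:
        term_freq = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        }
        # Sort by score (descending), then alphabetically for stable ties.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        tags.append(ranked[:x])
    return tags


sentences = 'Lucy like to listen to music. Luna like music too.'
print(ngram(sentences, 2))
print(top_tags([["music", "music", "lucy"], ["music", "luna"]], 1))
```

A term that appears in every document gets an idf of zero, so common words such as "music" in this toy corpus are never picked as tags.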


w3m: WWW wo Miru
(c) Copyright Akinori ITO
w3m is a pager with WWW capability. It IS a pager, but it can be used as a text-mode WWW browser.

Keyboard Shortcuts
Shortcut  Action                    Level
H         Help                      Browser
q         Quit w3m                  Browser
C-h       History                   Browser
T         New tab                   Tabs
C-j       Open link on current tab  Tabs
C-t       Open link on new tab      Tabs
C-q       Close tab                 Tabs
U         Go to url                 Page
R         Reload                    Page
B         Back                      Page

Configuration
File         Location
Keymap file  ~/.w3m/keymap