TF-IDF
2019

AutoTag

AutoTag is a program that generates tags for documents automatically. The main process is:

1. Word segmentation (N-gram + dictionary lookup).
2. Generate a bag of words for each document.
3. Calculate the term frequency (tf) and inverse document frequency (idf) of each word.
4. Pick the top x words with the highest tf-idf values as tags.

N-gram

N-gram generation produces the sequence of n consecutive words starting at every position of a sentence.[1]

    sentences = 'Lucy like to listen to music. Luna like music too.'
    items = ngram(sentences, 2)
    print(items)
    # output:
    # ['Lucy like', 'like to', 'to listen', 'listen to', 'to music',
    #  'Luna like', 'like music', 'music too']

Bag of words

The bag-of-words model is a simplifying representation used in natural language processing (NLP) and information retrieval (IR).[1] Steps:

1. N-gram
2. Count the number of times each word appears.
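The N-gram slide calls an `ngram` function without defining it. The following is a minimal Python sketch that reproduces the slide's output. It assumes that sentences end with `.`, `!` or `?`, that words are separated by whitespace, and that an n-gram never crosses a sentence boundary. The slide's output supports the last point, since it contains no `'music Luna'`. The tokenization rules are assumptions and are not taken from AutoTag itself.

```python
import re


def ngram(text: str, n: int) -> list[str]:
    """Return every run of n consecutive words, never crossing a sentence boundary."""
    grams = []
    # Assumption: a sentence ends at '.', '!' or '?'.
    for sentence in re.split(r"[.!?]", text):
        # Assumption: words are separated by whitespace.
        words = sentence.split()
        # Slide a window of n words across the sentence.
        for i in range(len(words) - n + 1):
            grams.append(" ".join(words[i:i + n]))
    return grams


sentences = 'Lucy like to listen to music. Luna like music too.'
items = ngram(sentences, 2)
print(items)
# ['Lucy like', 'like to', 'to listen', 'listen to', 'to music', 'Luna like', 'like music', 'music too']
```

A sentence with fewer than n words yields no n-grams, because the range is empty in that case.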
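The bag-of-words steps (N-gram, then count) can be sketched with Python's `collections.Counter`. This is an illustrative sketch, not AutoTag's code. The name `bag_of_words` and the parameters `max_n` and `vocabulary` are hypothetical. The optional `vocabulary` set stands in for the "lookup in dictionary" part of word segmentation, because the slides do not say which dictionary AutoTag uses. The sentence and word splitting rules are the same assumptions as in the N-gram sketch.

```python
import re
from collections import Counter
from typing import Optional, Set


def bag_of_words(text: str, max_n: int = 1, vocabulary: Optional[Set[str]] = None) -> Counter:
    """Count every 1..max_n-gram in text; if vocabulary is given, keep only grams found in it."""
    counts = Counter()
    # Assumption: sentences end at '.', '!' or '?', and n-grams do not cross them.
    for sentence in re.split(r"[.!?]", text):
        words = sentence.split()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                gram = " ".join(words[i:i + n])
                # Hypothetical dictionary lookup: drop grams that are not known terms.
                if vocabulary is None or gram in vocabulary:
                    counts[gram] += 1
    return counts


bow = bag_of_words('Lucy like to listen to music. Luna like music too.')
print(bow["to"], bow["music"], bow["like"])  # 2 2 2
```

When `max_n=2` and the vocabulary is `{"listen to", "music"}`, only those two terms are counted. This is how a dictionary lookup lets multi-word terms survive segmentation.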
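Steps 3 and 4 of the AutoTag process (compute tf-idf, then pick the top x words) might look like the following sketch. The slides do not give the formulas, so this uses one common variant as an assumption: tf(t, d) = count of t in d divided by the number of terms in d, and idf(t) = log(N / df(t)). Here N is the number of documents and df(t) is the number of documents that contain t. The function name `tfidf_tags` is hypothetical, and ties are broken alphabetically so the output is deterministic.

```python
import math
from collections import Counter


def tfidf_tags(docs: list[list[str]], top_x: int) -> list[list[str]]:
    """Return the top_x terms by tf-idf for each tokenized document."""
    n_docs = len(docs)
    bags = [Counter(doc) for doc in docs]  # bag of words per document
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for bag in bags:
        df.update(bag.keys())
    tags = []
    for bag in bags:
        total = sum(bag.values())
        scores = {
            # Assumed variant: tf = count / doc length, idf = log(N / df).
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in bag.items()
        }
        # Highest score first; ties broken alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        tags.append(ranked[:top_x])
    return tags


docs = [
    ["lucy", "like", "listen", "music"],
    ["luna", "like", "music", "too"],
    ["tom", "like", "football"],
]
print(tfidf_tags(docs, 2))  # [['listen', 'lucy'], ['luna', 'too'], ['football', 'tom']]
```

A word that appears in every document, such as "like" here, gets idf = log(1) = 0, so it is never preferred as a tag. This is the reason for weighting term frequency by idf instead of using raw counts.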