The training objective of the Skip-gram model is to find word representations that are useful for predicting the surrounding words in a sentence or a document.
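Given a sequence of training words w_1, w_2, w_3, ..., w_T, the objective of the Skip-gram model is to maximize the average log probability (1/T) Σ_{t=1..T} Σ_{-c ≤ j ≤ c, j ≠ 0} log p(w_{t+j} | w_t), where c is the size of the training context.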
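A minimal sketch of this objective, assuming p(context | center) is a full softmax over the vocabulary. The function name `average_log_probability`, the matrices `v_in` and `v_out`, and the toy sizes are illustrative choices and not part of the original notes; context positions that fall outside the sequence are simply skipped.

```python
import numpy as np

def average_log_probability(word_ids, v_in, v_out, c):
    """Average over positions t of the summed log p(w_{t+j} | w_t), -c <= j <= c, j != 0."""
    T = len(word_ids)
    total = 0.0
    for t, center in enumerate(word_ids):
        # One score per vocabulary word: output vectors dotted with the center's input vector.
        scores = v_out @ v_in[center]
        scores = scores - scores.max()  # shift for numerical stability
        log_probs = scores - np.log(np.exp(scores).sum())  # log-softmax
        for j in range(-c, c + 1):
            if j == 0 or not (0 <= t + j < T):
                continue  # skip the center word and out-of-range positions
            total += log_probs[word_ids[t + j]]
    return total / T

# Toy usage with random vectors (assumed sizes, for illustration only).
rng = np.random.default_rng(0)
vocab_size, dim = 5, 3
v_in = rng.normal(size=(vocab_size, dim))
v_out = rng.normal(size=(vocab_size, dim))
objective = average_log_probability([0, 1, 2, 3, 1], v_in, v_out, c=2)
```

Each position requires a score for every vocabulary word, so the cost grows with the vocabulary size. The approximations listed next address that cost.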
Hierarchical Softmax
Negative Sampling
Noise Contrastive Estimation
NCE posits that a good model should be able to differentiate data from noise by means of logistic regression.
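A hedged sketch of the per-pair Negative Sampling objective, the simplified variant of NCE: log σ(v_context · v_center) plus the sum over k noise words of log σ(-v_noise · v_center). The names `log_sigmoid` and `negative_sampling_objective` are hypothetical, and the noise vectors here are random placeholders; drawing noise words from a noise distribution over the vocabulary is not shown.

```python
import numpy as np

def log_sigmoid(x):
    # log(1 / (1 + exp(-x))), computed stably via logaddexp.
    return -np.logaddexp(0.0, -x)

def negative_sampling_objective(v_center, v_context, v_noise):
    """Objective for one (center, context) pair with k noise words (rows of v_noise)."""
    positive = log_sigmoid(v_context @ v_center)          # observed pair scored as "data"
    negative = log_sigmoid(-(v_noise @ v_center)).sum()   # noise words scored as "noise"
    return positive + negative

# Toy usage with random vectors (assumed sizes, for illustration only).
rng = np.random.default_rng(0)
dim, k = 4, 5
v_center = rng.normal(size=dim)
v_context = rng.normal(size=dim)
v_noise = rng.normal(size=(k, dim))
value = negative_sampling_objective(v_center, v_context, v_noise)
```

Maximizing this quantity is a binary logistic regression that separates the observed context word from the k noise words, and it touches only k + 1 output vectors per pair.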
Distributed Representations of Words and Phrases and their Compositionality