word2vec: basic installation and usage

Date: 2017-02-18 14:17:10

GitHub download address for the official word2vec code: https://github.com/svn2github/word2vec

Environment: Linux (Ubuntu 14.04 LTS), with git installed and gcc version 4.8.4.

Installation on Linux:

% git clone https://github.com/svn2github/word2vec.git

% cd word2vec

% make

Command-line options:

-train <file>
  Use text data from <file> to train the model
-output <file>
  Use <file> to save the resulting word vectors / word clusters
-size <int>
  Set size of word vectors; default is 100
-window <int>
  Set max skip length between words; default is 5
-sample <float>
  Set threshold for occurrence of words. Those that appear with higher frequency in the training data
  will be randomly down-sampled; default is 1e-3, useful range is (0, 1e-5)
-hs <int>
  Use Hierarchical Softmax; default is 0 (not used)
-negative <int>
  Number of negative examples; default is 5, common values are 3 - 10 (0 = not used)
-threads <int>
  Use <int> threads (default 12)
-iter <int>
  Run more training iterations (default 5)
-min-count <int>
  This will discard words that appear less than <int> times; default is 5
-alpha <float>
  Set the starting learning rate; default is 0.025 for skip-gram and 0.05 for CBOW
-classes <int>
  Output word classes rather than word vectors; default number of classes is 0 (vectors are written)
-debug <int>
  Set the debug mode (default = 2 = more info during training)
-binary <int>
  Save the resulting vectors in binary mode; default is 0 (off)
-save-vocab <file>
  The vocabulary will be saved to <file>
-read-vocab <file>
  The vocabulary will be read from <file>, not constructed from the training data
-cbow <int>
  Use the continuous bag of words model; default is 1 (use 0 for skip-gram model)
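The option names above have to be typed exactly. As far as the word2vec.c source goes, each option is looked up by exact name, so a misspelled one appears to be ignored silently rather than reported. The following is a minimal shell sketch of a pre-flight check under that assumption; the function name check_w2v_flags is hypothetical and not part of word2vec, and the list of names is simply copied from the help text above.

```shell
# check_w2v_flags ARGS...
# Hypothetical helper (not shipped with word2vec): returns non-zero and
# prints a message for every argument that looks like an option
# (a dash followed by a letter) but is not in the list from the help text.
check_w2v_flags() {
    known="-train -output -size -window -sample -hs -negative -threads -iter -min-count -alpha -classes -debug -binary -save-vocab -read-vocab -cbow"
    rc=0
    for arg in "$@"; do
        case $arg in
            -[a-z]*)
                # Compare against the space-delimited list of known names.
                case " $known " in
                    *" $arg "*) ;;
                    *) echo "unknown option: $arg" >&2; rc=1 ;;
                esac
                ;;
        esac
    done
    return $rc
}

# Hypothetical usage: run the check first, train only if it passes.
#   check_w2v_flags -train corpus.txt -output vec.bin -size 100 && \
#       ./word2vec -train corpus.txt -output vec.bin -size 100
```

Values such as 10 or 1e-4 are skipped because they do not start with a dash followed by a letter; a file name that does would be flagged by mistake, which this sketch does not try to handle.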

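After that, all that remains is to prepare the training corpus: concatenate the word-segmented files into a single line and train on it.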
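The corpus preparation described here could be sketched in shell roughly as follows. The function name build_corpus and the directory layout (one already word-segmented .txt file per document, tokens separated by whitespace) are assumptions made for illustration, not details from the original setup.

```shell
# build_corpus SRC_DIR OUT_FILE
# Hypothetical helper: concatenate every *.txt file in SRC_DIR (assumed
# to be word-segmented already) into OUT_FILE as one single line.
build_corpus() {
    src_dir=$1
    out_file=$2
    # cat joins the files in glob order; tr maps every whitespace
    # character (newlines, tabs, spaces) to a space and squeezes runs,
    # so tokens end up separated by exactly one space.
    cat "$src_dir"/*.txt | tr -s '[:space:]' ' ' > "$out_file"
}

# Hypothetical usage (directory name is a placeholder):
#   build_corpus ./segmented fudan_corpus_final
```

The resulting file contains no newline at all, so it is the kind of single-line input that is passed to -train.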

./word2vec -train fudan_corpus_final -output fudan_100_skip.bin -cbow 0 -size 100 -window 10 -negative 5 -hs 0 -binary 1 -sample 1e-4 -threads 20 -iter 15
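Once training finishes, a quick sanity check on the output file can be useful. The sketch below relies on the output starting with a plain-text header line of the form "<vocabulary size> <vector size>", which is how word2vec.c writes it to my knowledge, including when -binary 1 is set. The function name w2v_header is hypothetical.

```shell
# w2v_header FILE
# Hypothetical helper: print the vocabulary size and vector size taken
# from the first line of a word2vec output file.
w2v_header() {
    # Only the first line is text; everything after it may be binary,
    # so read no further than that line.
    head -n 1 "$1" | awk '{ print "words=" $1 " size=" $2 }'
}

# Hypothetical usage: the size field should match the -size value (100).
#   w2v_header fudan_100_skip.bin
# The distance program built by the same make can then be pointed at the
# file to query nearest words interactively:
#   ./distance fudan_100_skip.bin
```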

Original article: http://www.cnblogs.com/ooon/p/6413065.html
