Installing Mahout on a Hadoop cluster.
Environment: OS: CentOS 6.5 x64; software: Hadoop 1.2.1 and Mahout 0.9
1. Introduction
Mahout project home page: https://mahout.apache.org/
Download the binary package and upload it to the server.
2. Installation
Install as the user that runs the cluster, and unpack the binary package.
[huser@master hadoop]$ tar -xvf mahout-distribution-0.9.tar.gz
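The directory created by unpacking is the one MAHOUT_HOME has to point at in step 3, so it can be worth confirming its name from the archive itself. The following is a minimal sketch; the helper name archive_top_dir is hypothetical and not part of Mahout, and it assumes a gzip-compressed tar archive.

```shell
# archive_top_dir ARCHIVE: print the distinct top-level entries of a .tar.gz archive.
# (Hypothetical helper for illustration; not shipped with Mahout.)
archive_top_dir() {
    # List the members, keep only the first path component, and de-duplicate.
    tar -tzf "$1" | cut -d/ -f1 | sort -u
}

# Only run against the Mahout archive if it is present in the current directory.
if [ -f mahout-distribution-0.9.tar.gz ]; then
    archive_top_dir mahout-distribution-0.9.tar.gz
fi
```

If the helper prints a single name, that name appended to the extraction directory is the value to use for MAHOUT_HOME.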
3. Configure environment variables
export JAVA_HOME=/usr/java/jdk1.7.0_51
export CLASSPATH=.:$JAVA_HOME/jre/lib:$JAVA_HOME/lib/tools.jar
export JRE_HOME=$JAVA_HOME/jre
export HADOOP_HOME=/home/huser/hadoop/hadoop-1.2.1
export HADOOP_CONF_DIR=/home/huser/hadoop/hadoop-1.2.1/conf
export HADOOP_CLASSPATH=/home/huser/hadoop/hadoop-1.2.1/bin
export MAHOUT_HOME=/home/huser/hadoop/mahout-distribution-0.9
export MAHOUT_HOME_DIR=/home/huser/hadoop/mahout-distribution-0.9/conf
export PATH=$PATH:$JAVA_HOME/bin:$MAHOUT_HOME/bin:$MAHOUT_HOME/conf
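A typo in any of these paths usually only shows up later as a confusing startup error, so a quick check that each variable names an existing directory can save time. This is a sketch only: the function name check_dir_var is hypothetical, and the variable names are the ones exported above.

```shell
# check_dir_var NAME: succeed only if the variable NAME is set and names a directory.
# (Hypothetical helper for illustration; written for a POSIX shell.)
check_dir_var() {
    # Read the value of the variable whose name was passed in.
    eval "_cdv_value=\${$1:-}"
    if [ -z "$_cdv_value" ]; then
        echo "$1 is not set"
        return 1
    fi
    if [ ! -d "$_cdv_value" ]; then
        echo "$1=$_cdv_value is not a directory"
        return 1
    fi
    echo "$1=$_cdv_value ok"
}

# Check the directory-valued variables from this section and remember any failure.
env_status=0
for v in JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR MAHOUT_HOME; do
    check_dir_var "$v" || env_status=1
done
```

After sourcing the file that holds the export lines, every line printed should end in "ok"; env_status is left at 1 if any check failed.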
4. Test
[huser@master ~]$ mahout
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /home/huser/hadoop/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=/home/huser/hadoop/hadoop-1.2.1/conf
MAHOUT-JOB: /home/huser/hadoop/mahout-distribution-0.9/mahout-examples-0.9-job.jar
Warning: $HADOOP_HOME is deprecated.
An example program must be given as the first argument.
Valid program names are:
  arff.vector: : Generate Vectors from an ARFF file or directory
  baumwelch: : Baum-Welch algorithm for unsupervised HMM training
  canopy: : Canopy clustering
  cat: : Print a file or resource as the logistic regression models would see it
  cleansvd: : Cleanup and verification of SVD output
  clusterdump: : Dump cluster output to text
  clusterpp: : Groups Clustering Output In Clusters
  cmdump: : Dump confusion matrix in HTML or text formats
  concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
  cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
  cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
  evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
  fkmeans: : Fuzzy K-means clustering
  hmmpredict: : Generate random sequence of observations by given HMM
  itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
  kmeans: : K-means clustering
  lucene.vector: : Generate Vectors from a Lucene index
  lucene2seq: : Generate Text SequenceFiles from a Lucene index
  matrixdump: : Dump matrix in CSV format
  matrixmult: : Take the product of two matrices
  parallelALS: : ALS-WR factorization of a rating matrix
  qualcluster: : Runs clustering experiments and summarizes results in a CSV
  recommendfactorized: : Compute recommendations using the factorization of a rating matrix
  recommenditembased: : Compute recommendations using item-based collaborative filtering
  regexconverter: : Convert text files on a per line basis based on regular expressions
  resplit: : Splits a set of SequenceFiles into a number of equal splits
  rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
  rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
  runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
  runlogistic: : Run a logistic regression model against CSV data
  seq2encoded: : Encoded Sparse Vector generation from Text sequence files
  seq2sparse: : Sparse Vector generation from Text sequence files
  seqdirectory: : Generate sequence files (of Text) from a directory
  seqdumper: : Generic Sequence File dumper
  seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
  seqwiki: : Wikipedia xml dump to sequence file
  spectralkmeans: : Spectral k-means clustering
  split: : Split Input data into test and train sets
  splitDataset: : split a rating dataset into training and probe parts
  ssvd: : Stochastic SVD
  streamingkmeans: : Streaming k-means clustering
  svd: : Lanczos Singular Value Decomposition
  testnb: : Test the Vector-based Bayes classifier
  trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
  trainlogistic: : Train a logistic regression using stochastic gradient descent
  trainnb: : Train the Vector-based Bayes classifier
  transpose: : Take the transpose of a matrix
  validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
  vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
  vectordump: : Dump vectors from a sequence file to text
  viterbi: : Viterbi decoding of hidden states from given output states sequence
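Seeing the list of valid program names is the sign that the installation works. If the check needs to be scripted, one option is to look for a specific program name in that listing. The sketch below assumes each entry has the "name: : description" shape shown in the output above, possibly with leading whitespace; the function name has_program is hypothetical.

```shell
# has_program NAME: read a driver listing on stdin and succeed if NAME is one of
# the listed programs. (Hypothetical helper for illustration; not part of Mahout.)
has_program() {
    awk -v name="$1" '
        { sub(/^[ \t]+/, "") }                        # drop leading indentation
        index($0, name ": : ") == 1 { found = 1 }     # literal match at line start
        END { exit(found ? 0 : 1) }
    '
}

# Only query the real command when it is on the PATH.
if command -v mahout >/dev/null 2>&1; then
    if mahout 2>&1 | has_program kmeans; then
        echo "kmeans is available"
    fi
fi
```

The match is a literal string comparison rather than a regular expression, so names containing a dot, such as arff.vector, are not treated as patterns.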
Original article: http://www.cnblogs.com/guarder/p/3704981.html