Compared with distributed file systems and NoSQL databases, installing and configuring a Spark cluster is fairly straightforward:
- Install the JDK. This hardly needs an introduction, since so much software depends on it. (Note that the AuthParam token in the Oracle download URL below is time-limited, so you may need to fetch a fresh link if it has expired.)
wget -O jdk-7u71-linux-x64.tar.gz "http://download.oracle.com/otn-pub/java/jdk/7u71-b14/jdk-7u71-linux-x64.tar.gz?AuthParam=1416666050_dca8969bfc01e3d8d42d04040f76ff1"
tar -zxvf jdk-7u71-linux-x64.tar.gz
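The profile below points JAVA_HOME at /usr/local/java, while the tarball extracts to a versioned directory (typically jdk1.7.0_71 for this release). Assuming you unpacked under /usr/local, a symlink keeps the configured path stable, mirroring what the scala and spark steps do:
ln -s jdk1.7.0_71 java    # adjust to whatever directory tar actually produced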
- Install Scala. Guides online suggest the 2.9 series (note, though, that Spark 1.1.0 itself is built against Scala 2.10, so applications compiled against it should use a 2.10.x Scala):
wget http://www.scala-lang.org/files/archive/scala-2.9.1.final.tgz
tar -zxvf scala-2.9.1.final.tgz
ln -s scala-2.9.1.final scala
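Once PATH is extended in the next step, a quick way to confirm the interpreter is reachable:
scala -version    # expected to report version 2.9.1.final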
- Set the environment variables: vi /etc/profile
export JAVA_HOME=/usr/local/java
export SCALA_HOME=/usr/local/scala
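For java and scala to be callable directly from the shell, it helps to also extend PATH in the same file and then reload the profile; a minimal addition, assuming the symlinked locations above:
export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin
source /etc/profile    # run in the terminal to apply the changes to the current shell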
- Install Spark:
wget http://mirror.bit.edu.cn/apache/spark/spark-1.1.0/spark-1.1.0-bin-hadoop2.3.tgz
tar -zxvf spark-1.1.0-bin-hadoop2.3.tgz
ln -s spark-1.1.0-bin-hadoop2.3 spark
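The steps so far give a single machine that can run Spark in local mode. To bring up an actual standalone cluster, the tarball also ships start scripts; a rough sketch for a one-machine pseudo-cluster (conf file and script names per the standalone-mode layout of the 1.x distributions):
cd /usr/local/spark
echo localhost > conf/slaves    # one worker hostname per line
sbin/start-all.sh               # starts the master (web UI on port 8080) plus the workers listed in conf/slaves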
- Run a test program:
cd /usr/local/spark/bin
./spark-shell
Then, at the scala> prompt, enter:
scala> val data = Array(1, 2, 3, 4, 5)
data: Array[Int] = Array(1, 2, 3, 4, 5)
scala> val distData = sc.parallelize(data)
distData: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:14
scala> distData.reduce(_+_)
res0: Int = 15
- While the shell is running, you can also watch the job in the Spark web UI on port 4040 (http://localhost:4040).
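Beyond the interactive shell, the installation can also be sanity-checked by submitting one of the bundled examples non-interactively; a sketch, using a glob for the examples jar since its exact file name depends on the build:
cd /usr/local/spark
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local[2] lib/spark-examples-*.jar 10    # estimates pi using 10 partitions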
Source: "Spark 1.1.0 cluster installation and configuration", http://blog.csdn.net/bluejoe2000/article/details/41391407