OS: CentOS 6.2
Cluster size: 1 master, 16 workers
Spark version: 0.8.0
Kernel version: 2.6.32
Below are the problems I ran into and how I resolved them:
Cause: unknown.
Solution: restart and reconnect.
Cause: after the cluster was shut down, the TaskTracker process on the worker nodes kept running.
Solution: after shutting down the cluster, manually find the TaskTracker process on each worker node and kill it, as in the sketch below.
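A minimal way to do this on each worker (a sketch; it assumes the JDK's jps is on the PATH and the process shows up under the name TaskTracker):

# list JVM processes and pick out the leftover TaskTracker
jps | grep TaskTracker
# kill it by PID (fall back to kill -9 only if it ignores SIGTERM)
jps | awk '/TaskTracker/ {print $1}' | xargs -r kill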
Solution: rebuild the assembly jar with sbt/sbt clean assembly.
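For reference, the rebuild is run from the top of the Spark tree, and the resulting jar should match the one on the executor classpath in the stderr log further down (the exact file name depends on your Scala/Hadoop versions):

cd $SPARK_HOME
sbt/sbt clean assembly
ls assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.0-incubating-hadoop1.0.4.jar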
To verify, run the SparkPi example against the master:
cd $SPARK_HOME
./run-example org.apache.spark.examples.SparkPi spark://hw024:7077
Standard output:
……
14/03/20 11:13:02 INFO scheduler.DAGScheduler: Stage 0 (reduce at SparkPi.scala:39) finished in 1.642 s
14/03/20 11:13:02 INFO cluster.ClusterScheduler: Remove TaskSet 0.0 from pool
14/03/20 11:13:02 INFO spark.SparkContext: Job finished: reduce at SparkPi.scala:39, took 1.708775428 s
Pi is roughly 3.13434
However, /home/zhangqianlong/spark-0.8.0-incubating-bin-hadoop1/work/app-20140320111300-0008/8/stderr on the worker node contained the following:
Spark Executor Command: "java" "-cp" ":/home/zhangqianlong/spark-0.8.0-incubating-bin-hadoop1/conf:/home/zhangqianlong/spark-0.8.0-incubating-bin-hadoop1/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.0-incubating-hadoop1.0.4.jar" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.StandaloneExecutorBackend" "akka://spark@hw024:60929/user/StandaloneScheduler" "8" "hw018" "24"
====================================
14/03/20 11:05:15 INFO slf4j.Slf4jEventHandler: Slf4jEventHandler started
14/03/20 11:05:15 INFO executor.StandaloneExecutorBackend: Connecting to driver: akka://spark@hw024:60929/user/StandaloneScheduler
14/03/20 11:05:15 INFO executor.StandaloneExecutorBackend: Successfully registered with driver
14/03/20 11:05:15 INFO slf4j.Slf4jEventHandler: Slf4jEventHandler started
14/03/20 11:05:15 INFO spark.SparkEnv: Connecting to BlockManagerMaster: akka://spark@hw024:60929/user/BlockManagerMaster
14/03/20 11:05:15 INFO storage.MemoryStore: MemoryStore started with capacity 323.9 MB.
14/03/20 11:05:15 INFO storage.DiskStore: Created local directory at /tmp/spark-local-20140320110515-9151
14/03/20 11:05:15 INFO network.ConnectionManager: Bound socket to port 59511 with id = ConnectionManagerId(hw018,59511)
14/03/20 11:05:15 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/03/20 11:05:15 INFO storage.BlockManagerMaster: Registered BlockManager
14/03/20 11:05:15 INFO spark.SparkEnv: Connecting to MapOutputTracker: akka://spark@hw024:60929/user/MapOutputTracker
14/03/20 11:05:15 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-81a80beb-fd56-4573-9afe-ca9310d3ea8d
14/03/20 11:05:15 INFO server.Server: jetty-7.x.y-SNAPSHOT
14/03/20 11:05:15 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:56230
14/03/20 11:05:16 ERROR executor.StandaloneExecutorBackend: Driver terminated or disconnected! Shutting down.
This problem bothered me for a whole week. F**K!
After repeated discussions with other engineers, the conclusion is that this error can be ignored: as long as the job finishes normally and hadoop fs -cat /XX/part-XXX shows the expected output, all is well. My guess is that it comes down to a timeout setting; see the sketch below.
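If you want to experiment with that guess, Spark 0.8 reads most of its settings from Java system properties, which can be passed through SPARK_JAVA_OPTS in conf/spark-env.sh. The property name spark.akka.timeout appears in the 0.8-era configuration docs, but the value here is an untested assumption, not a confirmed fix:

# conf/spark-env.sh (a sketch; 100 seconds is a guessed value)
export SPARK_JAVA_OPTS="-Dspark.akka.timeout=100"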
Cause: too many temporary files are opened during iterative computation.
Solution: raise the open-file limit in /etc/security/limits.conf on every node (note: do not delete the file and then copy a replacement in over an ssh session; that can leave the system unable to log in). Edit it in place instead, as in the sketch below.
The change takes effect once Spark is restarted.
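A minimal sketch of the limits.conf entries (the 65535 value and the * wildcard user are my assumptions; adjust to your workload):

# append to /etc/security/limits.conf with a text editor, e.g. vi
*    soft    nofile    65535
*    hard    nofile    65535
# verify from a fresh login session
ulimit -n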
Original post: http://blog.csdn.net/qianlong4526888/article/details/22899355