首页 > 其他 > 详细

执行spark-submit时出错

时间:2019-02-22 10:44:13      阅读:475      评论:0      收藏:0      [点我收藏+]

执行任务如下:

# ./spark-submit --class org.apache.spark.examples.SparkPi  /hadoop/spark/examples/jars/spark-examples_2.11-2.4.0.jar 100  

报错如下:

2019-02-22 09:56:26 INFO  StandaloneAppClient$ClientEndpoint:54 - Executor updated: app-20190222015626-0020/1 is now RUNNING
2019-02-22 09:56:26 INFO  BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, kvm-test, 36768, None)
2019-02-22 09:56:26 INFO  BlockManagerMasterEndpoint:54 - Registering block manager kvm-test:36768 with 366.3 MB RAM, BlockManagerId(driver, kvm-test, 36768, None)
2019-02-22 09:56:26 INFO  BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, kvm-test, 36768, None)
2019-02-22 09:56:26 INFO  BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, kvm-test, 36768, None)
2019-02-22 09:56:26 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5aae8eb5{/metrics/json,null,AVAILABLE,@Spark}
2019-02-22 09:56:27 INFO  EventLoggingListener:54 - Logging events to hdfs://hadoop-cluster/spark/eventLog/app-20190222015626-0020.snappy
2019-02-22 09:56:27 INFO  StandaloneSchedulerBackend:54 - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
2019-02-22 09:56:28 INFO  SparkContext:54 - Starting job: reduce at SparkPi.scala:38
2019-02-22 09:56:28 INFO  DAGScheduler:54 - Got job 0 (reduce at SparkPi.scala:38) with 100 output partitions
2019-02-22 09:56:28 INFO  DAGScheduler:54 - Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
2019-02-22 09:56:28 INFO  DAGScheduler:54 - Parents of final stage: List()
2019-02-22 09:56:28 INFO  DAGScheduler:54 - Missing parents: List()
2019-02-22 09:56:28 INFO  DAGScheduler:54 - Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
2019-02-22 09:56:28 INFO  StandaloneAppClient$ClientEndpoint:54 - Executor updated: app-20190222015626-0020/1 is now EXITED (Command exited with code 1)
2019-02-22 09:56:28 INFO  StandaloneSchedulerBackend:54 - Executor app-20190222015626-0020/1 removed: Command exited with code 1
2019-02-22 09:56:28 INFO  StandaloneAppClient$ClientEndpoint:54 - Executor added: app-20190222015626-0020/2 on worker-20190111083714-172.20.1.1-45882 (172.20.1.1:45882) with 1 core(s)
2019-02-22 09:56:28 INFO  StandaloneSchedulerBackend:54 - Granted executor ID app-20190222015626-0020/2 on hostPort 172.20.1.1:45882 with 1 core(s), 512.0 MB RAM
2019-02-22 09:56:28 INFO  StandaloneAppClient$ClientEndpoint:54 - Executor updated: app-20190222015626-0020/2 is now RUNNING
2019-02-22 09:56:28 INFO  BlockManagerMaster:54 - Removal of executor 1 requested
2019-02-22 09:56:28 INFO  CoarseGrainedSchedulerBackend$DriverEndpoint:54 - Asked to remove non-existent executor 1
2019-02-22 09:56:28 INFO  BlockManagerMasterEndpoint:54 - Trying to remove executor 1 from BlockManagerMaster.
2019-02-22 09:56:28 INFO  MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 1936.0 B, free 366.3 MB)
2019-02-22 09:56:28 INFO  MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 1236.0 B, free 366.3 MB)
2019-02-22 09:56:28 INFO  BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on kvm-test:36768 (size: 1236.0 B, free: 366.3 MB)
2019-02-22 09:56:28 INFO  SparkContext:54 - Created broadcast 0 from broadcast at DAGScheduler.scala:1161
2019-02-22 09:56:28 INFO  DAGScheduler:54 - Submitting 100 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))
2019-02-22 09:56:28 INFO  TaskSchedulerImpl:54 - Adding task set 0.0 with 100 tasks
2019-02-22 09:56:29 INFO  StandaloneAppClient$ClientEndpoint:54 - Executor updated: app-20190222015626-0020/2 is now EXITED (Command exited with code 1)
2019-02-22 09:56:29 INFO  StandaloneSchedulerBackend:54 - Executor app-20190222015626-0020/2 removed: Command exited with code 1
2019-02-22 09:56:29 INFO  StandaloneAppClient$ClientEndpoint:54 - Executor added: app-20190222015626-0020/3 on worker-20190111083714-172.20.1.1-45882 (172.20.1.1:45882) with 1 core(s)
2019-02-22 09:56:29 INFO  BlockManagerMaster:54 - Removal of executor 2 requested
2019-02-22 09:56:29 INFO  CoarseGrainedSchedulerBackend$DriverEndpoint:54 - Asked to remove non-existent executor 2
2019-02-22 09:56:29 INFO  StandaloneSchedulerBackend:54 - Granted executor ID app-20190222015626-0020/3 on hostPort 172.20.1.1:45882 with 1 core(s), 512.0 MB RAM
2019-02-22 09:56:29 INFO  BlockManagerMasterEndpoint:54 - Trying to remove executor 2 from BlockManagerMaster.
2019-02-22 09:56:29 INFO  StandaloneAppClient$ClientEndpoint:54 - Executor updated: app-20190222015626-0020/3 is now RUNNING
2019-02-22 09:56:31 INFO  StandaloneAppClient$ClientEndpoint:54 - Executor updated: app-20190222015626-0020/3 is now EXITED (Command exited with code 1)
2019-02-22 09:56:31 INFO  StandaloneSchedulerBackend:54 - Executor app-20190222015626-0020/3 removed: Command exited with code 1
2019-02-22 09:56:31 INFO  BlockManagerMasterEndpoint:54 - Trying to remove executor 3 from BlockManagerMaster.
2019-02-22 09:56:31 INFO  BlockManagerMaster:54 - Removal of executor 3 requested
2019-02-22 09:56:31 INFO  CoarseGrainedSchedulerBackend$DriverEndpoint:54 - Asked to remove non-existent executor 3
2019-02-22 09:56:31 INFO  StandaloneAppClient$ClientEndpoint:54 - Executor added: app-20190222015626-0020/4 on worker-20190111083714-172.20.1.1-45882 (172.20.1.1:45882) with 1 core(s)
2019-02-22 09:56:31 INFO  StandaloneSchedulerBackend:54 - Granted executor ID app-20190222015626-0020/4 on hostPort 172.20.1.1:45882 with 1 core(s), 512.0 MB RAM
2019-02-22 09:56:31 INFO  StandaloneAppClient$ClientEndpoint:54 - Executor updated: app-20190222015626-0020/4 is now RUNNING
2019-02-22 09:56:33 INFO  StandaloneAppClient$ClientEndpoint:54 - Executor updated: app-20190222015626-0020/4 is now EXITED (Command exited with code 1)
2019-02-22 09:56:33 INFO  StandaloneSchedulerBackend:54 - Executor app-20190222015626-0020/4 removed: Command exited with code 1
2019-02-22 09:56:33 INFO  BlockManagerMasterEndpoint:54 - Trying to remove executor 4 from BlockManagerMaster.
2019-02-22 09:56:33 INFO  BlockManagerMaster:54 - Removal of executor 4 requested  

从报错看出来,,任务一直在请求,但是executor莫名退出了,日志后面还有一个警告,如下:

2019-02-22 09:42:58 WARN  TaskSchedulerImpl:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

分析:从这个信息可以看出来,,task没有获取到资源

解决:

第一种情况:资源不足(可能是CPU,也可能是内存),这种情况可以调整内存(driver或者executor)或者CPU大小

例如,按如下调整,很多情况都是executor内存设置的过大,超出了实际的内存大小

# ./spark-submit --class org.apache.spark.examples.SparkPi  --executor-memory 512M --total-executor-cores 2 --driver-memory 512M /hadoop/spark/examples/jars/spark-examples_2.11-2.4.0.jar 100   

第二种情况:也是我遇到的。我有一个spark集群+一个spark客户端,我在spark集群里面执行任务可以正常执行,但是放到spark客户端执行的时候就报错了。机器内存,cpu都足够大。导致错误的原因竟然是主机名和ip对应出错了,

由于spark集群是以前搭建的,今天做了一个spark,忘记在spark集群里面添加spark客户端的主机和ip映射了。添加上好了。

 

总结:

出现这类问题一般有几个可能的原因,逐一检查排除即可:
 
1.因为提交任务的节点不能和worker节点交互,因为提交完任务后提交任务节点上会起一个进程,展示任务进度,大多端口为4044,工作节点需要反馈进度给该该端口,所以如果主机名或者IP在hosts中配置不正确。所以检查下主机名和ip是否配置正确。

2.也有可能是内存不足造成的。内存设置可以根据情况调整下。另外,也检查下web UI看看,确保worker节点处于alive状态。 

 

执行spark-submit时出错

原文:https://www.cnblogs.com/yjt1993/p/10416781.html

(0)
(0)
   
举报
评论 一句话评论(0
关于我们 - 联系我们 - 留言反馈 - 联系我们:wmxa8@hotmail.com
© 2014 bubuko.com 版权所有
打开技术之扣,分享程序人生!