$ wget http://mirrors.shu.edu.cn/apache/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz
$ tar xvf spark-2.4.0-bin-hadoop2.7.tgz
$ cd spark-2.4.0-bin-hadoop2.7
$ export SPARK_HOME=/path/to/spark-2.4.0-bin-hadoop2.7
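Taking spark-sql as an example:
On YARN, only the HADOOP_CONF_DIR environment variable needs to be set; it should point to the directory containing the Hadoop client configuration files.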
$ bin/spark-sql --master yarn
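Additional options (these are general spark-submit options; the spark-sql shell itself runs only in client mode, so --deploy-mode cluster applies to jobs launched with spark-submit):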
--deploy-mode cluster
--driver-memory 4g
--driver-cores 1
--executor-memory 2g
--executor-cores 1
--num-executors 1
--queue thequeue
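As a rough sketch, the YARN options above can be combined into a single spark-sql invocation. The directory /etc/hadoop/conf is only an assumed default for HADOOP_CONF_DIR, and YARN_OPTS is a hypothetical variable name used for illustration; --deploy-mode is left out because the spark-sql shell runs in client mode.

```shell
# Assumption: the Hadoop client configs live in /etc/hadoop/conf;
# replace this with the directory used in your own cluster.
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"
export HADOOP_CONF_DIR

# The YARN options listed above, with the values used in this post.
YARN_OPTS="--master yarn --driver-memory 4g --driver-cores 1 --executor-memory 2g --executor-cores 1 --num-executors 1 --queue thequeue"

# Print the full command for review; remove the leading 'echo'
# (and run from $SPARK_HOME) to actually launch spark-sql.
echo bin/spark-sql $YARN_OPTS
```

Printing the command first makes it easy to check the resource settings and the queue name before anything is submitted to the cluster.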
On Mesos, point --master at the Mesos masters registered in ZooKeeper:
$ bin/spark-sql --master mesos://zk://172.19.28.186:2181,172.19.28.188:2181,172.19.28.190:2181/mesos
Additional options:
--deploy-mode cluster
--supervise
--executor-memory 20G
--executor-cores 1
--total-executor-cores 100
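Note that there is no --num-executors option on Mesos. The executor count is set indirectly: number of executors = --total-executor-cores / --executor-cores
The corresponding configuration properties are: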
Executor memory: spark.executor.memory
Executor cores: spark.executor.cores
Number of executors: spark.cores.max/spark.executor.cores
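The indirect executor count can be checked with a small calculation. This is a minimal sketch using the values from the Mesos options above; the variable names are illustrative only.

```shell
# Values taken from the Mesos example options above.
total_executor_cores=100   # --total-executor-cores (spark.cores.max)
executor_cores=1           # --executor-cores (spark.executor.cores)

# Integer division gives the number of executors Spark can start.
num_executors=$(( total_executor_cores / executor_cores ))

echo "executors: $num_executors"   # prints "executors: 100"
```

With --executor-memory 20G, each of those executors would request 20G, so the total memory demand grows with the executor count.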
Note: Spark on YARN may fail at startup with the following error:
19/02/25 17:54:20 ERROR cluster.YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED!
The NodeManager log shows the cause:
2019-02-25 17:54:19,481 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=48342,containerID=container_1551078668160_0012_02_000001] is running beyond virtual memory limits. Current usage: 380.9 MB of 1 GB physical memory used; 2.5 GB of 2.1 GB virtual memory used. Killing container.
To fix this, adjust yarn-site.xml on the NodeManagers with one of the following settings, then restart them:
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
or
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>4</value>
</property>
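The numbers in the log can be reproduced with a short calculation. The sketch below assumes the container had 1 GB of physical memory and that the limit in the log (2.1 GB) comes from the default yarn.nodemanager.vmem-pmem-ratio of 2.1; the function name vmem_limit_mb is hypothetical.

```shell
# Figures from the NodeManager log above, converted to MB.
pmem_mb=1024        # 1 GB physical memory allotted to the container
vmem_used_mb=2560   # 2.5 GB virtual memory actually used

# Virtual memory limit = physical memory * vmem-pmem-ratio.
# The ratio is passed in tenths (21 means 2.1) to stay in integer math.
vmem_limit_mb() { echo $(( pmem_mb * $1 / 10 )); }

default_limit=$(vmem_limit_mb 21)   # ratio 2.1 (assumed default)
raised_limit=$(vmem_limit_mb 40)    # ratio 4, as in the config above

echo "default limit: ${default_limit} MB, raised limit: ${raised_limit} MB"
if [ "$vmem_used_mb" -gt "$default_limit" ]; then
  echo "2.5 GB exceeds the default limit, so the container is killed"
fi
if [ "$vmem_used_mb" -le "$raised_limit" ]; then
  echo "2.5 GB fits within the raised limit"
fi
```

Raising the ratio keeps the virtual memory check in place, whereas setting yarn.nodemanager.vmem-check-enabled to false turns the check off entirely.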
[Original] Big Data Basics: Spark (9), Spark deployment modes yarn/mesos
Original post: https://www.cnblogs.com/barneywill/p/10432581.html