Setting Up the Production Environment
Configuring a Hadoop production environment. Host planning: here we use 5 hosts to build the Hadoop cluster.
Keep an odd number of JournalNode and ZooKeeper instances, and no fewer than 3 of each; keep this rule in mind. This was already explained in the ZooKeeper course, so we will not repeat it here. Software planning
User planning: you need to create the hadoop group and user on each node yourself; this was covered in the single-node setup, so we will not spend time on it here.
Directory planning
Clock synchronization (note: every time the cluster is restarted the clocks are usually out of sync, so synchronize them each time)
Check the current system time:
date
Tue Nov 3 06:06:04 CST 2015
If the system time differs from the actual time, do the following:
[root@djt11 ~]# cd /usr/share/zoneinfo/
[root@djt11 zoneinfo]# ls //find Asia
[root@djt11 zoneinfo]# cd Asia/ //enter the Asia directory
[root@djt11 Asia]# ls //find Shanghai
[root@djt11 Asia]# cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime //set the current time zone to Shanghai
We can then synchronize the system date and time against NTP (Network Time Protocol):
[root@djt11 Asia]# yum install ntp //install ntp online if the ntpdate command is missing
[root@djt11 Asia]# ntpdate pool.ntp.org //synchronize the date and time
[root@djt11 Asia]# date //check the current system time
Note: in bridged networking mode the method above does not work. As a workaround, use Xshell's "send to all sessions" feature to run the change on all nodes at once.
hosts file check
Every node's hosts file must map the static IPs to the hostnames:
[root@djt11 Asia]# vi /etc/hosts
192.168.3.11 djt11
192.168.3.12 djt12
192.168.3.13 djt13
192.168.3.14 djt14
192.168.3.15 djt15
Disable the firewall
The firewall must be disabled on all nodes.
Check the firewall status:
[root@djt11 Asia]# service iptables status
iptables: Firewall is not running.
If it is not already stopped as shown above, disable it:
[root@djt11 Asia]# chkconfig iptables off //disable the firewall permanently
[root@djt11 Asia]# service iptables stop //stop the firewall for the current session
Configure passwordless SSH
We use djt11 as the example:
[root@djt11 ~]# su hadoop //switch to the hadoop user
[hadoop@djt11 root]$ cd //go to the hadoop user's home directory
[hadoop@djt11 ~]$ mkdir .ssh
[hadoop@djt11 ~]$ ssh-keygen -t rsa //press Enter through the prompts to generate a key pair
[hadoop@djt11 ~]$ cd .ssh
[hadoop@djt11 .ssh]$ ls
id_rsa id_rsa.pub
[hadoop@djt11 .ssh]$ cat id_rsa.pub >> authorized_keys //append the public key to the authorized_keys file
[hadoop@djt11 .ssh]$ ls
authorized_keys id_rsa id_rsa.pub
[hadoop@djt11 .ssh]$ cd ..
[hadoop@djt11 ~]$ chmod 700 .ssh
[hadoop@djt11 ~]$ chmod 600 .ssh/*
[hadoop@djt11 ~]$ ssh djt11 //the first connection asks you to type yes
[hadoop@djt11 ~]$ ssh djt11 //from the second time on you can log in directly
Every node in the cluster must perform the steps above.
Then append every node's public key id_rsa.pub to the authorized_keys file on djt11:
[hadoop@djt12 ~]$ cat ~/.ssh/id_rsa.pub | ssh hadoop@djt11 'cat >> ~/.ssh/authorized_keys' //run this command on every node
Then distribute djt11's authorized_keys file to all the other nodes:
[hadoop@djt11 .ssh]$ scp -r authorized_keys hadoop@djt12:~/.ssh/
[hadoop@djt11 .ssh]$ scp -r authorized_keys hadoop@djt13:~/.ssh/
[hadoop@djt11 .ssh]$ scp -r authorized_keys hadoop@djt14:~/.ssh/
[hadoop@djt11 .ssh]$ scp -r authorized_keys hadoop@djt15:~/.ssh/
SSH between the nodes; if every node can reach every other node without a password, the SSH setup succeeded.
Using the helper scripts
Create the /home/hadoop/tools directory on djt11:
[hadoop@djt11 ~]$ mkdir /home/hadoop/tools
[hadoop@djt11 ~]$ cd /home/hadoop/tools
Upload the local script files into /home/hadoop/tools. Write these scripts yourself if you can read them; if not, just use them as-is and catch up on the Linux background later.
[hadoop@djt11 tools]$ rz deploy.conf
[hadoop@djt11 tools]$ rz deploy.sh
[hadoop@djt11 tools]$ rz runRemoteCmd.sh
[hadoop@djt11 tools]$ ls
deploy.conf deploy.sh runRemoteCmd.sh
Look at the deploy.conf configuration file:
[hadoop@djt11 tools]$ cat deploy.conf
djt11,all,namenode,zookeeper,resourcemanager,
djt12,all,slave,namenode,zookeeper,resourcemanager,
djt13,all,slave,datanode,zookeeper,
djt14,all,slave,datanode,zookeeper,
djt15,all,slave,datanode,zookeeper,
Look at the deploy.sh remote copy script:
[hadoop@djt11 tools]$ cat deploy.sh
#!/bin/bash
#set -x
if [ $# -lt 3 ]
then
  echo "Usage: ./deploy.sh srcFile(or Dir) descFile(or Dir) MachineTag"
  echo "Usage: ./deploy.sh srcFile(or Dir) descFile(or Dir) MachineTag confFile"
  exit
fi

src=$1
dest=$2
tag=$3
if [ 'a'$4'a' == 'aa' ]
then
  confFile=/home/hadoop/tools/deploy.conf
else
  confFile=$4
fi

if [ -f $confFile ]
then
  if [ -f $src ]
  then
    for server in `cat $confFile|grep -v '^#'|grep ','$tag','|awk -F',' '{print $1}'`
    do
      scp $src $server":"${dest}
    done
  elif [ -d $src ]
  then
    for server in `cat $confFile|grep -v '^#'|grep ','$tag','|awk -F',' '{print $1}'`
    do
      scp -r $src $server":"${dest}
    done
  else
    echo "Error: No source file exist"
  fi
else
  echo "Error: Please assign config file or run deploy.sh command with deploy.conf in same directory"
fi

Look at the runRemoteCmd.sh remote command script:
[hadoop@djt11 tools]$ cat runRemoteCmd.sh
#!/bin/bash
#set -x
if [ $# -lt 2 ]
then
  echo "Usage: ./runRemoteCmd.sh Command MachineTag"
  echo "Usage: ./runRemoteCmd.sh Command MachineTag confFile"
  exit
fi

cmd=$1
tag=$2
if [ 'a'$3'a' == 'aa' ]
then
  confFile=/home/hadoop/tools/deploy.conf
else
  confFile=$3
fi

if [ -f $confFile ]
then
  for server in `cat $confFile|grep -v '^#'|grep ','$tag','|awk -F',' '{print $1}'`
  do
    echo "*******************$server***************************"
    ssh $server "source /etc/profile; $cmd"
  done
else
  echo "Error: Please assign config file or run deploy.sh command with deploy.conf in same directory"
fi

These three files make it easier to set up the Hadoop distributed cluster; how they are used will become clear in the steps that follow.
To run the scripts directly, first make them executable:
[hadoop@djt11 tools]$ chmod u+x deploy.sh
[hadoop@djt11 tools]$ chmod u+x runRemoteCmd.sh
We also need to add the /home/hadoop/tools directory to the PATH:
[hadoop@djt11 tools]$ su root
Password:
[root@djt11 tools]# vi /etc/profile
PATH=/home/hadoop/tools:$PATH
export PATH
Then run source /etc/profile to apply the change.
On djt11, use the runRemoteCmd.sh script to create the software installation directory /home/hadoop/app on every node in one step:
[hadoop@djt11 tools]$ runRemoteCmd.sh "mkdir /home/hadoop/app" all
You can verify on every node that the /home/hadoop/app directory has been created.
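As a quick sanity check (optional, and simply reusing the runRemoteCmd.sh script above), you can list the new directory on every node:
[hadoop@djt11 tools]$ runRemoteCmd.sh "ls -ld /home/hadoop/app" all //every node should print the directory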
Installing the JDK
Upload the locally downloaded JDK 1.7 to the /home/hadoop/app directory on djt11:
[root@djt11 tools]# su hadoop
[hadoop@djt11 tools]$ cd /home/hadoop/app/
[hadoop@djt11 app]$ rz //select the locally downloaded jdk-7u79-linux-x64.tar.gz
[hadoop@djt11 app]$ ls
jdk-7u79-linux-x64.tar.gz
[hadoop@djt11 app]$ tar zxvf jdk-7u79-linux-x64.tar.gz //extract
[hadoop@djt11 app]$ ls
jdk1.7.0_79 jdk-7u79-linux-x64.tar.gz
[hadoop@djt11 app]$ rm -rf jdk-7u79-linux-x64.tar.gz //delete the archive
Add the JDK environment variables:
[hadoop@djt11 app]$ su root
Password:
[root@djt11 app]# vi /etc/profile
JAVA_HOME=/home/hadoop/app/jdk1.7.0_79
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME CLASSPATH PATH
[root@djt11 app]# source /etc/profile //apply the changes
Check that the JDK is installed correctly:
[root@djt11 app]# java -version
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)
If you see output like the above, the JDK is installed on djt11.
Then copy the JDK installation directory from djt11 to the other nodes:
[hadoop@djt11 app]$ deploy.sh jdk1.7.0_79 /home/hadoop/app/ slave
Repeat the JDK configuration from djt11 on djt12, djt13, djt14 and djt15.
Installing ZooKeeper
Upload the locally downloaded zookeeper-3.4.6.tar.gz archive to the /home/hadoop/app directory on djt11:
[hadoop@djt11 app]$ rz //select the locally downloaded zookeeper-3.4.6.tar.gz
[hadoop@djt11 app]$ ls
jdk1.7.0_79 zookeeper-3.4.6.tar.gz
[hadoop@djt11 app]$ tar zxvf zookeeper-3.4.6.tar.gz //extract
[hadoop@djt11 app]$ ls
jdk1.7.0_79 zookeeper-3.4.6.tar.gz zookeeper-3.4.6
[hadoop@djt11 app]$ rm zookeeper-3.4.6.tar.gz //delete the zookeeper-3.4.6.tar.gz archive
[hadoop@djt11 app]$ mv zookeeper-3.4.6 zookeeper //rename
Edit the ZooKeeper configuration file:
[hadoop@djt11 app]$ cd /home/hadoop/app/zookeeper/conf/
[hadoop@djt11 conf]$ ls
configuration.xsl log4j.properties zoo_sample.cfg
[hadoop@djt11 conf]$ cp zoo_sample.cfg zoo.cfg //copy zoo_sample.cfg to zoo.cfg
[hadoop@djt11 conf]$ vi zoo.cfg
dataDir=/home/hadoop/data/zookeeper/zkdata //data directory
dataLogDir=/home/hadoop/data/zookeeper/zkdatalog //log directory
# the port at which the clients will connect
clientPort=2181 //default port
#server.<server id>=<hostname>:<port for peer communication>:<port for leader election>
server.1=djt11:2888:3888
server.2=djt12:2888:3888
server.3=djt13:2888:3888
server.4=djt14:2888:3888:observer
server.5=djt15:2888:3888:observer
Use the deploy.sh remote copy script to copy the ZooKeeper installation directory to the other nodes:
[hadoop@djt11 app]$ deploy.sh zookeeper /home/hadoop/app slave
Use the runRemoteCmd.sh remote command script to create the directories on all nodes:
[hadoop@djt11 app]$ runRemoteCmd.sh "mkdir -p /home/hadoop/data/zookeeper/zkdata" all //create the data directory
[hadoop@djt11 app]$ runRemoteCmd.sh "mkdir -p /home/hadoop/data/zookeeper/zkdatalog" all //create the log directory
Then, on djt11, djt12, djt13, djt14 and djt15 respectively, go into the zkdata directory and create a file named myid containing 1, 2, 3, 4 and 5 respectively. Using djt11 as the example:
[hadoop@djt11 app]$ cd /home/hadoop/data/zookeeper/zkdata
[hadoop@djt11 zkdata]$ vi myid
1 //enter the number 1
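If you would rather not edit myid by hand on every node, a minimal sketch like the following writes all five files from djt11 (it relies on the passwordless SSH set up earlier, and the id assigned to each host must match the server.N entries in zoo.cfg):
id=1
for host in djt11 djt12 djt13 djt14 djt15; do
  ssh hadoop@$host "echo $id > /home/hadoop/data/zookeeper/zkdata/myid"  # write this host's id
  id=$((id+1))
done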
Configure the ZooKeeper environment variables:
[hadoop@djt11 zkdata]$ su root
Password:
[root@djt11 zkdata]# vi /etc/profile
JAVA_HOME=/home/hadoop/app/jdk1.7.0_79
ZOOKEEPER_HOME=/home/hadoop/app/zookeeper
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
PATH=$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:$PATH
export JAVA_HOME CLASSPATH PATH ZOOKEEPER_HOME
[root@djt11 zkdata]# source /etc/profile //apply the changes
Start ZooKeeper on djt11:
[hadoop@djt11 zkdata]$ cd /home/hadoop/app/zookeeper/
[hadoop@djt11 zookeeper]$ bin/zkServer.sh start
[hadoop@djt11 zookeeper]$ jps
3633 QuorumPeerMain
[hadoop@djt11 zookeeper]$ bin/zkServer.sh stop //stop ZooKeeper
Use the runRemoteCmd.sh script to start ZooKeeper on all nodes:
runRemoteCmd.sh "/home/hadoop/app/zookeeper/bin/zkServer.sh start" zookeeper
Check that the QuorumPeerMain process is running on every node:
runRemoteCmd.sh "jps" zookeeper
Check the status of every ZooKeeper node:
runRemoteCmd.sh "/home/hadoop/app/zookeeper/bin/zkServer.sh status" zookeeper
If one node reports leader and the remaining nodes report follower (or observer, for the nodes configured as observers), ZooKeeper is installed correctly.
Installing Hadoop
Upload the downloaded Apache hadoop-2.6.0.tar.gz archive to the /home/hadoop/app directory on djt11:
[hadoop@djt11 app]$ rz //upload the local hadoop-2.6.0.tar.gz archive into the current directory
[hadoop@djt11 app]$ ls
hadoop-2.6.0.tar.gz jdk1.7.0_79 zookeeper
[hadoop@djt11 app]$ tar zxvf hadoop-2.6.0.tar.gz //extract
[hadoop@djt11 app]$ ls
hadoop-2.6.0 hadoop-2.6.0.tar.gz jdk1.7.0_79 zookeeper
[hadoop@djt11 app]$ rm hadoop-2.6.0.tar.gz //delete the archive
[hadoop@djt11 app]$ mv hadoop-2.6.0 hadoop //rename
Switch to the /home/hadoop/app/hadoop/etc/hadoop/ directory and edit the configuration files:
[hadoop@djt11 app]$ cd /home/hadoop/app/hadoop/etc/hadoop/
Configuring HDFS
Configure hadoop-env.sh:
[hadoop@djt11 hadoop]$ vi hadoop-env.sh
export JAVA_HOME=/home/hadoop/app/jdk1.7.0_79
Configure core-site.xml:
[hadoop@djt11 hadoop]$ vi core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://cluster1</value>
  </property>
  <!-- The default HDFS path; the nameservice is named cluster1 -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/data/tmp</value>
  </property>
  <!-- Hadoop temporary directory; separate multiple directories with commas; the data directory must be created by you -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>djt11:2181,djt12:2181,djt13:2181,djt14:2181,djt15:2181</value>
  </property>
  <!-- Let ZooKeeper manage HDFS HA -->
</configuration>
Configure hdfs-site.xml:
[hadoop@djt11 hadoop]$ vi hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- Block replication factor of 3 -->
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
  <!-- Permission checking disabled -->
  <property>
    <name>dfs.nameservices</name>
    <value>cluster1</value>
  </property>
  <!-- The nameservice; its value must match fs.defaultFS. With NameNode HA there are two NameNodes, and cluster1 is the single entry point exposed to clients -->
  <property>
    <name>dfs.ha.namenodes.cluster1</name>
    <value>djt11,djt12</value>
  </property>
  <!-- The NameNodes belonging to nameservice cluster1; these are logical names that just need to be distinct -->
  <property>
    <name>dfs.namenode.rpc-address.cluster1.djt11</name>
    <value>djt11:9000</value>
  </property>
  <!-- djt11 RPC address -->
  <property>
    <name>dfs.namenode.http-address.cluster1.djt11</name>
    <value>djt11:50070</value>
  </property>
  <!-- djt11 HTTP address -->
  <property>
    <name>dfs.namenode.rpc-address.cluster1.djt12</name>
    <value>djt12:9000</value>
  </property>
  <!-- djt12 RPC address -->
  <property>
    <name>dfs.namenode.http-address.cluster1.djt12</name>
    <value>djt12:50070</value>
  </property>
  <!-- djt12 HTTP address -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <!-- Enable automatic failover -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://djt11:8485;djt12:8485;djt13:8485;djt14:8485;djt15:8485/cluster1</value>
  </property>
  <!-- The JournalNode quorum -->
  <property>
    <name>dfs.client.failover.proxy.provider.cluster1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- The class that performs failover for cluster1 when a NameNode fails -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/data/journaldata/jn</value>
  </property>
  <!-- Local disk path where the JournalNodes store the shared NameNode edits -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>shell(/bin/true)</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>10000</value>
  </property>
  <!-- Default fencing configuration to guard against split-brain -->
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>100</value>
  </property>
</configuration>
Configure slaves:
[hadoop@djt11 hadoop]$ vi slaves
djt13
djt14
djt15
Distribute the Hadoop installation directory to all nodes:
[hadoop@djt11 app]$ deploy.sh hadoop /home/hadoop/app/ slave
Then add the Hadoop environment variables to /etc/profile on every node:
JAVA_HOME=/home/hadoop/app/jdk1.7.0_79
ZOOKEEPER_HOME=/home/hadoop/app/zookeeper
HADOOP_HOME=/home/hadoop/app/hadoop
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
PATH=$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:/home/hadoop/tools:$HADOOP_HOME/bin:$PATH
export JAVA_HOME CLASSPATH PATH ZOOKEEPER_HOME HADOOP_HOME
HDFS startup sequence after configuration
1. Start ZooKeeper on all nodes:
[hadoop@djt11 hadoop]$ runRemoteCmd.sh "/home/hadoop/app/zookeeper/bin/zkServer.sh start" zookeeper
2. Start the JournalNode on every node:
[hadoop@djt11 hadoop]$ runRemoteCmd.sh "/home/hadoop/app/hadoop/sbin/hadoop-daemon.sh start journalnode" all
3. Run the format commands on the primary NameNode:
[hadoop@djt11 hadoop]$ bin/hdfs namenode -format //format the NameNode
[hadoop@djt11 hadoop]$ bin/hdfs zkfc -formatZK //format the HA state in ZooKeeper
[hadoop@djt11 hadoop]$ bin/hdfs namenode //start the NameNode
4. Run on the standby NameNode:
[hadoop@djt12 hadoop]$ bin/hdfs namenode -bootstrapStandby //sync the metadata from the primary NameNode to the standby
5. Stop what is running: press Ctrl+C on djt11 to stop the NameNode, then stop the JournalNodes on every node:
[hadoop@djt11 hadoop]$ runRemoteCmd.sh "/home/hadoop/app/hadoop/sbin/hadoop-daemon.sh stop journalnode" all
6. Start all HDFS daemons with one command:
[hadoop@djt11 hadoop]$ sbin/start-dfs.sh
After a successful start, stop one of the NameNodes, start it again, and watch how the active role switches over.
7. Verify the startup
Check the NameNode status through the web UI.
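Besides the web UI (which listens on djt11:50070 and djt12:50070, as configured in hdfs-site.xml), the active/standby state can be checked from the command line; the quick check below uses the NameNode IDs djt11 and djt12 defined in dfs.ha.namenodes.cluster1:
[hadoop@djt11 hadoop]$ bin/hdfs haadmin -getServiceState djt11 //prints active or standby
[hadoop@djt11 hadoop]$ bin/hdfs haadmin -getServiceState djt12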
Upload a file to HDFS:
[hadoop@djt11 hadoop]$ vi djt.txt //create a local file named djt.txt
hadoop dajiangtai
hadoop dajiangtai
hadoop dajiangtai
[hadoop@djt11 hadoop]$ hdfs dfs -mkdir /test //create a directory on HDFS
[hadoop@djt11 hadoop]$ hdfs dfs -put djt.txt /test //upload the file to HDFS
[hadoop@djt11 hadoop]$ hdfs dfs -ls /test //check that djt.txt was uploaded
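To additionally confirm the file content made it to HDFS (an optional check, not part of the original steps):
[hadoop@djt11 hadoop]$ hdfs dfs -cat /test/djt.txt //should print the three lines created above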
If the steps above worked without problems, HDFS is configured correctly.
Configure mapred-site.xml:
[hadoop@djt11 hadoop]$ vi mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- Run MapReduce on YARN; this differs from Hadoop 1 -->
</configuration>
Configure yarn-site.xml:
[hadoop@djt11 hadoop]$ vi yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.connect.retry-interval.ms</name>
    <value>2000</value>
  </property>
  <!-- Retry interval -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- Enable HA -->
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <!-- Enable automatic failover -->
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
    <value>true</value>
  </property>
  <!-- Use the embedded election for failover -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn-rm-cluster</value>
  </property>
  <!-- Name the YARN cluster yarn-rm-cluster -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <!-- Name the ResourceManagers rm1 and rm2 -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>djt11</value>
  </property>
  <!-- Hostname of ResourceManager rm1 -->
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>djt12</value>
  </property>
  <!-- Hostname of ResourceManager rm2 -->
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <!-- Enable ResourceManager recovery -->
  <property>
    <name>yarn.resourcemanager.zk.state-store.address</name>
    <value>djt11:2181,djt12:2181,djt13:2181,djt14:2181,djt15:2181</value>
  </property>
  <!-- ZooKeeper addresses -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>djt11:2181,djt12:2181,djt13:2181,djt14:2181,djt15:2181</value>
  </property>
  <!-- ZooKeeper addresses -->
  <property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>djt11:8032</value>
  </property>
  <!-- rm1 port -->
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>djt11:8034</value>
  </property>
  <!-- rm1 scheduler port -->
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>djt11:8088</value>
  </property>
  <!-- rm1 web UI port -->
  <property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>djt12:8032</value>
  </property>
  <!-- rm2 port -->
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>djt12:8034</value>
  </property>
  <!-- rm2 scheduler port -->
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>djt12:8088</value>
  </property>
  <!-- rm2 web UI port -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <!-- Shuffle service required by MapReduce -->
</configuration>
Apply the same YARN configuration on all nodes.
Starting YARN
1. Run on djt11:
[hadoop@djt11 hadoop]$ sbin/start-yarn.sh
2. Run on djt12:
[hadoop@djt12 hadoop]$ sbin/yarn-daemon.sh start resourcemanager
Also open the web UI of each ResourceManager (port 8088, as configured above).
Stop one of the ResourceManagers, then start it again, and watch how the web UI changes during the process.
3. Check the ResourceManager state:
[hadoop@djt11 hadoop]$ bin/yarn rmadmin -getServiceState rm1
[hadoop@djt11 hadoop]$ bin/yarn rmadmin -getServiceState rm2
4. WordCount example test:
[hadoop@djt11 hadoop]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /test/djt.txt /test/out/
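When the job finishes, you can inspect the output directory; part-r-00000 is the usual reducer output file name, assuming a single reducer:
[hadoop@djt11 hadoop]$ hdfs dfs -ls /test/out/
[hadoop@djt11 hadoop]$ hdfs dfs -cat /test/out/part-r-00000 //each word followed by its count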
If the job above runs without errors, YARN is installed correctly.
The JAR packages, scripts and configuration files for this Hadoop distributed cluster installation can be downloaded from the original post.
At this point the Hadoop distributed cluster setup is complete.
Start/stop order after installation
Shutting down
Shut down YARN
1. Run on djt11:
[hadoop@djt11 hadoop]$ sbin/stop-yarn.sh
2. Run on djt12:
[hadoop@djt12 hadoop]$ sbin/yarn-daemon.sh stop resourcemanager
3. Shut down HDFS:
[hadoop@djt11 hadoop]$ sbin/stop-dfs.sh
4. Shut down ZooKeeper:
[hadoop@djt11 hadoop]$ runRemoteCmd.sh "/home/hadoop/app/zookeeper/bin/zkServer.sh stop" zookeeper
Starting up
1. Start ZooKeeper:
[hadoop@djt11 hadoop]$ runRemoteCmd.sh "/home/hadoop/app/zookeeper/bin/zkServer.sh start" zookeeper
2. Start HDFS:
[hadoop@djt11 hadoop]$ sbin/start-dfs.sh
3. Run on djt11:
[hadoop@djt11 hadoop]$ sbin/start-yarn.sh
4. Run on djt12:
[hadoop@djt12 hadoop]$ sbin/yarn-daemon.sh start resourcemanager
Installing Hive
Get the installation package.
Add the Hive environment variables. Add the following to /etc/profile, /home/hadoop/.bashrc and /home/hadoop/hive/conf/hive-env.sh:
JAVA_HOME=/home/hadoop/app/jdk1.7.0_79
ZOOKEEPER_HOME=/home/hadoop/app/zookeeper
HADOOP_HOME=/home/hadoop/app/hadoop
HIVE_HOME=/home/hadoop/app/hive
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$HIVE_HOME/lib
PATH=$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:/home/hadoop/tools:$HADOOP_HOME/bin:$HIVE_HOME/bin:$PATH
export JAVA_HOME CLASSPATH PATH ZOOKEEPER_HOME HADOOP_HOME HIVE_HOME
As the hadoop user, edit app/hive/bin/hive-config.sh
and add the same variables there.
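A minimal sketch of what the appended lines in hive-config.sh could look like, assuming the same installation paths used throughout this guide (hive-config.sh only needs the HOME variables exported):
export JAVA_HOME=/home/hadoop/app/jdk1.7.0_79
export HADOOP_HOME=/home/hadoop/app/hadoop
export HIVE_HOME=/home/hadoop/app/hive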
Start Hive
(Prerequisites: the Hadoop cluster is running and the JDK environment is configured.)
[hadoop@djt11 ~]$ hive
hive> show databases;
OK
default
Time taken: 0.384 seconds, Fetched: 1 row(s)
hive> exit;
[root@djt11 ~]# vi /etc/profile
HIVE_HOME=/home/hadoop/app/hive
Installing the MySQL database
[root@djt11 ~]# yum -y install mysql-server
[root@djt11 ~]# service mysqld start
Set the MySQL root password.
[root@djt11 ~]# mysql -u root -p
(the password is empty at this point, so just press Enter)
mysql> select user, host, password from mysql.user;
+------+-----------+----------+
| user | host | password |
+------+-----------+----------+
| root | localhost | |
| root | djt11 | |
| root | 127.0.0.1 | |
| | localhost | |
| | djt11 | |
+------+-----------+----------+
5 rows in set (0.00 sec)
mysql> set password for root@localhost=password('root');
Query OK, 0 rows affected (0.00 sec)
mysql> set password for root@djt11=password('root');
Query OK, 0 rows affected (0.00 sec)
mysql> set password for root@127.0.0.1=password('root');
Query OK, 0 rows affected (0.00 sec)
mysql> select user, host, password from mysql.user;
mysql> grant all on *.* to 'root'@'%' identified by 'root';
Query OK, 0 rows affected (0.00 sec)
mysql> grant all on *.* to 'root'@'localhost' identified by 'root';
Query OK, 0 rows affected (0.00 sec)
mysql> grant all on *.* to 'root'@'djt11' identified by 'root';
Query OK, 0 rows affected (0.00 sec)
mysql> grant all on *.* to 'root'@'127.0.0.1' identified by 'root';
Query OK, 0 rows affected (0.00 sec)
mysql> set password for 'root'@'%'=password('root');
Query OK, 0 rows affected (0.00 sec)
Create the hive user
mysql> grant all on *.* to 'hive'@'localhost' identified by 'hive';
Query OK, 0 rows affected (0.00 sec)
mysql> grant all on *.* to 'hive'@'djt11' identified by 'hive';
Query OK, 0 rows affected (0.00 sec)
mysql> grant all on *.* to 'hive'@'127.0.0.1' identified by 'hive';
Query OK, 0 rows affected (0.00 sec)
mysql> grant all on *.* to 'hive'@'%' identified by 'hive';
Query OK, 0 rows affected (0.00 sec)
mysql> set password for hive@localhost=password('hive');
Query OK, 0 rows affected (0.00 sec)
mysql> set password for hive@djt11=password('hive');
Query OK, 0 rows affected (0.00 sec)
mysql> set password for hive@127.0.0.1=password('hive');
Query OK, 0 rows affected (0.00 sec)
mysql> set password for hive@'%'=password('hive');
Query OK, 0 rows affected (0.01 sec)
Create the hive database
[root@djt11 ~]# mysql -u hive -phive
mysql> create database hive;
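To confirm the database exists (an optional check):
mysql> show databases; //the hive database should now appear in the list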
[hadoop@djt11 ~]$ cd app/hive/conf/
[hadoop@djt11 conf]$ vi hive-site.xml
Add the following content:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://djt11:3306/hive?characterEncoding=UTF-8</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
<description>password to use against metastore database</description>
</property>
</configuration>
As the hadoop user, upload the MySQL connector JAR into Hive's lib directory and into Hadoop's lib directory.
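For example, something like the following, where the connector version in the file name is only illustrative (use whichever mysql-connector-java JAR you downloaded):
[hadoop@djt11 ~]$ cp mysql-connector-java-5.1.38-bin.jar /home/hadoop/app/hive/lib/
[hadoop@djt11 ~]$ cp mysql-connector-java-5.1.38-bin.jar /home/hadoop/app/hadoop/lib/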
[hadoop@djt11 ~]$ hive
Hive is now installed successfully.
Steps to start Hive:
1. Start ZooKeeper:
[hadoop@djt11 hadoop]$ runRemoteCmd.sh "/home/hadoop/app/zookeeper/bin/zkServer.sh start" zookeeper
2. Start HDFS:
[hadoop@djt11 hadoop]$ sbin/start-dfs.sh
3. Run on djt11:
[hadoop@djt11 hadoop]$ sbin/start-yarn.sh
4. Run on djt12 (sbin/yarn-daemon.sh start resourcemanager, as above).
5. Start the MySQL service:
[root@djt11 ~]# service mysqld start
Starting Hive's web interface (HWI)
Download the Hive source package and extract it, then:
[hadoop@djt11 ~]$ cd apache-hive-1.2.1-src/hwi/
[hadoop@djt11 hwi]$ jar cfM hive-war-1.0.0.war -C web .
[hadoop@djt11 hwi]$ ll
total 64
-rw-rw-r--. 1 hadoop hadoop 47966 Apr 12 18:43 hive-war-1.0.0.war
Copy it into Hive's lib directory:
[hadoop@djt11 hwi]$ cp hive-war-1.0.0.war /home/hadoop/app/hive/lib/
Add the following to hive-site.xml:
<property>
  <name>hive.hwi.war.file</name>
  <value>lib/hive-war-1.0.0.war</value>
</property>
<property>
  <name>hive.hwi.result</name>
  <value>/home/hadoop/hive/warehouse_hive</value>
</property>
Hive's lib folder is missing tools.jar,
so you need to copy tools.jar over from the JDK's lib directory.
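For example, assuming the JDK path used throughout this guide:
[hadoop@djt11 ~]$ cp /home/hadoop/app/jdk1.7.0_79/lib/tools.jar /home/hadoop/app/hive/lib/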
Then start HWI:
[hadoop@djt11 ~]$ hive --service hwi
The service listens on port 9999.
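Once HWI is running, the page should be reachable at http://djt11:9999/hwi (the /hwi context path is the usual default for Hive's web interface); a quick check from the shell:
[hadoop@djt11 ~]$ curl -I http://djt11:9999/hwi //expect an HTTP response if HWI is up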
Installing HBase
Download HBase:
[hadoop@djt11 app]$ wget apache.opencas.org/hbase/0.98.18/hbase-0.98.18-hadoop2-bin.tar.gz
[hadoop@djt11 app]$ tar -zxvf hbase-0.98.18-hadoop2-bin.tar.gz
[hadoop@djt11 app]$ mv hbase-0.98.18-hadoop2-bin hbase
[hadoop@djt11 app]$ cd hbase/conf/
[hadoop@djt11 conf]$ vi regionservers
Add:
djt13
[hadoop@djt11 conf]$ vi hbase-env.sh
Installing Flume
[hadoop@djt11 app]$ tar -zxvf apache-flume-1.6.0-bin.tar.gz
[hadoop@djt11 app]$ mv apache-flume-1.6.0-bin flume
[hadoop@djt11 conf]$ cp flume-conf.properties.template flume-conf.properties
[hadoop@djt11 conf]$ cp flume-env.sh.template flume-env.sh
[hadoop@djt11 conf]$ vi flume-env.sh //mainly to set JAVA_HOME
[root@djt11 ~]# vi /etc/profile
FLUME_HOME=/home/hadoop/app/flume
PATH=$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:/home/hadoop/tools:$HADOOP_HOME/bin:$HIVE_HOME/bin:$HBASE_HOME/bin:$FLUME_HOME/bin:$PATH
export JAVA_HOME CLASSPATH PATH ZOOKEEPER_HOME HADOOP_HOME HIVE_HOME HBASE_HOME FLUME_HOME
Verify the installation:
[hadoop@djt11 flume]$ bin/flume-ng version
Flume 1.6.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 2561a23240a71ba20bf288c7c2cda88f443c2080
Compiled by hshreedharan on Mon May 11 11:15:44 PDT 2015
From source with checksum b29e416802ce9ece3269d34233baf43f
Installing Sqoop
Download Sqoop:
[hadoop@djt11 ~]$ wget http://apache.fayea.com/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
After extracting it:
[hadoop@djt11 ~]$ cd app/sqoop/conf
[hadoop@djt11 conf]$ cp sqoop-env-template.sh sqoop-env.sh
[hadoop@djt11 conf]$ vi sqoop-env.sh
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/home/hadoop/app/hadoop
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/home/hadoop/app/hadoop/share/hadoop/common
#set the path to where bin/hbase is available
export HBASE_HOME=/home/hadoop/app/hbase
#Set the path to where bin/hive is available
export HIVE_HOME=/home/hadoop/app/hive
#Set the path for where zookeeper config dir is
export ZOOCFGDIR=/home/hadoop/app/zookeeper/conf
Configure the environment variables:
[root@djt11 ~]# vi /etc/profile
SQOOP_HOME=/home/hadoop/app/sqoop
PATH=$PATH:$SQOOP_HOME/bin
export SQOOP_HOME
Copy the MySQL connector JAR, plus commons-cli-1.2.jar, hadoop-common-2.2.0.jar and hadoop-mapreduce-client-core-2.2.0.jar from Hadoop's share directory, into Sqoop's lib directory.
Verify the installation:
[hadoop@djt11 ~]$ sqoop version
16/04/14 16:15:38 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Sqoop 1.4.6
git commit id c0c5a81723759fa575844a0a1eae8f510fa32c25
Compiled by root on Mon Apr 27 14:38:36 CST 2015
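As one further smoke test, not part of the original write-up, Sqoop can list the databases on the MySQL server set up earlier (this assumes the hive MySQL account created above and the connector JAR already placed in Sqoop's lib directory):
[hadoop@djt11 ~]$ sqoop list-databases --connect jdbc:mysql://djt11:3306 --username hive --password hive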
Source: http://www.cnblogs.com/zhj983452257/p/5399414.html