Environment: CentOS 6.7, Hadoop 2.7.3, VMware virtual machines
Download Hadoop: http://apache.fayea.com/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
namenode 192.168.137.9 ; secondnode 192.168.137.15 ; datanode 192.168.137.16
Edit /etc/hosts on all three hosts and add entries for namenode, secondnode, and datanode:
[root@namenode ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.137.9 namenode
192.168.137.15 secondnode
192.168.137.16 datanode
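Since the same three entries have to go into /etc/hosts on every host, a small helper that appends an entry only when the name is not mapped yet can make the step repeatable. This is only a sketch: the function name add_host_entry is made up for illustration, and it ignores inline comments in the hosts file.

```shell
#!/bin/sh
# Append "IP NAME" to a hosts file only if NAME is not already mapped there.
# The third argument defaults to /etc/hosts; pass another path to try it safely.
add_host_entry() {
    ip=$1
    name=$2
    hosts_file=${3:-/etc/hosts}
    # Look for NAME as a whole field (after the address) on a non-comment line.
    if awk -v n="$name" '
            !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$hosts_file" 2>/dev/null; then
        return 0    # already present, nothing to do
    fi
    printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}

# Example (run as root on each of the three hosts):
# add_host_entry 192.168.137.9  namenode
# add_host_entry 192.168.137.15 secondnode
# add_host_entry 192.168.137.16 datanode
```

Running it twice leaves the file unchanged the second time, so it is safe to rerun after adding a node.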
4. Download the JDK from the official site: jdk-8u77-linux-x64.tar.gz
5. Install Java
① yum remove java -y
② tar zxvf jdk-8u77-linux-x64.tar.gz
③ mv jdk1.8.0_77 /usr/local/java
④ vi /etc/profile and append:
export JAVA_HOME=/usr/local/java
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
⑤ source /etc/profile
[root@namenode src]# java -version
java version "1.8.0_77"
Java(TM) SE Runtime Environment (build 1.8.0_77-b03)
Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode)
Run the commands above on all three hosts.
6. Environment tuning (top display settings):
cat << EOF > ~/.toprc
RCfile for "top with windows"           # shameless braggin'
Id:a, Mode_altscr=0, Mode_irixps=1, Delay_time=3.000, Curwin=0
Def     fieldscur=AEHIOQTWKNMBcdfgjplrSuvyzX
        winflags=32569, sortindx=10, maxtasks=0
        summclr=1, msgsclr=1, headclr=3, taskclr=2
Job     fieldscur=ABcefgjlrstuvyzMKNHIWOPQDX
        winflags=62777, sortindx=0, maxtasks=0
        summclr=6, msgsclr=6, headclr=7, taskclr=6
Mem     fieldscur=ANOPQRSTUVbcdefgjlmyzWHIKX
        winflags=62777, sortindx=13, maxtasks=0
        summclr=5, msgsclr=5, headclr=4, taskclr=5
Usr     fieldscur=ABDECGfhijlopqrstuvyzMKNWX
        winflags=62777, sortindx=4, maxtasks=0
        summclr=3, msgsclr=3, headclr=2, taskclr=3
EOF
More environment tuning (resource limits for the hadoop user):
vim /etc/security/limits.conf
hadoop - nofile 32768
hadoop - nproc 32000
More environment tuning (make PAM apply the limits):
vim /etc/pam.d/system-auth
session required pam_limits.so
Do this on all nodes.
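To confirm that the limits took effect, one option is to log in again as hadoop and compare what the shell reports with the values configured above. The helper below is an illustrative sketch; check_limit is a made-up name, and the thresholds passed to it are simply the ones from limits.conf.

```shell
#!/bin/sh
# Compare an observed limit with the value expected from limits.conf.
# Prints OK/LOW and returns non-zero when the observed value is too small.
check_limit() {
    name=$1
    actual=$2
    wanted=$3
    if [ "$actual" = "unlimited" ] || [ "$actual" -ge "$wanted" ]; then
        echo "OK: $name=$actual"
    else
        echo "LOW: $name=$actual (expected at least $wanted)"
        return 1
    fi
}

# Example, in a fresh login shell of the hadoop user (bash on CentOS):
# check_limit nofile "$(ulimit -n)" 32768
# check_limit nproc  "$(ulimit -u)" 32000
```

Limits are applied at login, so a shell that was already open before the edit will still show the old values.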
7. Create the hadoop user
useradd -u 5000 hadoop && echo "hadoop" | passwd --stdin hadoop
mkdir /data && chown -R hadoop:hadoop /data
Do this on all nodes.
8. Passwordless SSH login
① su - hadoop
② ssh-keygen
③ On namenode:
vi .ssh/authorized_keys
Add the contents of .ssh/id_rsa.pub from every node, then distribute the file to each node.
chmod 600 .ssh/authorized_keys
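Pasting keys by hand in vi is easy to get wrong, so the merge step can also be scripted. The sketch below assumes the other nodes' id_rsa.pub files have already been copied to namenode; merge_pubkeys and the /tmp/*.pub paths in the example are hypothetical names, not something the original setup defines.

```shell
#!/bin/sh
# Merge public key files into an authorized_keys file, skipping keys that are
# already present, and set the permissions sshd expects on the file.
merge_pubkeys() {
    dest=$1
    shift
    touch "$dest"
    for key in "$@"; do
        while IFS= read -r line || [ -n "$line" ]; do
            [ -n "$line" ] || continue                     # skip blank lines
            grep -qxF -- "$line" "$dest" || printf '%s\n' "$line" >> "$dest"
        done < "$key"
    done
    chmod 600 "$dest"
}

# Example, as the hadoop user on namenode:
# merge_pubkeys ~/.ssh/authorized_keys ~/.ssh/id_rsa.pub /tmp/secondnode.pub /tmp/datanode.pub
# scp ~/.ssh/authorized_keys hadoop@secondnode:.ssh/
# scp ~/.ssh/authorized_keys hadoop@datanode:.ssh/
```

Afterwards, `ssh secondnode hostname` from namenode should return without asking for a password.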
9. On namenode:
Unpack Hadoop:
tar zxvf hadoop-2.7.3.tar.gz
Move the directory:
mv hadoop-2.7.3 /home/hadoop/hadoop2.7.3
10. On every node:
vim /home/hadoop/.bash_profile
Change:
export HADOOP_HOME=/home/hadoop/hadoop2.7.3
export PATH=$PATH:$HADOOP_HOME:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_HOME_WARN_SUPPRESS=1
export PATH
$ source /home/hadoop/.bash_profile
11. On namenode:
$ cd /home/hadoop/hadoop2.7.3/etc/hadoop
$ vim hadoop-env.sh
Change:
export JAVA_HOME=/usr/local/java
Add:
export HADOOP_PREFIX=/home/hadoop/hadoop2.7.3
export HADOOP_HEAPSIZE=15000
$ vim yarn-env.sh
Change:
export JAVA_HOME=/usr/local/java
$ vim mapred-env.sh
Change:
export JAVA_HOME=/usr/local/java
$ vi hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>namenode:50070</value>
    <description>Address of the NameNode web UI; fsimage and edits are fetched through this address</description>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>secondnode:50090</value>
    <description>Address of the SecondaryNameNode HTTP server; the latest fsimage is fetched through this address</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Number of replicas HDFS keeps of each file; the default is 3</description>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file:///home/hadoop/hadoop2.7.3/hdfs/namesecondary</value>
    <description>Local filesystem path where the secondary stores temporary images. If this is a comma-separated list of directories, the image is replicated to all of them for redundancy. Only applies to the secondary.</description>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/work/hdfs/name/</value>
    <description>Local filesystem path where the namenode persistently stores the namespace and transaction logs</description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/work/hdfs</value>
    <description>Comma-separated list of local directories where the DataNode stores block files</description>
  </property>
  <property>
    <name>dfs.stream-buffer-size</name>
    <value>131072</value>
    <description>The default is 4 KB. This is Hadoop's buffer, used when reading and writing HDFS files and for map output. The default is conservative for current hardware; it can be set to 128 KB (131072) or even 1 MB (if it is too large, map and reduce tasks may run out of memory).</description>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.period</name>
    <value>3600</value>
    <description>Interval between two checkpoints, in seconds; only applies to the secondary</description>
  </property>
</configuration>
For details, see the official documentation: http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
$ vim mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
$ vim yarn-site.xml
Change:
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
$ vi core-site.xml
Change:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:9000/</value>
    <description>Hostname and port of the namenode</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
    <description>Directory for temporary files</description>
  </property>
</configuration>
For details, see: http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/core-default.xml
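With several *-site.xml files edited by hand, it can help to read a value back before starting anything. The sketch below is a rough awk-based reader, not a real XML parser: it assumes each <name> and <value> element sits on its own line with the name first, and get_property is a made-up helper name.

```shell
#!/bin/sh
# Print the <value> that follows a given <name> in a Hadoop *-site.xml file.
# Returns non-zero if the property is not found.
get_property() {
    file=$1
    key=$2
    awk -v key="$key" '
        /<name>/ {
            n = $0
            sub(/.*<name>[ \t]*/, "", n)
            sub(/[ \t]*<\/name>.*/, "", n)
        }
        /<value>/ {
            v = $0
            sub(/.*<value>[ \t]*/, "", v)
            sub(/[ \t]*<\/value>.*/, "", v)
            if (n == key) { print v; found = 1; exit }
        }
        END { exit !found }
    ' "$file"
}

# Example, from /home/hadoop/hadoop2.7.3/etc/hadoop:
# get_property core-site.xml fs.defaultFS
# get_property hdfs-site.xml dfs.replication
```

Once Hadoop is on the PATH, `hdfs getconf -confKey fs.defaultFS` reports the value Hadoop itself resolves, which is the more authoritative check.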
12. Create directories on all nodes
$ mkdir /home/hadoop/tmp
$ mkdir -p /data/work/hdfs/namesecondary
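The steps above unpack and configure Hadoop on namenode only, and the write-up does not show how the directory reaches the other nodes, which hosts are listed in etc/hadoop/slaves, or the first-time format of the NameNode. The sketch below is one possible way to fill that gap, under assumptions: the helper names (run, write_slaves, push_hadoop) are made up, and listing only datanode as a worker is a guess based on the node names.

```shell
#!/bin/sh
HADOOP_HOME=${HADOOP_HOME:-/home/hadoop/hadoop2.7.3}

# With DRY_RUN=1, print each command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Write one worker hostname per line to the given slaves file.
write_slaves() {
    slaves_file=$1
    shift
    printf '%s\n' "$@" > "$slaves_file"
}

# Copy the configured Hadoop tree from this host to each listed host.
push_hadoop() {
    for host in "$@"; do
        run scp -r -q "$HADOOP_HOME" "hadoop@$host:$(dirname "$HADOOP_HOME")/"
    done
}

# Example, as the hadoop user on namenode (worker list is an assumption):
# write_slaves "$HADOOP_HOME/etc/hadoop/slaves" datanode
# push_hadoop secondnode datanode
# hdfs namenode -format    # first start only; this wipes existing HDFS metadata
```

Setting DRY_RUN=1 before calling push_hadoop prints the scp commands without copying anything, which is a cheap way to check the target paths first.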
13. On namenode
$ start-all.sh
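After start-all.sh returns, running the JDK's jps on each node shows which Hadoop daemons actually came up. The helper below is a sketch that checks jps output for expected daemon names; missing_daemons is a made-up name, and which daemons to expect on secondnode and datanode depends on your slaves file.

```shell
#!/bin/sh
# Read `jps` output on stdin and report every expected daemon that is absent.
# Returns non-zero if at least one daemon is missing.
missing_daemons() {
    jps_output=$(cat)
    rc=0
    for daemon in "$@"; do
        # The second column of jps output is the main class name; match it exactly
        # so that NameNode does not match SecondaryNameNode.
        if ! printf '%s\n' "$jps_output" |
                awk -v d="$daemon" '$2 == d { ok = 1 } END { exit !ok }'; then
            echo "missing: $daemon"
            rc=1
        fi
    done
    return $rc
}

# Example, on namenode:
# jps | missing_daemons NameNode ResourceManager
```

If something is missing, the corresponding log under $HADOOP_HOME/logs is usually the first place to look.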
This article comes from the "echo xiayun" blog; please retain this source link: http://linuxerxy.blog.51cto.com/10707334/1877842