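Hadoop version used: 2.2.0
I won't go over downloading and installing; let's start straight from the configuration.
Hadoop needs the following files configured, all of them under $HADOOP_HOME/etc/hadoop/:
hadoop-env.sh: find the JAVA_HOME line in it and set it to the location of your JDK.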
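If you are not sure where the JDK lives, one way to find out is to resolve the javac symlink and strip the trailing bin/javac. This is only a rough sketch: the helper name jdk_home_from is made up for illustration, and it assumes javac is on your PATH and that readlink -f is available (it is on Ubuntu).

```shell
# Hypothetical helper: turn a resolved .../bin/javac path into the JDK root.
jdk_home_from() {
    # $1: fully resolved path such as /usr/lib/jvm/<jdk>/bin/javac
    dirname "$(dirname "$1")"
}

# Only attempt the lookup when a JDK compiler is actually on the PATH.
if command -v javac >/dev/null 2>&1; then
    # readlink -f follows the whole symlink chain down to the real binary.
    jdk=$(jdk_home_from "$(readlink -f "$(command -v javac)")")
    # Print the line that would go into hadoop-env.sh.
    echo "export JAVA_HOME=$jdk"
fi
```

The printed line can then be pasted over the existing JAVA_HOME line in hadoop-env.sh.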
core-site.xml: insert the following between the <configuration> tags (fs.default.name still works in 2.2.0, but it is deprecated in favor of fs.defaultFS):
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/username/kit/hadoop/data/</value>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
        <final>true</final>
    </property>
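hdfs-site.xml: as follows. Note that the NameNode and DataNode directories below point at the same path, as in my setup; giving each its own subdirectory is probably safer, since both daemons keep their own metadata and lock file there:
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/shizhida/kit/hadoop/</value>
        <final>true</final>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///home/shizhida/kit/hadoop/</value>
        <final>true</final>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>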
mapred-site.xml: this file does not exist by default. Make a copy of mapred-site.xml.template, rename the copy to mapred-site.xml, then add the following:
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
yarn-site.xml: this one is a bit longer. Some of it may be unnecessary; I copied it from elsewhere and simply added all of it:
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>localhost</value>
        <description>hostname of the RM</description>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>localhost:5274</value>
        <description>host is the hostname of the resource manager and port is the port on which the NodeManagers contact the Resource Manager.</description>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>localhost:5273</value>
        <description>host is the hostname of the resourcemanager and port is the port on which the Applications in the cluster talk to the Resource Manager.</description>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
        <description>In case you do not want to use the default scheduler</description>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>localhost:5271</value>
        <description>the host is the hostname of the ResourceManager and the port is the port on which the clients can talk to the Resource Manager.</description>
    </property>
    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value></value>
        <description>the local directories used by the nodemanager</description>
    </property>
    <property>
        <name>yarn.nodemanager.address</name>
        <value>localhost:5272</value>
        <description>the nodemanagers bind to this port</description>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>10240</value>
        <description>the amount of memory on the NodeManager in MB</description>
    </property>
    <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/app-logs</value>
        <description>directory on hdfs where the application logs are moved to</description>
    </property>
    <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value></value>
        <description>the directories used by Nodemanagers as log directories</description>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
        <description>shuffle service that needs to be set for Map Reduce to run</description>
    </property>
Once these files are configured, the job is basically done.
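A quick sanity check of the configuration directory might look like the sketch below. It creates mapred-site.xml from mapred-site.xml.template if it is not there yet, then lists any of the five files that are still missing. The function names make_mapred_site and missing_conf_files are made up for illustration, and the check says nothing about whether the contents of the files are correct.

```shell
# Hypothetical helper: create mapred-site.xml from its template,
# without overwriting a file that has already been edited.
make_mapred_site() {
    # $1: Hadoop configuration directory
    if [ ! -f "$1/mapred-site.xml" ] && [ -f "$1/mapred-site.xml.template" ]; then
        cp "$1/mapred-site.xml.template" "$1/mapred-site.xml"
    fi
}

# Hypothetical helper: print the name of every expected file that is absent.
missing_conf_files() {
    # $1: Hadoop configuration directory
    for f in hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
        [ -f "$1/$f" ] || echo "$f"
    done
}

# Run against the real installation only when HADOOP_HOME points at one.
if [ -n "${HADOOP_HOME:-}" ] && [ -d "$HADOOP_HOME/etc/hadoop" ]; then
    make_mapred_site "$HADOOP_HOME/etc/hadoop"
    missing_conf_files "$HADOOP_HOME/etc/hadoop"
fi
```

No output from the last step means all five files are in place.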
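If your system is 64-bit, the files in $HADOOP_HOME/lib/native/ need to be replaced with 64-bit builds. You can download the source and compile them yourself (search online, e.g. on Baidu, for the details), or grab files that someone else has already compiled and swap them in.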
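To see whether the native library you currently have is a 32-bit or 64-bit build, you can look at the fifth byte of the file: in the ELF header it is 1 for 32-bit and 2 for 64-bit. A small sketch follows; the function name elf_bits is made up, and the file name libhadoop.so.1.0.0 is my assumption about what sits in lib/native/, so adjust it to whatever is actually there.

```shell
# Hypothetical helper: report whether an ELF file is a 32-bit or 64-bit build.
elf_bits() {
    # $1: path to a shared library; prints 32, 64, or unknown
    # The first four bytes of an ELF file are 0x7f 'E' 'L' 'F'.
    if [ "$(head -c 4 "$1")" != "$(printf '\177ELF')" ]; then
        echo unknown
        return 0
    fi
    # Byte at offset 4 (EI_CLASS): 1 means 32-bit, 2 means 64-bit.
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case "$class" in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo unknown ;;
    esac
}

# Assumed file name; skipped silently when the file is not there.
lib="${HADOOP_HOME:-}/lib/native/libhadoop.so.1.0.0"
if [ -f "$lib" ]; then
    echo "libhadoop: $(elf_bits "$lib")-bit, machine: $(uname -m)"
fi
```

If the library reports 32 while uname -m reports x86_64, that is the mismatch described above.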
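Next, install ssh. The system already ships with openssh-client, so installing openssh-server is enough.
ssh can be set up for passwordless login, which saves a whole lot of trouble. The setup below only applies to a single machine: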
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Note that the thing in the middle of the first line is two single quotes (an empty passphrase)!
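If ssh localhost still asks for a password after this, file permissions are a common culprit: with sshd's default StrictModes setting, key files that other users can write to are ignored. A minimal sketch for tightening them, assuming the usual ~/.ssh layout (the function name secure_ssh_dir is made up for illustration):

```shell
# Hypothetical helper: restrict an .ssh directory to its owner.
secure_ssh_dir() {
    # $1: the .ssh directory to tighten
    chmod 700 "$1"                    # only the owner may enter the directory
    chmod 600 "$1/authorized_keys"    # only the owner may read or write the key list
}
```

Calling secure_ssh_dir ~/.ssh and then ssh localhost should log you in without a password prompt; the very first connection will still ask you to confirm the host key.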
Then add the following lines to /etc/profile:
export HADOOP_HOME=/home/shizhida/kit/hadoop-2.2.0
export PATH=$HADOOP_HOME/bin:$PATH
Adding Hadoop's path to the environment variables saves a whole lot of trouble too, doesn't it?
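To check that the new PATH entry has actually taken effect in the current shell, something like the sketch below can help. The helper name path_contains is made up; it only tests whether a directory appears as a whole entry in a colon-separated list.

```shell
# Hypothetical helper: succeed if a directory is an exact entry of a PATH-style list.
path_contains() {
    # $1: directory to look for; $2: colon-separated list such as $PATH
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report the result only when HADOOP_HOME has been set.
if [ -n "${HADOOP_HOME:-}" ]; then
    if path_contains "$HADOOP_HOME/bin" "$PATH"; then
        echo "hadoop is on the PATH: $(command -v hadoop)"
    else
        echo "$HADOOP_HOME/bin is not on the PATH yet"
    fi
fi
```

Once the directory shows up, running hadoop version is a simple way to confirm that the command itself works.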
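At this point the installation is basically complete. Reboot (logging out and back in should also be enough for /etc/profile to be re-read), then run: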
$ hadoop namenode -format
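to do the initial format (Hadoop 2.x prints a deprecation notice for this form and suggests hdfs namenode -format instead). After that, carry on with whatever you set out to do~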
Setting up a single-node Hadoop test environment on Ubuntu 14.04 Kylin
Original: http://www.cnblogs.com/Ayanami-Blob/p/3675561.html