The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
The project includes these modules: Hadoop Common, the Hadoop Distributed File System (HDFS), Hadoop YARN, and Hadoop MapReduce.
This article will walk you step by step through installing and configuring a single-node Hadoop cluster on CentOS.
Before installing Hadoop, make sure Java is installed on your system. Use this command to check the version of the installed Java:
java -version

java version "1.7.0_75"
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
To install or update Java, use the following step-by-step instructions.
The first step is to download the latest version of Java from the official Oracle website:
cd /opt/
wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/7u79-b15/jdk-7u79-linux-x64.tar.gz"
tar xzf jdk-7u79-linux-x64.tar.gz
Next, set up the newer version of Java with the alternatives command:
cd /opt/jdk1.7.0_79/
alternatives --install /usr/bin/java java /opt/jdk1.7.0_79/bin/java 2
alternatives --config java
There are 3 programs which provide 'java'.

  Selection    Command
-----------------------------------------------
*  1           /opt/jdk1.7.0_60/bin/java
 + 2           /opt/jdk1.7.0_72/bin/java
   3           /opt/jdk1.7.0_79/bin/java

Enter to keep the current selection[+], or type selection number: 3 [Press Enter]
You may also need to set up the javac and jar command paths using the alternatives command:
alternatives --install /usr/bin/jar jar /opt/jdk1.7.0_79/bin/jar 2
alternatives --install /usr/bin/javac javac /opt/jdk1.7.0_79/bin/javac 2
alternatives --set jar /opt/jdk1.7.0_79/bin/jar
alternatives --set javac /opt/jdk1.7.0_79/bin/javac
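To confirm that the alternatives now point at the new JDK, check the reported versions:

java -version
javac -version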
The next step is to configure the Java environment variables. Use the following commands to set them properly:
export JAVA_HOME=/opt/jdk1.7.0_79
export JRE_HOME=/opt/jdk1.7.0_79/jre
export PATH=$PATH:/opt/jdk1.7.0_79/bin:/opt/jdk1.7.0_79/jre/bin
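These exports only last for the current shell session. To make them persistent across logins, one common approach is to put them in a profile script; the file name /etc/profile.d/java.sh below is just a convention:

cat > /etc/profile.d/java.sh <<'EOF'
# Java environment for JDK 1.7.0_79 (loaded for all login shells)
export JAVA_HOME=/opt/jdk1.7.0_79
export JRE_HOME=/opt/jdk1.7.0_79/jre
export PATH=$PATH:/opt/jdk1.7.0_79/bin:/opt/jdk1.7.0_79/jre/bin
EOF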
After setting up the Java environment, let's start installing Apache Hadoop.
The first step is to create a dedicated system user account for the Hadoop installation:
useradd hadoop
passwd hadoop
Next, configure SSH keys for the hadoop user. Use the following commands to enable ssh login without a password:
su - hadoop
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
exit
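As an optional sanity check, verify that key-based login works before moving on. The first connection asks you to confirm the host key; after that there should be no password prompt:

su - hadoop
ssh localhost    # accept the host key on first use; no password prompt should appear
exit             # leave the ssh session
exit             # return to your original shell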
Now switch back to the hadoop user (su - hadoop) and download the latest available Hadoop version from the official site, hadoop.apache.org:
cd ~
wget http://apache.claz.org/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
tar xzf hadoop-2.6.0.tar.gz
mv hadoop-2.6.0 hadoop
The next step is to set the environment variables used by Hadoop.
Edit the ~/.bashrc file and add the following lines at the end of the file:
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
Then apply the changes to the current running environment:
source ~/.bashrc
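At this point the hadoop command should be on your PATH. A quick check (assuming the JAVA_HOME exported earlier is still set in this shell) is to print the version, which should report Hadoop 2.6.0:

hadoop version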
Next, edit the $HADOOP_HOME/etc/hadoop/hadoop-env.sh file and set the JAVA_HOME environment variable:
export JAVA_HOME=/opt/jdk1.7.0_79/
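If you prefer to make that edit non-interactively, for example in a provisioning script, a sed one-liner like the following also works. This is just a sketch; it assumes hadoop-env.sh still contains its stock export JAVA_HOME= line:

# replace the existing JAVA_HOME assignment in hadoop-env.sh
sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/opt/jdk1.7.0_79/|' $HADOOP_HOME/etc/hadoop/hadoop-env.sh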
Now you can start with the configuration of a basic Hadoop single-node cluster.
First, edit the Hadoop configuration files and make the following changes:
cd /home/hadoop/hadoop/etc/hadoop
Let's start by editing core-site.xml. (Note that in Hadoop 2.x the fs.default.name property is deprecated in favor of fs.defaultFS, but the old name still works.)
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Then edit hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
  </property>
</configuration>
Next, edit mapred-site.xml.
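Note that Hadoop 2.6 does not ship a mapred-site.xml by default; if the file is missing, create it from the bundled template first:

cp mapred-site.xml.template mapred-site.xml

Then add: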
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Finally, edit yarn-site.xml:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
Now format the namenode using the following command:
hdfs namenode -format
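If the format succeeds, the namenode directory configured in hdfs-site.xml is populated; listing it should show a current/ directory with a VERSION file:

ls /home/hadoop/hadoopdata/hdfs/namenode/current/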
To start all Hadoop services, use the following commands:
cd /home/hadoop/hadoop/sbin/
start-dfs.sh
start-yarn.sh
To check whether all services started properly, use the jps command:
jps
You should see output like this (with YARN, the ResourceManager and NodeManager daemons run in place of the old JobTracker and TaskTracker):
26049 SecondaryNameNode
25929 DataNode
26399 Jps
26129 ResourceManager
26249 NodeManager
25807 NameNode
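As an optional smoke test, create a home directory for the hadoop user in HDFS and list it; /user/hadoop is simply the conventional HDFS home directory for that user:

hdfs dfs -mkdir -p /user/hadoop
hdfs dfs -ls /user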
Now you can access the Hadoop services in your browser: the NameNode web UI at http://your-ip-address:50070/ and the YARN ResourceManager UI at http://your-ip-address:8088/.
Thanks!!!
Reference: http://www.unixmen.com/setup-apache-hadoop-centos
----------------------------
After installing Hadoop, the following warning often appears:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Many articles say this is related to the system's word size (32-bit vs. 64-bit); I am using CentOS 6.5 64-bit.
A couple of days ago, while building a Docker image, I came across a step that solves this problem. I tried it myself, and the warning no longer appears.
First, download hadoop-native-64-2.4.0.tar:

http://dl.bintray.com/sequenceiq/sequenceiq-bin/hadoop-native-64-2.4.0.tar

If you are on Hadoop 2.6, download this one instead:

http://dl.bintray.com/sequenceiq/sequenceiq-bin/hadoop-native-64-2.6.0.tar

Then extract it over the bundled native libraries:
tar -xf hadoop-native-64-2.4.0.tar -C hadoop/lib/native/
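To verify that the native libraries are now picked up, recent Hadoop 2.x releases include a checknative subcommand; run it as the hadoop user:

hadoop checknative -a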
Original article: How To Setup Apache Hadoop On CentOS (http://www.cnblogs.com/haoliansheng/p/5116996.html)