How To Setup Apache Hadoop On CentOS

Posted: 2016-01-09 20:13:24

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.

The project includes these modules:

  • Hadoop Common: The common utilities that support the other Hadoop modules.
  • Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
  • Hadoop YARN: A framework for job scheduling and cluster resource management.
  • Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.

This article walks through installing and configuring a single-node Hadoop cluster on CentOS, step by step.

Install Java

Before installing Hadoop, make sure Java is installed on your system. Use this command to check the version of the installed Java.

java -version
java version "1.7.0_75"
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
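
The version check above can also be scripted. The following is a minimal sketch, assuming the version line has the format shown in the sample output; java_version_from and java_is_17_or_newer are hypothetical helper names, not part of the JDK.

```shell
# Extract the quoted version string from `java -version` style output on stdin.
# Assumes a line of the form: java version "1.7.0_75"
java_version_from() {
    sed -n 's/^.* version "\([^"]*\)".*$/\1/p' | head -n 1
}

# Succeed if a legacy-style version string (1.x.y_z) is 1.7 or newer.
# Only the 1.x numbering used by the JDKs in this article is handled.
java_is_17_or_newer() {
    case "$1" in
        1.[0-9]*) ;;
        *) return 1 ;;
    esac
    minor=${1#1.}        # "1.7.0_75" -> "7.0_75"
    minor=${minor%%.*}   # "7.0_75"   -> "7"
    case "$minor" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$minor" -ge 7 ]
}
```

Since java -version writes to standard error, a call would look like: java_is_17_or_newer "$(java -version 2>&1 | java_version_from)"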

To install or update Java, follow the steps below.

The first step is to download the JDK (7u79 in this example) from the official Oracle website.

cd /opt/
wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/7u79-b15/jdk-7u79-linux-x64.tar.gz"
tar xzf jdk-7u79-linux-x64.tar.gz

Next, use the alternatives command to make the system use the newly installed Java version:

cd /opt/jdk1.7.0_79/
alternatives --install /usr/bin/java java /opt/jdk1.7.0_79/bin/java 2
alternatives --config java
There are 3 programs which provide 'java'.
  Selection    Command
-----------------------------------------------
*  1           /opt/jdk1.7.0_60/bin/java
 + 2           /opt/jdk1.7.0_72/bin/java
   3           /opt/jdk1.7.0_79/bin/java
Enter to keep the current selection[+], or type selection number: 3 [Press Enter]

You may also need to set up the javac and jar command paths using the alternatives command.

alternatives --install /usr/bin/jar jar /opt/jdk1.7.0_79/bin/jar 2
alternatives --install /usr/bin/javac javac /opt/jdk1.7.0_79/bin/javac 2
alternatives --set jar /opt/jdk1.7.0_79/bin/jar
alternatives --set javac /opt/jdk1.7.0_79/bin/javac
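
The same --install and --set pair can be generated for java, javac and jar in one loop, which avoids the interactive --config prompt. This is only a sketch: print_jdk_alternatives is a hypothetical helper, and it prints the commands for review rather than running them.

```shell
# Print (without running) the alternatives commands for each JDK tool.
# The JDK directory defaults to the path used in this article.
print_jdk_alternatives() {
    jdk_dir=${1:-/opt/jdk1.7.0_79}
    for tool in java javac jar; do
        printf 'alternatives --install /usr/bin/%s %s %s/bin/%s 2\n' \
            "$tool" "$tool" "$jdk_dir" "$tool"
        printf 'alternatives --set %s %s/bin/%s\n' "$tool" "$jdk_dir" "$tool"
    done
}
```

After checking the printed commands, they could be applied as root with: print_jdk_alternatives | sh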

The next step is to configure the environment variables. Use the following commands to set them:

  • Set the JAVA_HOME variable
export JAVA_HOME=/opt/jdk1.7.0_79
  • Set the JRE_HOME variable
export JRE_HOME=/opt/jdk1.7.0_79/jre
  • Set the PATH variable
export PATH=$PATH:/opt/jdk1.7.0_79/bin:/opt/jdk1.7.0_79/jre/bin
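
The exports above only last for the current shell session. One way to make them persistent is to write them to a profile snippet. The sketch below assumes that approach; write_java_profile and the file name java.sh are illustrative choices, not something the original steps prescribe.

```shell
# Write the three exports to a file that login shells can source.
# $1 = JDK directory, $2 = output file.
write_java_profile() {
    jdk_dir=$1
    out_file=$2
    cat > "$out_file" <<EOF
export JAVA_HOME=$jdk_dir
export JRE_HOME=$jdk_dir/jre
export PATH=\$PATH:$jdk_dir/bin:$jdk_dir/jre/bin
EOF
}
```

For example, as root: write_java_profile /opt/jdk1.7.0_79 /etc/profile.d/java.sh (files in /etc/profile.d ending in .sh are read by login shells on CentOS).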

Installing Apache Hadoop

With the Java environment set up, let's start installing Apache Hadoop.

The first step is to create a system user account for the Hadoop installation.

useradd hadoop
passwd hadoop

Now configure SSH keys for the hadoop user. Use the following commands to enable SSH login without a password.

su - hadoop
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
exit
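
If key-based login still prompts for a password, file permissions are a common cause, since sshd in its default strict mode ignores keys kept in files or directories that other users can write to. The sketch below checks for the conventional modes (700 on the directory, 600 on authorized_keys); check_ssh_perms is a hypothetical helper and relies on GNU stat, as shipped with CentOS.

```shell
# Succeed only if the .ssh directory is mode 700 and authorized_keys is 600.
# Returns 2 if either path cannot be inspected.
check_ssh_perms() {
    ssh_dir=${1:-$HOME/.ssh}
    dir_mode=$(stat -c %a "$ssh_dir") || return 2
    key_mode=$(stat -c %a "$ssh_dir/authorized_keys") || return 2
    [ "$dir_mode" = 700 ] && [ "$key_mode" = 600 ]
}
```

Run as the hadoop user, it can be combined with a non-interactive login test: check_ssh_perms && ssh -o BatchMode=yes localhost true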

Now switch back to the hadoop user (su - hadoop), so that ~ is /home/hadoop, and download Hadoop from a mirror of its official site, hadoop.apache.org. Version 2.6.0 is used here.

cd ~
wget http://apache.claz.org/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
tar xzf hadoop-2.6.0.tar.gz
mv hadoop-2.6.0 hadoop

The next step is to set the environment variables used by Hadoop.

Edit the ~/.bashrc file and append the following lines at the end of the file.

 
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

Then apply the changes to the current environment:

source ~/.bashrc
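
A quick way to confirm that the variables were picked up is to list any that are still unset. This is a small sketch; missing_vars is a hypothetical helper name.

```shell
# Print the names of listed variables that are unset or empty.
# Returns non-zero if at least one is missing.
missing_vars() {
    rc=0
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
            rc=1
        fi
    done
    return "$rc"
}
```

Example: missing_vars HADOOP_HOME HADOOP_INSTALL HADOOP_MAPRED_HOME HADOOP_COMMON_HOME HADOOP_HDFS_HOME YARN_HOME HADOOP_COMMON_LIB_NATIVE_DIR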

Edit the $HADOOP_HOME/etc/hadoop/hadoop-env.sh file and set the JAVA_HOME environment variable:

export JAVA_HOME=/opt/jdk1.7.0_79/

Now configure a basic single-node Hadoop cluster.

First, edit the Hadoop configuration files and make the following changes.

 cd /home/hadoop/hadoop/etc/hadoop

Let’s start by editing core-site.xml

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

Then edit hdfs-site.xml:

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
</property>
</configuration>

Next, edit mapred-site.xml:

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
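
In the Hadoop 2.6.0 tarball, etc/hadoop typically contains only mapred-site.xml.template, so mapred-site.xml may have to be created from it before editing. A sketch of that step, with ensure_mapred_site as a hypothetical helper name:

```shell
# Create mapred-site.xml from the shipped template if it does not exist yet.
# Succeeds only if mapred-site.xml is present afterwards.
ensure_mapred_site() {
    conf_dir=${1:-/home/hadoop/hadoop/etc/hadoop}
    if [ ! -f "$conf_dir/mapred-site.xml" ] &&
       [ -f "$conf_dir/mapred-site.xml.template" ]; then
        cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml"
    fi
    [ -f "$conf_dir/mapred-site.xml" ]
}
```

An existing mapred-site.xml is left untouched, so the helper is safe to run more than once.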

Finally, edit yarn-site.xml:

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
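
All four files share the same property/name/value layout, so they can also be generated instead of edited by hand. The following is a rough sketch; write_hadoop_conf is a hypothetical helper, and it does not escape XML special characters in values.

```shell
# Write a Hadoop-style configuration file from name=value arguments.
# $1 = output file, remaining arguments = property pairs.
write_hadoop_conf() {
    out_file=$1
    shift
    {
        echo '<configuration>'
        for pair in "$@"; do
            # Split on the first "=": name before it, value after it.
            printf '<property>\n<name>%s</name>\n<value>%s</value>\n</property>\n' \
                "${pair%%=*}" "${pair#*=}"
        done
        echo '</configuration>'
    } > "$out_file"
}
```

For example, the yarn-site.xml above corresponds to: write_hadoop_conf yarn-site.xml yarn.nodemanager.aux-services=mapreduce_shuffle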

Now format the namenode using the following command:

hdfs namenode -format
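
Formatting wipes any existing HDFS metadata, so it is worth guarding against running it twice. A formatted NameNode directory contains a current/VERSION file, which the sketch below checks for; namenode_is_formatted is a hypothetical helper, and the default path is the namenode directory configured in hdfs-site.xml above.

```shell
# Succeed if the NameNode storage directory already holds a current/VERSION file.
namenode_is_formatted() {
    [ -f "${1:-/home/hadoop/hadoopdata/hdfs/namenode}/current/VERSION" ]
}
```

Usage: namenode_is_formatted || hdfs namenode -format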

To start all Hadoop services, use the following commands:

cd /home/hadoop/hadoop/sbin/
start-dfs.sh
start-yarn.sh

To check that all services started correctly, use the jps command:

jps

You should see output like this:

26049 SecondaryNameNode
25929 DataNode
26399 Jps
26129 ResourceManager
26249 NodeManager
25807 NameNode
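
On a Hadoop 2.x node started with start-dfs.sh and start-yarn.sh, the daemons to expect are NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager. The check can be automated by scanning the jps output; missing_daemons below is a hypothetical helper that reads that output on standard input.

```shell
# Read jps output on stdin and print every expected daemon that is absent.
# Returns non-zero if at least one daemon is missing.
missing_daemons() {
    jps_output=$(cat)
    rc=0
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # Compare the second column exactly, so NameNode does not
        # match SecondaryNameNode.
        if ! printf '%s\n' "$jps_output" |
             awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
            rc=1
        fi
    done
    return "$rc"
}
```

Usage: jps | missing_daemons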

You can now access the Hadoop services in your browser at http://your-ip-address:8088/.

Thanks!!!

Reference: http://www.unixmen.com/setup-apache-hadoop-centos

----------------------------

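After Hadoop is installed, the following warning often appears:

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Many of the articles I found say this is related to whether the system is 32-bit or 64-bit. I am using the 64-bit CentOS 6.5 operating system.
A couple of days ago, while building a Docker image, I came across a step that solves this problem. I tried it myself, and the warning indeed no longer appears.
First download hadoop-native-64-2.4.0.tar: http://dl.bintray.com/sequenceiq/sequenceiq-bin/hadoop-native-64-2.4.0.tar
If you are on Hadoop 2.6, download this one instead: http://dl.bintray.com/sequenceiq/sequenceiq-bin/hadoop-native-64-2.6.0.tar

After downloading, extract the archive into Hadoop's native directory, overwriting the existing files:
tar -xf hadoop-native-64-2.4.0.tar -C hadoop/lib/native/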
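
To confirm that the native libraries are now found, Hadoop 2.x provides the checknative subcommand. The wrapper below is a small sketch; verify_native is a hypothetical name, and it only adds a check that the hadoop command is on the PATH.

```shell
# Run `hadoop checknative -a`, which reports which native libraries load.
# Returns 127 if the hadoop command cannot be found.
verify_native() {
    if ! command -v hadoop >/dev/null 2>&1; then
        echo "hadoop not found in PATH" >&2
        return 127
    fi
    hadoop checknative -a
}
```

With the -a flag, the command exits with a non-zero status if any of the native libraries it checks is unavailable, so the result can be used in scripts.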

Original article: http://www.cnblogs.com/haoliansheng/p/5116996.html
