Installing Hadoop on Windows

Date: 2014-03-24 03:00:28

Windows Hadoop installation

1. Install Cygwin

2. Install the Cygwin components: openssh, openssl, sed, subversion

3. Add Cygwin/bin and Cygwin/usr/sbin to the Windows PATH

4. Install sshd

In Cygwin, run ssh-host-config

Should privilege separation be used? (no)

Do you want to install sshd as a service? (yes)

Cygwin will also ask whether you want to create a new Windows user to start the service. The default user it creates is “cyg_server”; it is better to use the current domain user.

5. Configure ssh login

In Cygwin, run ssh-keygen

6. Start the sshd service from “Services” in the Windows Control Panel

Or call net start sshd. If the service fails to start, check \var\log\ssh.log

7. Verify ssh login

In Cygwin, run ssh localhost

Sometimes the default port 22 cannot be used

We can change the port by editing the file sshd_config (Port xxx) and then connecting with ssh localhost -p xxx

For detailed logs, use ssh -v localhost

8. Download Hadoop and extract it into a folder

9. Set JAVA_HOME in conf/hadoop-env.sh

10. Test the setup

cp conf/*.xml input

bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

Problems encountered during installation

1. The first time, installing the sshd service failed

I needed to run sc delete sshd to delete the service and then run ssh-host-config again

2. Error: Privilege separation user sshd does not exist

Manually add the following line to the file “etc/passwd”:

sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin

etc/passwd format:

username:password:user id:group id:description:home directory:login shell

When a user logs in, a shell process is started to pass the user's input to the kernel
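To make the seven fields concrete, here is a minimal Java sketch that splits the sshd line above into its parts. PasswdEntry is a hypothetical helper written only for this illustration; it is not part of Cygwin or Hadoop.

```java
// Illustrative only: PasswdEntry is a hypothetical helper, not a Cygwin or Hadoop class.
final class PasswdEntry {
    final String username;
    final String password;
    final int uid;
    final int gid;
    final String description;
    final String home;
    final String shell;

    private PasswdEntry(String[] f) {
        username = f[0];
        password = f[1];
        uid = Integer.parseInt(f[2]);
        gid = Integer.parseInt(f[3]);
        description = f[4];
        home = f[5];
        shell = f[6];
    }

    // Splits one passwd line into its seven colon-separated fields.
    static PasswdEntry parse(String line) {
        String[] fields = line.split(":", -1); // -1 keeps empty trailing fields
        if (fields.length != 7) {
            throw new IllegalArgumentException("expected 7 fields, got " + fields.length);
        }
        return new PasswdEntry(fields);
    }

    public static void main(String[] args) {
        PasswdEntry e = parse("sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin");
        System.out.println(e.username + " uid=" + e.uid + " gid=" + e.gid
                + " home=" + e.home + " shell=" + e.shell);
    }
}
```

A line with a missing or extra colon is rejected by the field-count check, which is the same kind of mistake that is easy to make when editing the file by hand.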

3. Error: Connection closed by 1

If user A needs to connect via ssh to user B on host B, we need to copy A's public key to a file called “authorized_keys” under the “home/<user B>/.ssh” folder on host B

Create the authorized_keys file: vi authorized_keys

Copy the public key to the authorized_keys file: cat id_rsa.pub >> authorized_keys

For ssh, the access rights of the .ssh folder and the authorized_keys file need to be set correctly

chmod 700 ~/.ssh

chmod 600 ~/.ssh/authorized_keys (group and other users must not have write access to the authorized_keys file)

4. Error when starting Hadoop: java.io.IOException: Failed to set permissions of path: \tmp\hadoop-jizhan\mapred\staging\jizhan…..\.staging

This problem occurs because of a compatibility problem in the class org.apache.hadoop.fs.FileUtil

We need to manually change the method checkReturnValue so that it just logs a warning message instead of throwing an exception
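The shape of that change can be sketched as follows. This is not the Hadoop source: the class name, the parameter list and the use of java.util.logging are assumptions made so the example compiles on its own. The real edit has to be made inside Hadoop's FileUtil and the jar rebuilt.

```java
import java.io.File;
import java.io.IOException;
import java.util.logging.Logger;

// Sketch only: mirrors the idea of the patch, not the actual Hadoop code.
final class FileUtilPatchSketch {
    private static final Logger LOG =
            Logger.getLogger(FileUtilPatchSketch.class.getName());

    // Strict behaviour: a failed permission change aborts with an exception.
    static void checkReturnValueStrict(boolean rv, File p, String permission)
            throws IOException {
        if (!rv) {
            throw new IOException(
                    "Failed to set permissions of path: " + p + " to " + permission);
        }
    }

    // Lenient behaviour: the failure is only logged and the caller carries on.
    // Returns rv unchanged so callers can still inspect the outcome if they want.
    static boolean checkReturnValueLenient(boolean rv, File p, String permission) {
        if (!rv) {
            LOG.warning(
                    "Failed to set permissions of path: " + p + " to " + permission);
        }
        return rv;
    }

    public static void main(String[] args) {
        // With the lenient variant, a failed chmod no longer stops the job.
        checkReturnValueLenient(false, new File("staging"), "0700");
        System.out.println("continued after warning");
    }
}
```

The trade-off is that genuine permission failures are also reduced to warnings, so this is a workaround for a local development setup rather than something to carry into a shared cluster.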

Reference

http://bbym010.iteye.com/blog/1019653

Running Hadoop

1. In stand-alone mode:

Leave the default configuration

Put the files to process directly under the hadoop/input folder (no need to upload them to the Hadoop file system). The output files will be written to the hadoop/output folder

2. In pseudo-distributed mode:

core-site.xml

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9890</value>
</property>
</configuration>

mapred-site.xml

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hdfs://localhost:9891</value>
</property>
</configuration>

hdfs-site.xml

<configuration>
<property>
<name>dfs.replication</name>
</property>
</configuration>

Make sure that localhost is in the masters file

Make sure that localhost is in the slaves file

Problems encountered running in stand-alone mode

1. Reducer does not execute.

There are a few things to check when encountering this problem

It is good to explicitly specify the mapper's and reducer's output key class and output value class

The actual parameter types of the mapper and reducer must match that specification, and the mapper's output types must match the reducer's input types

A raw Context object will not be accepted for the map or reduce method; you need to use a strongly typed context:

Mapper<InputKey, InputValue, OutputKey, OutputValue>.Context

Reducer<InputKey, InputValue, OutputKey, OutputValue>.Context
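One way such a mismatch can make a reducer silently "not run" is plain Java overloading: if the method signature differs from the one the framework calls, the method is an overload, not an override, and the default implementation runs instead. The sketch below shows that mechanism with a hypothetical MiniReducer stand-in, not Hadoop's real Reducer, and it uses a wrong values type (List instead of Iterable) as the mismatch, which is an assumption chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a framework reducer: only the dispatch is modelled.
class MiniReducer<KI, VI, KO, VO> {
    // Stand-in for a typed context: collects output records as strings.
    class Context {
        final List<String> out = new ArrayList<>();
        void write(KO key, VO value) { out.add(key + "=" + value); }
    }

    // Default behaviour: pass every value through unchanged (identity).
    @SuppressWarnings("unchecked")
    protected void reduce(KI key, Iterable<VI> values, Context context) {
        for (VI v : values) {
            context.write((KO) key, (VO) v);
        }
    }

    // What the "framework" calls once per key.
    final List<String> run(KI key, List<VI> values) {
        Context ctx = new Context();
        reduce(key, values, ctx);
        return ctx.out;
    }
}

// Correct: same signature as the base method, so @Override compiles.
class SumReducer extends MiniReducer<String, Integer, String, Integer> {
    @Override
    protected void reduce(String key, Iterable<Integer> values, Context context) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        context.write(key, sum);
    }
}

// Broken: List instead of Iterable makes this an overload that is never called.
// Adding @Override here would turn the mistake into a compile error.
class BrokenSumReducer extends MiniReducer<String, Integer, String, Integer> {
    protected void reduce(String key, List<Integer> values, Context context) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        context.write(key, sum);
    }
}

final class ReducerDispatchDemo {
    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3);
        System.out.println(new SumReducer().run("a", values));
        System.out.println(new BrokenSumReducer().run("a", values));
    }
}
```

Putting @Override on every map and reduce method is a cheap guard: the compiler then rejects any signature that does not match what the framework will actually call.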

2. The line reader does not read lines correctly: a shorter line carries additional characters from a previous longer line

This is due to a wrong way of using Text. A Text has an internal byte array and an end index, so the Text object may contain additional data after the internal buffer has been expanded to read a longer line. Those bytes are not cleared, and for a shorter line only the bytes before the end index should be read.

Do not use new String(text.getBytes()) to convert a Text to a string; use text.toString()
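The effect is easy to reproduce without Hadoop. The sketch below uses ReusableText, a hypothetical stand-in that mimics the reuse of the backing array described above; it is not Hadoop's Text class, only a model of the same behaviour.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in that mimics a reused byte buffer with an end index.
final class ReusableText {
    private byte[] bytes = new byte[0];
    private int length = 0;

    // Copies new content in. The buffer only grows; bytes past 'length' stay as they were.
    void set(String s) {
        byte[] src = s.getBytes(StandardCharsets.UTF_8);
        if (bytes.length < src.length) {
            bytes = new byte[src.length];
        }
        System.arraycopy(src, 0, bytes, 0, src.length);
        length = src.length;
    }

    // Returns the whole backing array, including stale bytes.
    byte[] getBytes() { return bytes; }

    // Returns the number of valid bytes.
    int getLength() { return length; }

    // Decodes only the valid prefix of the buffer.
    @Override
    public String toString() {
        return new String(bytes, 0, length, StandardCharsets.UTF_8);
    }
}

final class TextPitfallDemo {
    // Wrong: decodes the entire backing array.
    static String wrong(ReusableText t) {
        return new String(t.getBytes(), StandardCharsets.UTF_8);
    }

    // Right: lets the object decode only its valid bytes.
    static String right(ReusableText t) {
        return t.toString();
    }

    public static void main(String[] args) {
        ReusableText t = new ReusableText();
        t.set("hello world"); // long line fills the buffer
        t.set("hi");          // short line overwrites only the first two bytes
        System.out.println(wrong(t)); // prints "hillo world"
        System.out.println(right(t)); // prints "hi"
    }
}
```

If the raw bytes really are needed, they have to be read together with the length, for example new String(text.getBytes(), 0, text.getLength(), StandardCharsets.UTF_8) on a real Text.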

Problem encountered running in pseudo-distributed mode

Error running map-reduce program

14/01/19 12:21:25 WARN mapred.JobClient: Error reading task outputhttp://L-SHC-

0436751.corp.ebay.com:50060/tasklog?plaintext=true&attemptid=attempt_2014011912

8_0002_m_000001_2&filter=stderr

Hadoop uses a Unix file link to redirect output in {HADOOP_DIR}/logs to tmp/hadoop-jizhan/mapred/local (note that hadoop.tmp.dir -> tmp/hadoop-jizhan/)

On Windows the JDK does not recognize this link as a directory, so an exception is thrown

To avoid the redirection, we can set the property HADOOP_LOG_DIR to point directly to /tmp/mapred/local. This is the Cygwin /tmp folder, and we need to use the Unix ln command to map it to the local folder c:/tmp/hadoop-jizhan/mapred/local


Original article: http://shadowisper.blog.51cto.com/3189863/1381843
