
Writing a Simple MapReduce Program and Deploying It on Hadoop 2.2.0

Posted: 2014-03-11 16:31:33

This post covers how to run a finished MapReduce program on a Hadoop 2.2.0 distributed cluster.

You can write the program in Eclipse and package it into a jar file with Export or the Fat Jar plugin.

First, the Maven dependencies the program relies on:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>Temperature</groupId>
  <artifactId>Temperature</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <build>
    <sourceDirectory>src</sourceDirectory>
    <plugins>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.1</version>
        <configuration>
          <source>1.7</source>
          <target>1.7</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
  <dependencies>

  <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
      <version>2.2.0</version>
  </dependency>
  <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.2.0</version>
  </dependency>
  <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-common</artifactId>
      <version>2.2.0</version>
  </dependency>
  <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
      <version>2.2.0</version>
  </dependency>
  </dependencies>
</project>

Now the program itself. The code is as follows:

Mapper:

package org.ccnt.mr;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class Map extends MapReduceBase implements
        Mapper<LongWritable, Text, Text, IntWritable> {

    private static final int MISSING = 9999;

    @Override
    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        String line = value.toString();
        String year = line.substring(15, 19);
        int airTemperature;
        if (line.charAt(87) == '+')
            airTemperature = Integer.parseInt(line.substring(88, 92));
        else
            airTemperature = Integer.parseInt(line.substring(87, 92));
        String quality = line.substring(92, 93);
        if (airTemperature != MISSING && quality.matches("[01459]")) {
            output.collect(new Text(year), new IntWritable(airTemperature));
        }
    }

}
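The mapper assumes fixed-width NCDC weather records: the year sits at character offsets 15-19, the signed air temperature (in tenths of a degree Celsius) at offsets 87-92, and the quality code at 92-93. A standalone sketch of that parsing, using a synthetic record padded so the fields land at exactly those offsets (the field values are made up):

```java
public class ParseDemo {
    public static void main(String[] args) {
        // Build a 100-character synthetic record; only the offsets the
        // mapper reads are filled in with meaningful values.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100; i++) sb.append('0');
        sb.replace(15, 19, "1950");   // year
        sb.replace(87, 92, "+0123");  // signed temperature, tenths of a degree
        sb.replace(92, 93, "1");      // quality code
        String line = sb.toString();

        // The same extraction logic as Map.map()
        String year = line.substring(15, 19);
        int airTemperature = line.charAt(87) == '+'
                ? Integer.parseInt(line.substring(88, 92))
                : Integer.parseInt(line.substring(87, 92));
        String quality = line.substring(92, 93);
        System.out.println(year + " " + airTemperature + " " + quality);
    }
}
```

Running this prints `1950 123 1`, i.e. the mapper would emit the pair (1950, 123) for this record.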

Reducer:

package org.ccnt.mr;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class Reduce extends MapReduceBase implements
        Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int maxValue = Integer.MIN_VALUE;
        while (values.hasNext()) {
            maxValue = Math.max(maxValue, values.next().get());
        }
        output.collect(key, new IntWritable(maxValue));
    }

}
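The reducer simply folds all the temperatures seen for one year into their maximum. Stripped of the Hadoop types, the same logic looks like this (the sample values are made up):

```java
import java.util.Arrays;
import java.util.Iterator;

public class MaxDemo {
    // Same fold as Reduce.reduce(): maximum over an iterator of ints.
    static int maxOf(Iterator<Integer> values) {
        int maxValue = Integer.MIN_VALUE;
        while (values.hasNext()) {
            maxValue = Math.max(maxValue, values.next());
        }
        return maxValue;
    }

    public static void main(String[] args) {
        // e.g. all temperatures (tenths of a degree) grouped under one year
        System.out.println(maxOf(Arrays.asList(111, -56, 78, 220).iterator()));
    }
}
```

Starting from `Integer.MIN_VALUE` means the fold also behaves correctly when every reading is negative.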

Main:

package org.ccnt.mr;

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MaxTemperature {
    
    public static void main(String[] args) throws IOException {
        System.out.println(args.length);
        for (String string : args) {
            System.out.println(string);
        }
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperature <input path> <output path>");
            System.exit(1);
        }
        
        JobConf conf = new JobConf(MaxTemperature.class);
        conf.setJobName("Max Temperature");
        FileInputFormat.addInputPath(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        JobClient.runJob(conf);
    }

}
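After a successful run, each reducer writes a part file (e.g. result/part-00000) under the output directory; with the default TextOutputFormat each line is a key and a value separated by a tab. The values below are only illustrative:

1949	111
1950	22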

Compile and package the program above into a jar file, then deploy it on Hadoop 2.2.0 (this article assumes Hadoop 2.2.0 is already set up). Deployment goes as follows:
First, start Hadoop 2.2.0 with:

sbin/start-dfs.sh
sbin/start-yarn.sh
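The job reads its input from HDFS, so the test data has to be uploaded first. Assuming the data file was saved locally as data.txt (adjust the paths to wherever it actually lives):

bin/hdfs dfs -mkdir -p input
bin/hdfs dfs -put data.txt input/data.txt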

There are two ways to build the jar:

1) Export the jar directly from Eclipse. The default MANIFEST.MF this generates does not name the class containing the main method, so it must be passed on the command line.

The command:

bin/hadoop jar ~/Download/MaxTemperature.jar org.ccnt.mr.MaxTemperature input/data.txt result

2) Export the jar with the Fat Jar tool. The Hadoop dependencies do not need to be bundled (the cluster environment already provides them); the practical difference is that the MANIFEST.MF now names the class containing the main method, so it can be omitted from the command:

bin/hadoop jar ~/Download/Temperature.jar input/data.txt result2

Both approaches produce the same result.

The test data used by the program can be downloaded from: http://pan.baidu.com/s/1iSacM

 


Original: http://www.cnblogs.com/549294286/p/3593573.html
