A First Look at Scalding: A Scala-Based Power Tool for Hadoop

Date: 2014-03-05 04:51:35

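OK, starting today I am live-blogging my way through Scalding, the Scala-based library for Hadoop. Work through the following pages in order: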

https://github.com/twitter/scalding#scalding

https://github.com/twitter/scalding/wiki/Getting-Started

https://github.com/twitter/scalding/wiki/Fields-based-API-Reference

https://github.com/willf/scalding_cookbook

https://github.com/twitter/scalding/wiki/API-Reference

https://github.com/twitter/scalding/wiki

By the time you reach the fourth page, scalding_cookbook, you can start trying to write Scalding programs that are cooler than Word Count:

import com.twitter.scalding._

// input (tsv)
// 0   1     2     3    4   5   6
// 22  kinds of    love nn2 io  nn1
// 12  large green eyes jj  jj  nn2
//
// output (tsv)
// 22 of    kinds/nn2_love/nn1
// 12 green large/jj_eyes/nn2

class contextCountJob(args : Args) extends Job(args) {
  // Field names for the seven input columns and the three output columns.
  val inSchema = ('count, 'w1, 'w2, 'w3, 'pos1, 'pos2, 'pos3)
  val outSchema = ('count, 'word, 'context)
  Tsv(args("input"), inSchema)
    .mapTo(inSchema -> outSchema) {
      parts : (String, String, String, String, String, String, String) => {
        val (count, w1, w2, w3, pos1, pos2, pos3) = parts
        // The context is the left and right neighbours of w2, each with its POS tag.
        val context = "%s/%s_%s/%s".format(w1, pos1, w3, pos3)
        (count, w2, context)
      }
    }
    .write(Tsv(args("output")))
}

The annoying part is that Scala is so new that even the cnblogs code plugin does not support it...
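
To check the per-row logic of the job above without a Hadoop or Scalding setup, here is a minimal plain-Scala sketch of the same transformation. The names `ContextRowSketch` and `toContextRow` are hypothetical helpers made up for illustration; they are not part of Scalding. The sketch only assumes that each input line has exactly seven tab-separated fields, as in the sample rows in the comments.

```scala
// Hypothetical stand-alone sketch: mirrors the body of the mapTo closure,
// but works on one raw TSV line instead of a Scalding tuple.
object ContextRowSketch {
  def toContextRow(line: String): String = {
    val fields = line.split("\t")
    require(fields.length == 7, s"expected 7 tab-separated fields, got ${fields.length}")
    // pos2 is not used in the output, so it is ignored here.
    val Array(count, w1, w2, w3, pos1, _, pos3) = fields
    val context = "%s/%s_%s/%s".format(w1, pos1, w3, pos3)
    Seq(count, w2, context).mkString("\t")
  }

  def main(args: Array[String]): Unit = {
    // The two sample rows from the comments in the job above.
    val sample = Seq(
      "22\tkinds\tof\tlove\tnn2\tio\tnn1",
      "12\tlarge\tgreen\teyes\tjj\tjj\tnn2"
    )
    sample.map(toContextRow).foreach(println)
  }
}
```

Running `main` should print the two output rows listed in the job's header comment, which makes it a quick sanity check before submitting the real job.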


Original post: http://www.cnblogs.com/wei-li/p/ScaldingFirstSight.html
