OK, today I'm starting a running walkthrough of Scalding, the Scala-based library for Hadoop. Work through the following pages in order:
https://github.com/twitter/scalding#scalding
https://github.com/twitter/scalding/wiki/Getting-Started
https://github.com/twitter/scalding/wiki/Fields-based-API-Reference
https://github.com/willf/scalding_cookbook
https://github.com/twitter/scalding/wiki/API-Reference
https://github.com/twitter/scalding/wiki
By the time you reach the fourth page, scalding_cookbook, you can start trying to write Scalding programs that are cooler than Word Count:
import com.twitter.scalding._

// input (tsv)
// 0   1      2      3     4     5     6
// 22  kinds  of     love  nn2   io    nn1
// 12  large  green  eyes  jj    jj    nn2
//
// output (tsv)
// 22  of     kinds/nn2_love/nn1
// 12  green  large/jj_eyes/nn2

class contextCountJob(args : Args) extends Job(args) {
  val inSchema = ('count, 'w1, 'w2, 'w3, 'pos1, 'pos2, 'pos3)
  val outSchema = ('count, 'word, 'context)
  Tsv(args("input"), inSchema)
    .mapTo(inSchema -> outSchema) {
      parts : (String, String, String, String, String, String, String) => {
        val (count, w1, w2, w3, pos1, pos2, pos3) = parts
        val context = "%s/%s_%s/%s".format(w1, pos1, w3, pos3)
        (count, w2, context)
      }
    }
    .write(Tsv(args("output")))
}
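To check the row-level logic without a Hadoop or Scalding setup, the mapping inside mapTo can be sketched in plain Scala. This is only a rough illustration, not part of the cookbook: the object name ContextSketch and the helper toContextLine are hypothetical, and the two sample rows are the ones from the comments in the job above.

```scala
// Plain-Scala sketch of the per-row mapping; no Scalding or Hadoop required.
// ContextSketch and toContextLine are hypothetical names used for illustration.
object ContextSketch {
  // Turns one tab-separated input line (count, w1, w2, w3, pos1, pos2, pos3)
  // into one tab-separated output line (count, word, context).
  def toContextLine(line: String): String =
    line.split("\t") match {
      case Array(count, w1, w2, w3, pos1, _, pos3) =>
        // Same format string as in the Scalding job: w1/pos1_w3/pos3
        val context = "%s/%s_%s/%s".format(w1, pos1, w3, pos3)
        Seq(count, w2, context).mkString("\t")
      case _ =>
        throw new IllegalArgumentException("expected 7 tab-separated fields: " + line)
    }

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      "22\tkinds\tof\tlove\tnn2\tio\tnn1",
      "12\tlarge\tgreen\teyes\tjj\tjj\tnn2"
    )
    rows.map(toContextLine).foreach(println)
  }
}
```

The middle word becomes the 'word field, and its left and right neighbours (each with its part-of-speech tag) are joined into the 'context field; pos2 is read but not used, just as in the job.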
The annoying part is that Scala is so new and trendy that even the cnblogs (博客园) plugin doesn't support it...
Original post: "A First Look at Scalding: A Scala-Based Power Tool for Hadoop", http://www.cnblogs.com/wei-li/p/ScaldingFirstSight.html