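Hadoop: The Definitive Guide includes a classic example that finds the highest recorded temperature for each year. The source data looks like this: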
-- sample.txt
0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999
0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999
0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999
0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999
0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999
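-- Finding the maximum temperature with Spark is many times simpler than writing the equivalent MapReduce job
// Run in spark-shell, where sc (the SparkContext) is already defined
val lines = sc.textFile("/root/wangbin/sample.txt")

// Extract (year, temperature): drop the sign for positive values, keep it for negative ones
val data = lines.map(line => {
  if (line.charAt(87) == '+') {
    (line.substring(15, 19), line.substring(88, 92))
  } else {
    (line.substring(15, 19), line.substring(87, 92))
  }
})

// Convert the second column to a floating-point number
val data2 = data.map(res => (res._1, res._2.toDouble))

// For each key (year), keep the maximum value
val data3 = data2.reduceByKey((x, y) => Math.max(x, y))

// reduceByKey is lazy; collect() runs the job and prints the (year, max) pairs
data3.collect().foreach(println)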
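To sanity-check the parsing offsets without a cluster, the same logic can be sketched with plain Scala collections. This is only an illustrative check, not part of the original job: the object name `MaxTemperatureCheck` and the helpers `parse` and `maxByYear` are made up for this example, and `groupBy` plus `reduce` stand in for Spark's `reduceByKey`. In the NCDC record format the temperature field is stored in tenths of a degree Celsius, so a value of 22.0 means 2.2 °C.

```scala
// Plain-Scala check of the same logic; no SparkContext needed.
object MaxTemperatureCheck {
  // The five records from sample.txt.
  val sample: Seq[String] = Seq(
    "0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999",
    "0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999",
    "0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999",
    "0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999",
    "0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999"
  )

  // Hypothetical helper: extract (year, temperature) from one fixed-width record.
  def parse(line: String): (String, Double) = {
    val year = line.substring(15, 19)
    val temp =
      if (line.charAt(87) == '+') line.substring(88, 92) // drop the leading '+'
      else line.substring(87, 92)                        // keep the '-' sign
    (year, temp.toDouble)
  }

  // Group by year and keep the largest reading, mirroring reduceByKey with Math.max.
  def maxByYear(lines: Seq[String]): Map[String, Double] =
    lines.map(parse)
      .groupBy(_._1)
      .map { case (year, readings) =>
        year -> readings.map(_._2).reduce((x, y) => Math.max(x, y))
      }

  def main(args: Array[String]): Unit =
    maxByYear(sample).toSeq.sortBy(_._1).foreach(println)
}
```

For the five sample records this yields 111.0 for 1949 and 22.0 for 1950, which is what the Spark job should return for the same file.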