Spark-WordCount

Date: 2019-08-17 12:59:46

Contents of words.txt:

this is one line
this is two line


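import org.apache.spark.{SparkConf, SparkContext}

// The object name WordCount is arbitrary; it only gives main a place to live.
object WordCount {
  def main(args: Array[String]): Unit = {
    // Create the SparkConf and set the application name.
    val conf = new SparkConf()
      .setAppName("wordCount")
      .setMaster("local") // Remove setMaster when submitting to a cluster; otherwise the job still runs in a single local process.

    // Create the SparkContext, the entry point for submitting a Spark application.
    val sc = new SparkContext(conf)

    // Build an RDD from sc, then apply the transformations and the action.
    // sc.textFile("hdfs://master:9000/words.txt") // /words.txt on HDFS, served from the host "master"
    sc.textFile("D:\\words.txt") // local file D:\words.txt
      .flatMap(_.split(" ")) // split each line on spaces
      .map((_, 1)) // turn each word into a (word, 1) pair
      .reduceByKey(_ + _, 1) // sum the values of identical words, using 1 partition
      .sortBy(_._2, false) // sort by value (the count), descending
      .foreach(println) // print each pair; this runs on the executors, so the output shows in the console only in local mode

    // Stop the SparkContext to end the job.
    sc.stop()
  }
}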

Output (the order of pairs with equal counts may vary):

(this,2)
(is,2)
(line,2)
(two,1)
(one,1)
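
To see what each step of the pipeline does without a Spark installation, the same chain can be sketched on an ordinary Scala collection. This is only an illustration, not part of the original post: the names WordCountSketch and wordCount are hypothetical, groupBy followed by a per-group sum stands in for reduceByKey, and ties are broken alphabetically (an assumption added here so the result is deterministic, which the Spark job does not guarantee).

```scala
object WordCountSketch {
  // Same steps as the Spark job, applied to an in-memory Seq instead of an RDD.
  def wordCount(lines: Seq[String]): Seq[(String, Int)] =
    lines
      .flatMap(_.split(" "))   // split each line on spaces
      .map((_, 1))             // turn each word into a (word, 1) pair
      .groupBy(_._1)           // collect the pairs of each word (stand-in for the shuffle)
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum per word, like reduceByKey(_ + _)
      .toSeq
      .sortBy { case (word, count) => (-count, word) } // descending count; ties alphabetical (assumption)

  def main(args: Array[String]): Unit = {
    val lines = Seq("this is one line", "this is two line")
    wordCount(lines).foreach(println)
    // prints:
    // (is,2)
    // (line,2)
    // (this,2)
    // (one,1)
    // (two,1)
  }
}
```

The counts match the Spark output above; only the order among equal counts differs, because the sketch sorts ties by word.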

Source: https://www.cnblogs.com/studyNotesSL/p/11367751.html
