Add the following dependency to pom.xml:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
</dependency>
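The ${spark.version} property must be defined in the pom's <properties> section. A minimal sketch; the version number here is an assumption (any Spark 2.x release built for Scala 2.11, matched to your cluster):

<properties>
    <!-- Assumption: adjust to the Spark version your cluster actually runs -->
    <spark.version>2.4.3</spark.version>
</properties>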
A working example:
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
 * Created by huicheng on 25/07/2019.
 */
object WorldCount {
  def main(args: Array[String]): Unit = {
    // local[2]: at least two threads are required when running locally, since
    // one thread is taken by the socket receiver and another processes batches
    val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Create a DStream that will connect to hostname:port, like localhost:9999
    val lines = ssc.socketTextStream("master01", 9999)

    // Split each line into words
    val words = lines.flatMap(_.split(" "))
    // import org.apache.spark.streaming.StreamingContext._ // not necessary since Spark 1.3

    // Count each word in each batch
    val pairs = words.map(word => (word, 1))
    val wordCounts = pairs.reduceByKey(_ + _)

    // Print the first ten elements of each RDD generated in this DStream to the console
    wordCounts.print()

    ssc.start()            // Start the computation
    ssc.awaitTermination() // Wait for the computation to terminate
  }
}
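If no cluster or socket source is at hand, the same pipeline can be exercised locally with queueStream, which feeds pre-built RDDs into the stream one per batch. A minimal sketch, not part of the original post; WordCountLocalTest and the sample input are hypothetical names:

import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WordCountLocalTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("WordCountLocalTest")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Each queued RDD becomes the input of one micro-batch
    val queue = new mutable.Queue[RDD[String]]()
    val lines = ssc.queueStream(queue)

    // Same word-count pipeline as the socket version above
    val wordCounts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    wordCounts.print()

    ssc.start()
    queue += ssc.sparkContext.makeRDD(Seq("hello world", "hello spark"))

    // Let a few batches run, then shut down cleanly
    ssc.awaitTerminationOrTimeout(5000)
    ssc.stop(stopSparkContext = true)
  }
}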
Package the job the same way as in the Spark Core section, upload the JAR to the Spark machine, and run it:
bin/spark-submit --class com.c.streaming.WorldCount ~/wordcount-jar-with-dependencies.jar
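The -jar-with-dependencies suffix in the JAR name suggests the Maven assembly plugin; the original post does not show its configuration, so the following is a sketch under that assumption (the plugin version is also an assumption):

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-assembly-plugin</artifactId>
    <version>3.1.1</version>
    <configuration>
        <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
        <archive>
            <manifest>
                <mainClass>com.c.streaming.WorldCount</mainClass>
            </manifest>
        </archive>
    </configuration>
    <executions>
        <execution>
            <phase>package</phase>
            <goals><goal>single</goal></goals>
        </execution>
    </executions>
</plugin>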
Send data with Netcat:
# TERMINAL 1:
# Running Netcat
$ nc -lk 9999
hello world
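If everything is wired up correctly, the spark-submit terminal should print something roughly like the following for each one-second batch (the timestamp will differ):

-------------------------------------------
Time: 1564032305000 ms
-------------------------------------------
(hello,1)
(world,1)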
If the job produces too much log output at runtime, lower the log level to WARN in the log4j file under Spark's conf directory.
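Concretely, for Spark 2.x that usually means copying conf/log4j.properties.template to conf/log4j.properties and changing the root level; a sketch of the relevant line (the exact template contents vary by Spark version):

# conf/log4j.properties
log4j.rootCategory=WARN, console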
Source: https://www.cnblogs.com/zhanghuicheng/p/11227372.html