首页 > 其他 > 详细

4.RDD常用算子之transformations

时间：2019-06-03 10:07:50 阅读：117 评论：0 收藏：0 [点我收藏+]

RDD Opertions

transformations:create a new dataset from an existing one

RDDA --> RDDB

技术分享图片

actions: return a value to the driver program after running a computation on the dataset

技术分享图片

For example, map is a transformation that passes each dataset element through a function and returns a new RDD representing the results. On the other hand, reduce is an action that aggregates all the elements of the RDD using some function and returns the final result to the driver program (although there is also a parallel reduceByKey that returns a distributed dataset).

All transformations in Spark are lazy, in that they do not compute their results right away.

技术分享图片

Instead, they just remember the transformations applied to some base dataset (e.g. a file). The transformations are only computed when an action requires a result to be returned to the driver program

This design enables Spark to run more efficiently. For example, we can realize that a dataset created through map will be used in a reduce and return only the result of the reduce to the driver, rather than the larger mapped dataset.

技术分享图片

def my_map():

data = [1,2,3,4,5]

rdd1 = sc.parallelize(data)

rdd2 = rdd1.map(lambda x: x * 2 )

print(rdd2.collect())

技术分享图片

def my_filter():

data = [1, 2, 3, 4, 5]

# rdd1 = sc.parallelize(data)

# rdd2 = rdd1.map(lambda x: x * 2)

# rdd3 = rdd2.filter(lambda x:x > 5)

# print(rdd3.collect())

print(sc.parallelize(data).map(lambda x:x*2).filter(lambda x:x>5).collect())

技术分享图片

def my_flatMap():

data = ["hello spark","hello ming","hello clay"]

print(sc.parallelize(data).flatMap(lambda line:line.split(" ")).collect())

技术分享图片

技术分享图片

技术分享图片

技术分享图片

def my_reduceByKey():

data = ["hello spark","hello ming","hello clay"]

rdd = sc.parallelize(data)

mapRdd = rdd.flatMap(lambda line: line.split(" ")).map(lambda x:(x,1))

my_reduceByKeyRdd = mapRdd.reduceByKey(lambda a,b:a+b)

print(my_reduceByKeyRdd.collect())

技术分享图片

技术分享图片

技术分享图片

union:

技术分享图片

distinct:

技术分享图片

join:

技术分享图片

技术分享图片

技术分享图片

技术分享图片

4.RDD常用算子之transformations

原文：https://www.cnblogs.com/huangguoming/p/10965873.html

踩

(0)

赞

(0)

举报

评论一句话评论（0）

分享档案

更多>

2021年09月23日 (328)
2021年09月24日 (313)
2021年09月17日 (191)
2021年09月15日 (369)
2021年09月16日 (411)
2021年09月13日 (439)
2021年09月11日 (398)
2021年09月12日 (393)
2021年09月10日 (160)
2021年09月08日 (222)

最新文章

更多>

教程昨日排行

更多>

友情链接

汇智网 PHP教程插件网

关于我们 - 联系我们 - 留言反馈 - 联系我们:wmxa8@hotmail.com

© 2014 bubuko.com 版权所有

打开技术之扣，分享程序人生！