
Spark RDD


RDD: Resilient Distributed Dataset

1. Spark RDD is immutable

Because an RDD is immutable, a large RDD can safely be split into smaller pieces, distributed
to various worker nodes for processing, and the partial results combined into the final output,
without any worry that the underlying data has changed in the meantime.
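A minimal Scala sketch of this point, assuming a local SparkContext (the object name, app name, and sample data are illustrative): transformations such as map and filter never modify an existing RDD, they always return a new one.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ImmutableRddExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("immutable-rdd").setMaster("local[*]"))

    val numbers = sc.parallelize(1 to 10)      // original RDD
    val doubled = numbers.map(_ * 2)           // a *new* RDD; `numbers` is untouched
    val evens   = doubled.filter(_ % 4 == 0)   // another new RDD derived from `doubled`

    // The original RDD still yields its original contents.
    println(numbers.collect().mkString(","))   // 1,2,...,10
    println(evens.collect().mkString(","))     // 4,8,...,20

    sc.stop()
  }
}
```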

 

2. Spark RDD is distributable
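The original section gives no body text here, so the following is only a hedged sketch of what "distributable" means in practice: an RDD is divided into partitions, and each partition can be processed in parallel on a different executor. The partition count and sample data below are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DistributedRddExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("distributed-rdd").setMaster("local[4]"))

    // Explicitly ask for 8 partitions; each one can be handled by a separate task/executor.
    val data = sc.parallelize(1 to 1000000, numSlices = 8)
    println(s"Number of partitions: ${data.getNumPartitions}")   // 8

    // Each partition is summed independently, then the partial sums are combined.
    val total = data.map(_.toLong).reduce(_ + _)
    println(s"Total: $total")

    sc.stop()
  }
}
```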

 

3.Spark RDD lives in memory

Spark does keep all the RDDs in the memory as much as it can. Only in rare situations,
where Spark is running out of memory or if the data size is growing beyond the capacity, is
it written to disk. Most of the processing on RDD happens in the memory, and that is the
reason why Spark is able to process the data at a lightning fast speed.
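A minimal sketch of how storage levels control this behaviour (object and variable names are illustrative): the default cache() keeps partitions in memory only, while persist(MEMORY_AND_DISK) spills a partition to disk only when it does not fit in memory.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object RddStorageExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rdd-storage").setMaster("local[*]"))

    val lines = sc.parallelize(Seq("spark keeps rdds in memory", "and spills to disk if needed"))

    // Default cache() is MEMORY_ONLY; partitions that do not fit are recomputed when needed.
    val cached = lines.cache()

    // MEMORY_AND_DISK keeps what fits in memory and writes the rest to disk.
    val words = lines.flatMap(_.split(" ")).persist(StorageLevel.MEMORY_AND_DISK)

    println(cached.count())
    println(words.distinct().count())

    sc.stop()
  }
}
```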

 

4. Spark RDD is strongly typed

A Spark RDD can be created from any supported data type: intrinsic Scala/Java types or
custom types such as your own classes. The biggest advantage of this design decision is
freedom from runtime type errors. If a job is going to break because of a data type issue,
it breaks at compile time.
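A minimal sketch of this in Scala (the Order case class and field names are illustrative): the RDD carries a concrete element type, so referring to a field that does not exist fails at compile time rather than at runtime.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TypedRddExample {
  case class Order(id: Int, amount: Double)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("typed-rdd").setMaster("local[*]"))

    // RDD[Order]: the element type is known to the compiler.
    val orders = sc.parallelize(Seq(Order(1, 19.99), Order(2, 5.50), Order(3, 42.00)))

    // Fine: `amount` is a Double on every element.
    val revenue: Double = orders.map(_.amount).sum()

    // Would not compile: value `price` is not a member of Order.
    // val bad = orders.map(_.price)

    println(s"Total revenue: $revenue")
    sc.stop()
  }
}
```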

 


Original: http://www.cnblogs.com/ordili/p/6684089.html
