
Spark RDD


RDD: Resilient Distributed Dataset

1. Spark RDD is immutable

Because an RDD is immutable, a large RDD can safely be split into smaller pieces, distributed
to various worker nodes for processing, and the partial results combined into the final output,
without any worry that the underlying data has changed in the meantime.
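A minimal Scala sketch of this point, assuming a local SparkContext (the object name, app name, and sample data are illustrative): transformations such as map and filter never modify an existing RDD, they always return a new one.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ImmutableRddExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("immutable-rdd").setMaster("local[*]"))

    val numbers = sc.parallelize(1 to 10)      // original RDD
    val doubled = numbers.map(_ * 2)           // a *new* RDD; `numbers` is untouched
    val evens   = doubled.filter(_ % 4 == 0)   // another new RDD derived from `doubled`

    // The original RDD still yields its original contents.
    println(numbers.collect().mkString(","))   // 1,2,...,10
    println(evens.collect().mkString(","))     // 4,8,...,20

    sc.stop()
  }
}
```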

 

2. Spark RDD is distributable
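The original section gives no body text here, so the following is only a hedged sketch of what "distributable" means in practice: an RDD is divided into partitions, and each partition can be processed in parallel on a different executor. The partition count and sample data below are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DistributedRddExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("distributed-rdd").setMaster("local[4]"))

    // Explicitly ask for 8 partitions; each one can be handled by a separate task/executor.
    val data = sc.parallelize(1 to 1000000, numSlices = 8)
    println(s"Number of partitions: ${data.getNumPartitions}")   // 8

    // Each partition is summed independently, then the partial sums are combined.
    val total = data.map(_.toLong).reduce(_ + _)
    println(s"Total: $total")

    sc.stop()
  }
}
```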

 

3.Spark RDD lives in memory

Spark does keep all the RDDs in the memory as much as it can. Only in rare situations,
where Spark is running out of memory or if the data size is growing beyond the capacity, is
it written to disk. Most of the processing on RDD happens in the memory, and that is the
reason why Spark is able to process the data at a lightning fast speed.
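A minimal sketch of how storage levels control this behaviour (object and variable names are illustrative): the default cache() keeps partitions in memory only, while persist(MEMORY_AND_DISK) spills a partition to disk only when it does not fit in memory.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object RddStorageExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rdd-storage").setMaster("local[*]"))

    val lines = sc.parallelize(Seq("spark keeps rdds in memory", "and spills to disk if needed"))

    // Default cache() is MEMORY_ONLY; partitions that do not fit are recomputed when needed.
    val cached = lines.cache()

    // MEMORY_AND_DISK keeps what fits in memory and writes the rest to disk.
    val words = lines.flatMap(_.split(" ")).persist(StorageLevel.MEMORY_AND_DISK)

    println(cached.count())
    println(words.distinct().count())

    sc.stop()
  }
}
```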

 

4. Spark RDD is strongly typed

A Spark RDD can be created from any supported data type: intrinsic Scala/Java types or
custom types such as your own classes. The biggest advantage of this design decision is
freedom from runtime type errors. If a job is going to break because of a data type issue,
it breaks at compile time.
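A minimal sketch of this in Scala (the Order case class and field names are illustrative): the RDD carries a concrete element type, so referring to a field that does not exist fails at compile time rather than at runtime.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TypedRddExample {
  case class Order(id: Int, amount: Double)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("typed-rdd").setMaster("local[*]"))

    // RDD[Order]: the element type is known to the compiler.
    val orders = sc.parallelize(Seq(Order(1, 19.99), Order(2, 5.50), Order(3, 42.00)))

    // Fine: `amount` is a Double on every element.
    val revenue: Double = orders.map(_.amount).sum()

    // Would not compile: value `price` is not a member of Order.
    // val bad = orders.map(_.price)

    println(s"Total revenue: $revenue")
    sc.stop()
  }
}
```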

 


Original: http://www.cnblogs.com/ordili/p/6684089.html
