What is Spark? Apache Spark is a general-purpose, lightning-fast cluster computing platform. It exposes development APIs that let data workers run streaming, machine learning, or SQL workloads which demand repeated access to data sets. Spark can perform both batch processing and stream processing. Batch processing refers to processing a previously collected data set as a single job, whereas stream processing means dealing with data as it arrives, via Spark Streaming.
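The batch-versus-stream distinction can be illustrated with a toy sketch in plain Python (this is not Spark code; the function names are made up for illustration). Both functions compute a word count, but they differ in when the data is processed:

```python
# Toy illustration of batch vs. stream processing (plain Python, not Spark).
# Both compute a word count, but differ in *when* data is processed.

def batch_process(records):
    """Batch: the whole previously collected data set is processed in one job."""
    counts = {}
    for word in records:
        counts[word] = counts.get(word, 0) + 1
    return counts

def stream_process(record_stream):
    """Stream: each record is handled as it arrives; state updates incrementally."""
    counts = {}
    for word in record_stream:      # could be an unbounded source
        counts[word] = counts.get(word, 0) + 1
        yield dict(counts)          # emit an updated result after each record

data = ["spark", "hadoop", "spark"]
print(batch_process(data))                    # one result at the end
print(list(stream_process(iter(data)))[-1])   # same final state, built incrementally
```

The batch version produces one answer after all data is collected; the stream version keeps an up-to-date answer at every step, which is why streaming suits sources that never "finish".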
Also, it is designed to integrate with all the major Big Data tools. For example, Spark can access any Hadoop data source and can run on Hadoop clusters. Apache Spark takes Hadoop MapReduce to the next level by adding iterative queries and stream processing.
One common belief about Spark is that it is an extension of Hadoop, but that is not true. Spark is independent of Hadoop, since it has its own cluster management system; basically, it uses Hadoop for storage purposes only.
One of Spark's key features is its in-memory cluster computation capability, which greatly increases the processing speed of an application.
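Why in-memory computation matters can be sketched in plain Python (a toy stand-in, not Spark's actual caching machinery): an intermediate result kept in memory is computed once and reused, instead of being recomputed, or re-read from disk, on every iteration.

```python
# Toy sketch of why keeping an intermediate result in memory speeds up
# iterative workloads (plain Python, not Spark's actual implementation).

compute_calls = 0

def expensive_transform(data):
    """Stands in for a costly step, e.g. re-reading and re-parsing from disk."""
    global compute_calls
    compute_calls += 1
    return [x * x for x in data]

data = list(range(5))

# Without caching: the transform is recomputed on every iteration.
for _ in range(3):
    result = expensive_transform(data)
print(compute_calls)    # prints 3: one full computation per iteration

# With an in-memory "cache" (roughly the idea behind caching an RDD in Spark):
compute_calls = 0
cached = expensive_transform(data)   # computed once, kept in memory
for _ in range(3):
    result = cached                  # reused directly on each iteration
print(compute_calls)    # prints 1: computed once, then reused
```

Iterative algorithms (the machine learning workloads mentioned above) loop over the same data many times, so avoiding repeated disk reads is where the speed-up comes from.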
Basically, Apache Spark offers high-level APIs to users in Java, Scala, Python, and R. Although Spark itself is written in Scala, it offers equally rich APIs in all four languages. We can say it is a tool for running lightning-fast applications.
Most importantly, when comparing Spark with Hadoop, Spark can run up to 100 times faster than Hadoop MapReduce when working in memory, and up to 10 times faster when accessing data from disk.
Apache Spark was introduced in 2009 in the UC Berkeley R&D Lab, which is now known as AMPLab. In 2010 it became open source under a BSD license. The project was then donated to the Apache Software Foundation in 2013, and in 2014 it became a top-level Apache project.
As we know, at the time there was no general-purpose computing engine in the industry that could process data in both real-time and batch mode. There was also a requirement for an engine that could respond in sub-second time and perform in-memory processing.
Basically, these features create the difference between Hadoop and Spark, and they also drive comparisons such as Spark vs. Storm.
In this Apache Spark tutorial, we discussed the Spark components. Spark holds the promise of faster data processing as well as easier development, and this is only possible because of its components. All these Spark components resolved the issues that occurred while using Hadoop MapReduce.
To learn more, read about each Spark Ecosystem Component.