
# pyspark

Posted: 2021-01-12 00:31:54

# Example

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("boye").getOrCreate()
# spark = SparkSession.builder.appName("test").master("local[2]").getOrCreate()  # run locally with 2 threads
sc = spark.sparkContext

datas = ["hi I love you", "hello", "ni hao"]
# Keep only the elements that contain the substring "he"
rdd = sc.parallelize(datas).filter(lambda x: "he" in x)
print(rdd.collect())  # ['hello']
print(rdd.count())    # 1
```
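As a sanity check, the `filter` transformation keeps only the strings containing the substring `"he"`. A plain-Python sketch (no Spark required) of what `collect()` and `count()` return for this data:

```python
# Plain-Python equivalent of the RDD pipeline above (illustration only, no Spark)
datas = ["hi I love you", "hello", "ni hao"]

# filter(lambda x: "he" in x) keeps elements containing the substring "he"
collected = [x for x in datas if "he" in x]

print(collected)       # ['hello']
print(len(collected))  # 1
```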


# Configure environment variables

```shell
export SPARK_HOME=spark-2.4.3-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH
export PYSPARK_PYTHON=/opt/local/python/bin/python3
export PYSPARK_DRIVER_PYTHON=/opt/local/python/bin/python3
```

Run: `spark-submit --master local[*] spark_001.py`
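The `--master local[*]` flag runs Spark locally with one worker thread per CPU core. A few other common `--master` settings (hedged examples; the choice depends on where your cluster runs):

```shell
spark-submit --master local[2] spark_001.py   # local, 2 threads
spark-submit --master yarn spark_001.py       # submit to a YARN cluster
```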



Original: https://www.cnblogs.com/boye169/p/14264942.html
