
pyspark: count() fails on an RDD read with textFile

Time: 2021-01-28 17:44:08

A note on one of the pitfalls I ran into with pyspark.

After reading the files with textFile, I wanted to see how much data there was, so I tried to count it:

rdd = sc.textFile("/home/parastor/backup/datum/bus/gps/2017-07-17/*/*.gz").filter(lambda x: x is not None)
print(rdd.count())

It then failed with this error:

Traceback (most recent call last):
  File "/root/hxj/tmp/pycharm/fmm4bus.py", line 98, in <module>
    print(rdd.count())
  File "/root/anaconda3/envs/py37/lib/python3.7/site-packages/pyspark/rdd.py", line 1055, in count
    return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
  File "/root/anaconda3/envs/py37/lib/python3.7/site-packages/pyspark/rdd.py", line 1046, in sum
    return self.mapPartitions(lambda x: [sum(x)]).fold(0, operator.add)
  File "/root/anaconda3/envs/py37/lib/python3.7/site-packages/pyspark/rdd.py", line 917, in fold
    vals = self.mapPartitions(func).collect()
  File "/root/anaconda3/envs/py37/lib/python3.7/site-packages/pyspark/rdd.py", line 816, in collect
    sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
  File "/root/anaconda3/envs/py37/lib/python3.7/site-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/root/anaconda3/envs/py37/lib/python3.7/site-packages/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: java.lang.IllegalArgumentException: Unsupported class file major version 55
    at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:166)
    at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:148)
    at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:136)
    at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:237)

The fix is to set JAVA_HOME explicitly at the top of the program, so that Spark runs on the intended JDK (JDK 8 in my case).
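The number in "Unsupported class file major version 55" identifies the Java release that produced the class files being read. Major version 55 is Java 11 and 52 is Java 8, which suggests the job was started on a Java 11 JVM whose classes the ASM reader bundled with this Spark build cannot parse. A minimal sketch of that mapping follows; the function name java_release_from_major is made up for illustration.

```python
def java_release_from_major(major: int) -> int:
    """Return the Java release number for a class file major version.

    Major versions start at 45 for the earliest JDK 1.x releases and
    go up by one per release, so the release number is major - 44.
    """
    if major < 45:
        raise ValueError(f"not a valid class file major version: {major}")
    return major - 44


print(java_release_from_major(55))  # 11, the version named in the error
print(java_release_from_major(52))  # 8, i.e. a jdk1.8.0_x install
```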

I use a remote interpreter, so I first logged in to the server to check its Java installation.

Run:

java -verbose

Look at the last two lines of the output to find the JDK installation path.
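As a rough alternative to reading the java -verbose output by hand, the sketch below asks Python which java executable is on PATH and guesses the install directory from it. The helper name find_java and the assumption that the executable sits in <home>/bin/java are mine, not from the post; the guess may point at a jre subdirectory, and the java on PATH is not necessarily the one Spark picks up.

```python
import os
import shutil


def find_java():
    """Return (executable, candidate_home) for the java on PATH, or None."""
    exe = shutil.which("java")
    if exe is None:
        return None
    # Follow symlinks such as /usr/bin/java -> .../bin/java
    real = os.path.realpath(exe)
    # Assumption: the executable lives in <home>/bin/java
    candidate_home = os.path.dirname(os.path.dirname(real))
    return real, candidate_home


print(find_java())
```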

Then set it in the code.

import os
os.environ["JAVA_HOME"] = "/usr/lib/java/jdk1.8.0_212"
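The position of this assignment matters: pyspark launches the JVM when the SparkContext is created, and that process inherits the environment as it is at that moment. Here is a minimal sketch of the ordering, assuming pyspark is installed on the server. The helper names set_java_home and count_lines are hypothetical, and the JDK path is the one from this post.

```python
import os

# Path taken from the post; adjust it to the JDK 8 install on your server.
JDK8_HOME = "/usr/lib/java/jdk1.8.0_212"


def set_java_home(path, env=os.environ):
    """Set JAVA_HOME in env so that processes started later inherit it."""
    env["JAVA_HOME"] = path
    return env


def count_lines(pattern):
    """Count the lines of the text files matching pattern (needs pyspark)."""
    set_java_home(JDK8_HOME)
    # Imported here so the sketch loads even where pyspark is not installed.
    from pyspark import SparkContext

    # The JVM is launched when the context is created, so JAVA_HOME
    # has to be set before this line, not after it.
    sc = SparkContext.getOrCreate()
    return sc.textFile(pattern).count()
```

Calling count_lines with the glob pattern from the beginning of the post would then run the count on the chosen JDK. If a SparkContext already exists in the process, getOrCreate reuses it and the new JAVA_HOME has no effect on the JVM that is already running.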

 

Original: https://www.cnblogs.com/xujih/p/14341063.html
