
Integrating Tachyon with HDFS and Spark

Time: 2015-09-23 02:07:39

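Tachyon 0.7.1 pseudo-distributed cluster installation and testing:
http://blog.csdn.net/stark_summer/article/details/48321605
According to the official documentation, Spark 1.4.x is compatible with Tachyon 0.6.4, while the latest Tachyon release, 0.7.1, is compatible with Spark 1.5.x. The versions used here are Spark 1.4.1 and Tachyon 0.7.1, so this combination is not one of the documented compatible pairs.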

Integrating Tachyon with HDFS

Edit conf/tachyon-env.sh

export TACHYON_UNDERFS_ADDRESS=hdfs://master:8020
# inside the TACHYON_JAVA_OPTS block of the same file:
  -Dtachyon.data.folder=$TACHYON_UNDERFS_ADDRESS/tmp/tachyon/data

Upload the files to HDFS

hadoop fs -put /home/cluster/data/test/bank/ /data/spark/

hadoop fs -ls /data/spark/bank/
Found 3 items
-rw-r--r--   3 wangyue supergroup    4610348 2015-09-11 20:02 /data/spark/bank/bank-full.csv
-rw-r--r--   3 wangyue supergroup       3864 2015-09-11 20:02 /data/spark/bank/bank-names.txt
-rw-r--r--   3 wangyue supergroup     461474 2015-09-11 20:02 /data/spark/bank/bank.csv

Read the file /data/spark/bank/bank-full.csv through Tachyon

val bankFullFile = sc.textFile("tachyon://master:19998/data/spark/bank/bank-full.csv/bank-full.csv")
2015-09-11 20:08:20,136 INFO  [main] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(177384) called with curMem=630803, maxMem=257918238
2015-09-11 20:08:20,137 INFO  [main] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_3 stored as values in memory (estimated size 173.2 KB, free 245.2 MB)
2015-09-11 20:08:20,154 INFO  [main] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(17665) called with curMem=808187, maxMem=257918238
2015-09-11 20:08:20,155 INFO  [main] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_3_piece0 stored as bytes in memory (estimated size 17.3 KB, free 245.2 MB)
2015-09-11 20:08:20,156 INFO  [sparkDriver-akka.actor.default-dispatcher-2] storage.BlockManagerInfo (Logging.scala:logInfo(59)) - Added broadcast_3_piece0 in memory on localhost:41040 (size: 17.3 KB, free: 245.9 MB)
2015-09-11 20:08:20,157 INFO  [main] spark.SparkContext (Logging.scala:logInfo(59)) - Created broadcast 3 from textFile at <console>:21
bankFullFile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[7] at textFile at <console>:21

count

bankFullFile.count()
However, it reports the following errors (logged at WARN level):
2015-09-11 21:34:31,494 WARN  [Executor task launch worker-6]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,495 WARN  [Executor task launch worker-6]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,489 WARN  [Executor task launch worker-7]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,495 WARN  [Executor task launch worker-7]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,495 WARN  [Executor task launch worker-7]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,495 WARN  [Executor task launch worker-7]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,495 WARN  [Executor task launch worker-7]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,495 WARN  [Executor task launch worker-7]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,496 WARN  [Executor task launch worker-7]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,496 WARN  [Executor task launch worker-7]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,496 WARN  [Executor task launch worker-7]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,496 WARN  [Executor task launch worker-7]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,496 WARN  [Executor task launch worker-7]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,496 WARN  [Executor task launch worker-7]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing

This error looks very strange. Does anyone know what causes it? Tell me why!

However, the Tachyon file system shows the following:

./bin/tachyon tfs ls /data/spark/bank/bank-full.csv/
4502.29 KB  09-11-2015 20:09:02:078  Not In Memory  /data/spark/bank/bank-full.csv/bank-full.csv

whereas bank-full.csv in HDFS is listed as:

hadoop fs -ls /data/spark/bank/
Found 3 items
-rw-r--r--   3 wangyue supergroup    4610348 2015-09-11 20:02 /data/spark/bank/bank-full.csv
-rw-r--r--   3 wangyue supergroup       3864 2015-09-11 20:02 /data/spark/bank/bank-names.txt
-rw-r--r--   3 wangyue supergroup     461474 2015-09-11 20:02 /data/spark/bank/bank.csv
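The Tachyon listing reports the size in KB, while hadoop fs -ls reports bytes. The short Scala sketch below (SizeCheck is a hypothetical helper, and it assumes 1 KB = 1024 bytes) converts the HDFS sizes so the two listings can be compared directly. With this conversion, bank-full.csv comes out at 4502.29 KB, the same figure the Tachyon listing shows.

```scala
import java.util.Locale

// Hypothetical helper for comparing the HDFS and Tachyon listings.
// Assumes 1 KB = 1024 bytes.
object SizeCheck {
  // Format a byte count as kilobytes with two decimals. Locale.ROOT keeps
  // the decimal separator a dot regardless of the JVM's default locale.
  def bytesToKb(bytes: Long): String =
    "%.2f KB".formatLocal(Locale.ROOT, bytes / 1024.0)

  def main(args: Array[String]): Unit = {
    // Sizes in bytes, copied from the hadoop fs -ls output.
    val hdfsSizes = Seq(
      "bank-full.csv"  -> 4610348L,
      "bank-names.txt" -> 3864L,
      "bank.csv"       -> 461474L
    )
    for ((name, bytes) <- hdfsSizes)
      println(s"$name: $bytes bytes = ${bytesToKb(bytes)}")
  }
}
```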

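So Tachyon had in fact picked up bank-full.csv and stored it in its own file system at tachyon://master:19998/data/spark/bank/bank-full.csv/bank-full.csv. Note that the listing above marks this file as Not In Memory.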
This comes from the configuration in Tachyon's conf/tachyon-env.sh. With export TACHYON_UNDERFS_ADDRESS=hdfs://master:8020 set, tachyon://master:19998 can reach files at the corresponding paths in HDFS.
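Here is a minimal sketch of the path correspondence described above. It assumes that a tachyon:// URI maps to the same path under TACHYON_UNDERFS_ADDRESS. The UfsPath object and its toUnderFs method are hypothetical and do not reproduce Tachyon's own resolution logic. Applying it to the two paths in this post makes one detail visible. The path passed to sc.textFile earlier has an extra /bank-full.csv segment, so under this mapping it would correspond to hdfs://master:8020/data/spark/bank/bank-full.csv/bank-full.csv, and no file with that path appears in the HDFS listing.

```scala
import java.net.URI

// Hypothetical illustration of the author's description: with
// TACHYON_UNDERFS_ADDRESS=hdfs://master:8020, a tachyon:// path is assumed
// to correspond to the same path in HDFS. Not Tachyon's actual code.
object UfsPath {
  val UnderFsAddress = "hdfs://master:8020" // value from tachyon-env.sh above

  // Keep the path of the tachyon:// URI and prefix it with the under-FS address.
  def toUnderFs(tachyonUri: String, underFs: String = UnderFsAddress): String = {
    val uri = new URI(tachyonUri)
    require(uri.getScheme == "tachyon", s"not a tachyon:// URI: $tachyonUri")
    underFs.stripSuffix("/") + uri.getPath
  }

  def main(args: Array[String]): Unit = {
    // The path passed to sc.textFile in this post ...
    println(toUnderFs("tachyon://master:19998/data/spark/bank/bank-full.csv/bank-full.csv"))
    // ... versus the path of the file that exists in HDFS.
    println(toUnderFs("tachyon://master:19998/data/spark/bank/bank-full.csv"))
  }
}
```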

All right, then I'll first read the file from HDFS and then save it to Tachyon:

scala> val bankfullfile = sc.textFile("/data/spark/bank/bank-full.csv")
scala> bankfullfile.count
res0: Long = 45212
scala> bankfullfile.saveAsTextFile("tachyon://master:19998/data/spark/bank/newbankfullfile")
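saveAsTextFile does not write a single file. The path you pass it (newbankfullfile here) becomes a directory, and inside it Spark 1.x writes one part file per RDD partition, named part-00000, part-00001, and so on. The sketch below lists the file names to expect. PartFiles is a hypothetical helper, and the partition count of 2 is an assumption. You can check the real count with bankfullfile.partitions.length.

```scala
// Hypothetical helper: names of the part files that saveAsTextFile writes,
// one per partition, zero-padded to five digits as in Spark 1.x.
object PartFiles {
  def partFileNames(numPartitions: Int): Seq[String] =
    (0 until numPartitions).map(i => f"part-$i%05d")

  def main(args: Array[String]): Unit = {
    // Assumed partition count; check the real one with bankfullfile.partitions.length
    val numPartitions = 2
    val dir = "tachyon://master:19998/data/spark/bank/newbankfullfile"
    partFileNames(numPartitions).foreach(name => println(s"$dir/$name"))
  }
}
```

To read the result back through Tachyon, pass the directory itself to sc.textFile, for example sc.textFile("tachyon://master:19998/data/spark/bank/newbankfullfile"). If the copy is complete, its count should match the 45212 lines above.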

Not finished yet. To be continued.


Original post: http://stark-summer.iteye.com/blog/2245285
