
Hive Storage Formats: ORC vs. PARQUET

Date: 2019-11-06 23:44:07
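  Hive has three commonly used storage formats: TEXT, ORC, and PARQUET. TEXT is the default; ORC and PARQUET are columnar formats. The three differ in disk usage and query performance, so I ran a test and recorded the results here.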

1. Differences in the CREATE TABLE statements

-- no "stored as" clause, so the table uses the default text format
create table if not exists text(
a bigint
) partitioned by (dt string)
row format delimited fields terminated by '\001'
location '/hdfs/text/';

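-- the delimiter clause is kept from the original; ORC is a self-describing binary format,
-- so the field delimiter should have no effect on the stored data
create table if not exists orc(
a bigint)
partitioned by (dt string)
row format delimited fields terminated by '\001'
stored as orc
location '/hdfs/orc/';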

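-- as with ORC, the delimiter clause should have no effect on a Parquet table
create table if not exists parquet(
a bigint)
partitioned by (dt string)
row format delimited fields terminated by '\001'
stored as parquet
location '/hdfs/parquet/';

The only real difference is the format named after "stored as" (the text table omits the clause and falls back to the default text format).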

2. HDFS storage comparison

parquet    orc     text
709M       275M    1G
687M       249M    1G
647M       265M    1G
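
To make the size difference easier to read, the figures above can be turned into ratios. The following is a minimal sketch in plain Scala; the object and method names are made up for illustration, and it assumes that "1G" means 1024 MB (the original only shows the rounded value).

```scala
object SizeComparison {
  // Assumption: the text size shown as "1G" is taken to be 1024 MB.
  val textMb = 1024.0

  // Sizes in MB copied from the table above.
  val sizesMb: Map[String, Seq[Double]] = Map(
    "parquet" -> Seq(709.0, 687.0, 647.0),
    "orc"     -> Seq(275.0, 249.0, 265.0)
  )

  def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  // Fraction of the text size that a columnar format occupies on average.
  def ratioToText(format: String): Double = mean(sizesMb(format)) / textMb

  def main(args: Array[String]): Unit =
    sizesMb.keys.toSeq.sorted.foreach { f =>
      println(f"$f%-8s avg ${mean(sizesMb(f))}%.1f MB, ${ratioToText(f) * 100}%.1f%% of text")
    }
}
```

Under that assumption, Parquet averages 681 MB (roughly two thirds of the text size) and ORC averages 263 MB (roughly a quarter) across the three samples.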

 

3. Query time comparison (unit not stated in the original; presumably seconds)

parquet    orc       text
36.451     26.133    42.574
38.425     29.353    41.673
36.647     27.825    43.938
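
The three runs can be summarized as a mean per format and a speedup relative to text. This is only a sketch in plain Scala with illustrative names; the unit is assumed to be seconds, and three runs are too few to treat the result as more than a rough indication.

```scala
object QueryTimeComparison {
  // Timings copied from the table above (unit assumed to be seconds).
  val timings: Map[String, Seq[Double]] = Map(
    "parquet" -> Seq(36.451, 38.425, 36.647),
    "orc"     -> Seq(26.133, 29.353, 27.825),
    "text"    -> Seq(42.574, 41.673, 43.938)
  )

  def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  // How many times faster a format is than text, based on mean query time.
  def speedupOverText(format: String): Double =
    mean(timings("text")) / mean(timings(format))

  def main(args: Array[String]): Unit =
    Seq("orc", "parquet", "text").foreach { f =>
      println(f"$f%-8s mean ${mean(timings(f))}%.3f, ${speedupOverText(f)}%.2fx vs text")
    }
}
```

On these numbers ORC averages about 27.8, Parquet about 37.2, and text about 42.7, so in this test ORC was the fastest of the three and also the smallest on disk.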

4. How to generate the files

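import java.util
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.sql.{SaveMode, SparkSession}

val sparkSession = SparkSession.builder().master("local").appName("pushFunnelV3").getOrCreate()
val javasc = new JavaSparkContext(sparkSession.sparkContext)
// Two sample JSON records; Spark's JSON reader accepts single-quoted strings by default
val nameRDD = javasc.parallelize(util.Arrays.asList("{'name':'zhangsan','age':'18'}", "{'name':'lisi','age':'19'}")).rdd
// Write the same data as CSV (text), ORC, and Parquet
sparkSession.read.json(nameRDD).write.mode(SaveMode.Overwrite).csv("/data/aa")
sparkSession.read.json(nameRDD).write.mode(SaveMode.Overwrite).orc("/data/bb")
sparkSession.read.json(nameRDD).write.mode(SaveMode.Overwrite).parquet("/data/cc")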
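
One way to check the generated files is to read them back and compare the column names. The sketch below is not from the original post: it assumes Spark 2.3 or later on the classpath (for ORC output without Hive support), writes to a hypothetical temporary directory instead of /data/aa, /data/bb and /data/cc, and uses the made-up names ReadBackCheck and columnsAfterRoundTrip.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{SaveMode, SparkSession}

object ReadBackCheck {
  // Writes two sample rows in each format under `base` and returns
  // the column names seen when each copy is read back.
  def columnsAfterRoundTrip(spark: SparkSession, base: String): Map[String, Seq[String]] = {
    import spark.implicits._
    val records = Seq("""{"name":"zhangsan","age":"18"}""", """{"name":"lisi","age":"19"}""")
    val df = spark.read.json(records.toDS())

    df.write.mode(SaveMode.Overwrite).csv(s"$base/csv")
    df.write.mode(SaveMode.Overwrite).orc(s"$base/orc")
    df.write.mode(SaveMode.Overwrite).parquet(s"$base/parquet")

    Map(
      "csv"     -> spark.read.csv(s"$base/csv").columns.toSeq,
      "orc"     -> spark.read.orc(s"$base/orc").columns.toSeq,
      "parquet" -> spark.read.parquet(s"$base/parquet").columns.toSeq
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("readBackCheck").getOrCreate()
    // Hypothetical scratch location, used only for this check.
    val base = Files.createTempDirectory("format-check").toString
    columnsAfterRoundTrip(spark, base).foreach { case (format, cols) =>
      println(s"$format: ${cols.mkString(", ")}")
    }
    spark.stop()
  }
}
```

ORC and Parquet files store their schema, so the column names survive the round trip. CSV written without a header does not carry them, and Spark assigns generated names such as _c0 when reading it back.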


Original: https://www.cnblogs.com/wuxiaolong4/p/11809291.html
