生产中Hive静态和动态分区表，该怎样抉择呢？

时间：2019-06-28 01:36:53 阅读：129 评论：0 收藏：0 [点我收藏+]

一.需求

按照不同部门作为分区，导数据到目标表

二.使用静态分区表来完成

1.创建静态分区表：

create table emp_static_partition(
empno int, 
ename string, 
job string, 
mgr int, 
hiredate string, 
sal double, 
comm double)
PARTITIONED BY(deptno int)
row format delimited fields terminated by ‘\t‘;

2.插入数据：

hive>insert into table emp_static_partition partition(deptno=10)
     select empno , ename , job , mgr , hiredate , sal , comm from emp where deptno=10;

3.查询数据：

hive>select * from emp_static_partition;

技术分享图片

三.使用动态分区表来完成

1.创建动态分区表：

create table emp_dynamic_partition(
empno int, 
ename string, 
job string, 
mgr int, 
hiredate string, 
sal double, 
comm double)
PARTITIONED BY(deptno int)row format delimited fields terminated by ‘\t‘;

【注意】动态分区表与静态分区表的创建，在语法上是没有任何区别的

2.插入数据：

hive>insert into table emp_dynamic_partition partition(deptno)     
select empno , ename , job , mgr , hiredate , sal , comm, deptno from emp;

【注意】分区的字段名称，写在最后，有几个就写几个与静态分区相比，不需要where

需要设置属性的值：

hive>set hive.exec.dynamic.partition.mode=nonstrict；

假如不设置，报错如下:
技术分享图片
3.查询数据：

hive>select * from emp_dynamic_partition;
技术分享图片
分区列为deptno，实现了动态分区

四.总结

在生产上我们更倾向是选择动态分区，
无需手工指定数据导入的具体分区，
而是由select的字段(字段写在最后，有几个写几个)自行决定导出到哪一个分区中，并自动创建相应的分区，使用上更加方便快捷，在生产工作中用的非常多多。