Create a partitioned table whose partition columns are dt (the date) and country:
hive> create table logs(ts bigint,line string)
    > partitioned by (dt String,country string);
hive> load data local inpath '/root/hive/partitions/file1' into table logs
    > partition (dt='2001-01-01',country='GB');
Let's run the remaining statements and then look at the result:
hive> load data local inpath '/root/hive/partitions/file2' into table logs
    > partition (dt='2001-01-01',country='GB');
Loading data to table default.logs partition (dt=2001-01-01, country=GB)
OK
Time taken: 1.379 seconds
hive> load data local inpath '/root/hive/partitions/file3' into table logs
    > partition (dt='2001-01-01',country='US');
Loading data to table default.logs partition (dt=2001-01-01, country=US)
OK
Time taken: 1.307 seconds
hive> load data local inpath '/root/hive/partitions/file4' into table logs
    > partition (dt='2001-01-02',country='GB');
Loading data to table default.logs partition (dt=2001-01-02, country=GB)
OK
Time taken: 1.253 seconds
hive> load data local inpath '/root/hive/partitions/file5' into table logs
    > partition (dt='2001-01-02',country='US');
Loading data to table default.logs partition (dt=2001-01-02, country=US)
OK
Time taken: 1.07 seconds
hive> load data local inpath '/root/hive/partitions/file6' into table logs
    > partition (dt='2001-01-02',country='US');
Loading data to table default.logs partition (dt=2001-01-02, country=US)
OK
Time taken: 1.227 seconds
After these loads, the table's directory contains the following partition structure:
├── dt=2001-01-01/
│   ├── country=GB/
│   │   ├── file1
│   │   └── file2
│   └── country=US/
│       └── file3
└── dt=2001-01-02/
    ├── country=GB/
    │   └── file4
    └── country=US/
        ├── file5
        └── file6
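To check this layout from inside the Hive CLI, one option is to look up the table's storage location and then list it recursively. This is a minimal sketch: the path /user/hive/warehouse/logs is an assumption (the usual default warehouse location for a table in the default database), so substitute whatever location describe formatted reports on your cluster.

```sql
-- Print the table metadata; the "Location:" row gives the table's root directory.
describe formatted logs;

-- Recursively list that directory from within the Hive CLI.
-- ASSUMPTION: the table lives at the default warehouse path; replace it with
-- the location reported above. Older Hadoop releases use -lsr instead of -ls -R.
dfs -ls -R /user/hive/warehouse/logs;
```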
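Key point 1: partitioned by (dt String,country string); When the table is created, this clause declares it to be a partitioned table. Hive builds a two-level directory tree, and the clause fixes the naming rule for each level: first-level directories are named dt=<value> and second-level directories are named country=<value>.
The columns defined in the PARTITIONED BY clause are full-fledged columns of the table, called partition columns. However, the data files do not contain values for them; the values exist only in the directory names.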
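One way to see that partition columns behave like ordinary columns is to inspect the schema and a few rows. This is only a sketch; the exact output layout depends on the Hive version.

```sql
-- Lists ts and line together with dt and country; newer Hive versions
-- also print a separate partition-information section.
describe logs;

-- select * returns the partition columns after the data columns, even though
-- their values come from the directory names rather than from the files.
select * from logs limit 3;
```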
Key point 2: partition (dt='2001-01-01',country='GB'); When data is loaded, each file is loaded into a specific partition, that is, placed in the corresponding subdirectory.
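To confirm which file, and therefore which subdirectory, each row was read from, Hive's INPUT__FILE__NAME virtual column can be selected alongside the data. A sketch, assuming a Hive version with virtual-column support:

```sql
-- Each row is returned with the full path of the file it was read from;
-- the path should contain the dt=.../country=... subdirectories.
select ts, line, INPUT__FILE__NAME
from logs
where dt='2001-01-01'
  and country='GB';
```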
View the partition structure:
hive> show partitions logs;
OK
dt=2001-01-01/country=GB
dt=2001-01-01/country=US
dt=2001-01-02/country=GB
dt=2001-01-02/country=US
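show partitions also accepts a partial partition specification, which can help once a table has many partitions. A small sketch against the same logs table:

```sql
-- List only the partitions under dt=2001-01-02.
show partitions logs partition (dt='2001-01-02');
```

With the data loaded above, this should list only dt=2001-01-02/country=GB and dt=2001-01-02/country=US.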
hive> select ts,dt,line
    > from logs
    > where country='GB';
OK
1 2001-01-01 Log line 1
2 2001-01-01 Log line 2
4 2001-01-02 Log line 4
hive> select ts,dt,line
> from logs
> where dt='2001-01-02'
> and country='US';
5 2001-01-02 Log line 5
6 2001-01-02 Log line 6
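Because these queries filter on partition columns, Hive only needs to read the files under the matching directories. To check which partitions a query would touch without running it, explain dependency is one option. This is a sketch, assuming a Hive version that supports the statement; the output is JSON whose exact shape varies by version.

```sql
-- Reports the tables and partitions the query would read.
-- For this filter, only the dt=2001-01-02/country=US partition is expected to be listed.
explain dependency
select ts, dt, line
from logs
where dt='2001-01-02'
  and country='US';
```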