Hive Statement Reference Notes

Date: 2020-03-17 23:09:37

Creating a table:

create [external] table [if not exists] table_name
[(col_name data_type [comment col_comment] , ... )]
[comment table_comment]
[partitioned by (col_name data_type [COMMENT col_comment] , ...)]
[clustered by (col_name, col_name, ...) [sorted by (col_name [ASC|DESC], ...)] into num_buckets buckets]
[row format row_format]
[stored as file_format]
[location hdfs_path]
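Syntax notes:

1) The external keyword
A table created with this keyword is an external table.
A table created without it is an internal (managed) table.

     Differences between internal and external tables:
     1) Concept
     An internal table's data is managed by Hive itself: dropping the table deletes both the data and the metadata.
     An external table only associates the table with a directory on HDFS: dropping it deletes the metadata only, and the underlying data is left untouched.
     2) Typical use
     External tables usually hold raw or shared data; internal tables usually hold the intermediate results of a particular module.
     3) Storage directory
     External tables: the data directory is normally set by hand at creation time to a shared directory, using the location keyword.
     Internal tables: no strict requirement; the default directory is normally used.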

     Department 1 creates a table:
     create external table log1
     (course string,name string,score int) 
     row format delimited fields terminated by ',' 
     location '/source/log';
     Department 2 creates a table over the same directory:
     create external table log2
     (course string,name string,score int) 
     row format delimited fields terminated by ',' 
     location '/source/log';
     How to delete an external table's data completely:
     1) drop table tablename;
     2) Then remove the corresponding data directory on HDFS: hadoop fs -rm -r path
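
The difference in drop behaviour can be checked with a short experiment. The following is only a sketch: the table names demo_managed and demo_external and the path /tmp/demo_external are hypothetical, and it assumes a Hive session with write access to that path.

```sql
-- Hypothetical tables, used only to illustrate drop behaviour.
create table if not exists demo_managed (id int, name string)
row format delimited fields terminated by ',';

create external table if not exists demo_external (id int, name string)
row format delimited fields terminated by ','
location '/tmp/demo_external';

-- The "Table Type" field reads MANAGED_TABLE or EXTERNAL_TABLE.
desc formatted demo_managed;
desc formatted demo_external;

-- Dropping the managed table removes the metadata and the data files.
drop table if exists demo_managed;

-- Dropping the external table removes the metadata only.
drop table if exists demo_external;

-- Inspect the external directory from the Hive session after the drop.
dfs -ls /tmp/demo_external;
```

If the directory is no longer needed, it still has to be removed separately with hadoop fs -rm -r, as described above.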

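2) IF NOT EXISTS:
Creating a table that already exists raises an error; with [if not exists] the statement is skipped and no error is reported.
3) COMMENT: a description for a column or for the table.
4) [partitioned by (col_name data_type [COMMENT col_comment] , ...)]:

     partitioned by specifies the partition columns.
     partitioned by (partition_column_name partition_column_type [COMMENT column_description])
     Note: a partition column must not also appear in the table's regular column list.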

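5) [clustered by (col_name, col_name, ...) [sorted by (col_name [ASC|DESC], ...)] into num_buckets buckets]:

     This clause defines bucketing.
     clustered by (col_name, col_name, ...)   specifies the bucketing columns
     Note: the bucketing columns must be one or more of the table's regular columns.
     sorted by   specifies the sort order inside each bucket
     into num_buckets buckets   specifies the number of buckets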

6) [row format row_format] specifies the delimiters:

     fields terminated by   column delimiter
     lines terminated by   row delimiter
     map keys terminated by   delimiter between a key and its value inside a map
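
A small sketch may help show how the delimiter clauses fit together. The table student_profile and its columns are hypothetical and not part of the original notes; collection items terminated by is a further standard clause that separates array elements and map entries.

```sql
-- Hypothetical table with complex types, one delimiter clause per level:
-- tab between columns, comma between array elements and map entries,
-- colon between a map key and its value, newline between rows.
create table if not exists student_profile (
    name    string,
    hobbies array<string>,
    scores  map<string,int>
)
row format delimited
    fields terminated by '\t'
    collection items terminated by ','
    map keys terminated by ':'
    lines terminated by '\n';

-- A matching input line would look like this (<TAB> stands for a tab):
-- zhangsan<TAB>reading,running<TAB>yuwen:90,shuxue:85
```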

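7) [stored as file_format] specifies the storage format of the underlying data:

     textfile   plain text; the default
     rcfile   row-columnar format
     The data is first split horizontally into blocks of rows, so that a whole row always sits in one block;
     inside each block the values are then stored column by column.
     sequencefile   binary storage format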
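
A plain text file cannot simply be copied into a table that uses a non-text format; a common approach is to let Hive rewrite the data with insert ... select. The sketch below assumes the student text table from the examples further down already exists; the table names student_seq and student_rc are hypothetical.

```sql
-- Hypothetical copy of the student table, stored as SequenceFile.
create table if not exists student_seq
(grade int, stu_id int, name string, yuwen string, shuxue string, yingyu string)
stored as sequencefile;

-- The same columns, stored as RCFile.
create table if not exists student_rc
(grade int, stu_id int, name string, yuwen string, shuxue string, yingyu string)
stored as rcfile;

-- Let Hive read the text data and write it out in each target format.
insert overwrite table student_seq select * from student;
insert overwrite table student_rc select * from student;
```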

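8) location specifies where the underlying data is stored:

     It must be a path on HDFS.
     If it is omitted, the default from the configuration file hive-site.xml is used (the hive.metastore.warehouse.dir property).
     If it is given, it overrides the location from the configuration file.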

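Note: where Hive's data storage location is configured

     1) hive-default.xml
     2) hive-site.xml
     3) the LOCATION clause of the create table statement
     Load order: 1) ---> 2) ---> 3)
     Effective value: whichever is loaded last wins.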
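
To see which location actually applies, the values can be inspected from a Hive session. This is a minimal sketch; it assumes the student table from the examples below exists, and the first statement shows the value as seen by the current session's configuration.

```sql
-- Print the default warehouse directory known to this session.
set hive.metastore.warehouse.dir;

-- The "Location" field shows where this particular table's data lives,
-- including any path given with LOCATION at creation time.
desc formatted student;
```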

Table creation examples:
1) Create an internal table:

    create table if not exists student (grade int,stu_id int,name string,yuwen string,shuxue string,yingyu string) COMMENT 'student score' 
    row format delimited fields terminated by '\t' 
    lines terminated by '\n' 
    stored as textfile 
    location '/user/data/student';

2) Create an external table:

    create external table if not exists student_external  (grade int,stu_id int,name string,yuwen string,shuxue string,yingyu string) COMMENT 'student score' 
    row format delimited fields terminated by '\t' 
    lines terminated by '\n' 
    stored as textfile 
    location '/user/data/student_external';

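3) Create a partitioned table:

    Choose the partition column according to the filter conditions that queries will use.
    Partition column: grade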
    create external table if not exists student_ptn 
    (stu_id int,name string,yuwen string,shuxue string,yingyu string) 
    COMMENT 'student score in partitions grade' 
    partitioned by (grade int) 
    row format delimited fields terminated by '\t'; 

    Note: the partition column must not be one of the table's regular columns.
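
As a sketch of how such a table is then used: data is loaded into a named partition, and queries that filter on grade only read the matching partition directory. The local file path below is hypothetical; the file is assumed to hold the five regular columns (no grade column), separated by tabs.

```sql
-- Load a local file into one partition; the partition is created if missing.
-- The path /home/hadoop/grade_1303.txt is a hypothetical example.
load data local inpath '/home/hadoop/grade_1303.txt'
into table student_ptn partition (grade=1303);

-- Filtering on the partition column reads only the grade=1303 directory.
select stu_id, name, yuwen from student_ptn where grade = 1303;

-- The partition column can be selected like any other column.
select grade, count(*) from student_ptn group by grade;
```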

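4) Create a bucketed table:

    Bucketing column: name. Sort order: yuwen, shuxue, yingyu, all descending.
    Number of buckets: 3
    The bucketing column must be one of the table's regular columns.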
    create external table if not exists student_buk (grade int,stu_id int,name string,yuwen string,shuxue string,yingyu string) 
    clustered by (name) sorted by (yuwen desc,shuxue desc,yingyu desc)  into 3 buckets 
    row format delimited fields terminated by '\t' ;
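
A bucketed table is normally filled through a query, so that Hive can hash each row on the bucketing column and write it to the right bucket file. The following sketch assumes the student table above holds data.

```sql
-- On Hive 1.x, first run: set hive.enforce.bucketing=true;
-- (Hive 2.0 and later always enforce bucketing and no longer have this setting.)

-- Populate through a query so rows are hashed on name into 3 bucket files.
insert overwrite table student_buk
select grade, stu_id, name, yuwen, shuxue, yingyu from student;

-- Read a sample consisting of one of the three buckets.
select * from student_buk tablesample(bucket 1 out of 3 on name);
```

Because the score columns are declared as string in this example, sorted by orders them as text rather than as numbers.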

5) Copy a table:

    Keyword: like
    create table if not exists stu_like like student;
    Only the table structure is copied; the table's attributes (storage location, table type) and its data are not.
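
Since like does not carry over the table type or location, these can be chosen for the copy explicitly. A minimal sketch, in which stu_like_ext and its path are hypothetical names:

```sql
-- Copy the structure only; stu_like starts out empty.
create table if not exists stu_like like student;

-- The copy can be given its own table type and location.
create external table if not exists stu_like_ext like student
location '/user/data/stu_like_ext';

-- Fill the copy with rows when the data is wanted as well.
insert into table stu_like select * from student;
```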

6) Create a table with a CTAS (create table as select) statement:

    create table tablename as select .... from ...
    Stores the result of the query in a new table.
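
A concrete instance of the template, as a sketch: the target table name student_1303_scores is hypothetical, and student is the table created in example 1.

```sql
-- Create a new table holding the scores of one grade.
create table if not exists student_1303_scores as
select stu_id, name, yuwen, shuxue, yingyu
from student
where grade = 1303;

-- The new table's column names and types come from the select list.
desc student_1303_scores;
```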

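1.2.2 Viewing a table's description:

desc tablename;    shows only the column information
desc extended tablename;   shows the detailed description, with everything on a single line
desc formatted tablename;  shows the detailed description in a formatted layout (the most useful form)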

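1.2.3 Listing tables:

show tables;   lists the tables in the current database
show tables in dbname;  lists the tables in the given database
show tables like 'student*';  lists the tables whose names match a pattern
show partitions tablename;  lists all partitions of the given table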

1.2.4 Modifying a table

1) Rename a table:

    alter table tablename rename to newname; 
    alter table stu_like01 rename to student_copy;
    Editing the metadata directly achieves the same result.

2) Modify columns:

        1) Add a column
        alter table tablename add columns (name type);
        alter table student_copy add columns (content string);
        2) Change a column
        alter table tablename change oldname newname type;
        Change a column name:
        alter table student_copy change content text string;
        Change a column type:
        alter table student_copy change text text int;
        alter table student_copy change grade grade string;
        Hive 2.0 restricts type conversions:
        smaller type ---> larger type    allowed
        larger type ---> smaller type    error
        3) Replace columns (for reference)
        alter table tablename replace columns(name type);
        alter table stu_test replace columns(id int);
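
The change clause can also attach a comment and reposition the column in one step. The sketch below is an alternative to the rename shown above and assumes student_copy still has the content column added in step 1; the comment text is illustrative.

```sql
-- Rename content to text, add a comment, and place it right after name.
alter table student_copy change column content text string
comment 'free-form remark' after name;

-- Check the resulting column order.
desc student_copy;
```

These statements change the metadata only; the data files are not rewritten, so repositioning a column of a delimited text table that already holds data can leave the values misaligned with the new column order.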

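3) Modify partitions:

        1) Add a partition, based on the partition column
        Adding a partition by hand:
        alter table tablename add partition(name=value);
        alter table student_ptn add partition(grade=1303);
        Remaining values to add: 1304   1305   1306   1307
        alter table student_ptn add partition(grade=1304);
        Adding several partitions at once:
        alter table student_ptn add partition(grade=1305) partition(grade=1306) partition(grade=1307);
        2) Change a partition's storage location
        Default location of a partition: a partition directory created under the table's directory
        /user/hive/hivedata/bd1808.db/student_ptn/grade=1303
        A location can also be specified by hand for an individual partition, as follows.
        Specifying it when the partition is added:
            alter table student_ptn add partition(grade=1308) location '/user/student/1308';
        Changing the location of a partition that has already been added (for reference):
            This does not take effect immediately, only when data is next added.
            alter table tablename partition(name=value) set location 'hdfs_path';
            alter table student_ptn partition(grade=1303) set location '/user/student/1303';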
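
Partitions can likewise be listed, dropped, and re-synchronised with the directories on HDFS. A short sketch using the student_ptn table from the examples above; the grade=1309 directory mentioned in the comment is hypothetical.

```sql
-- List the partitions currently registered in the metastore.
show partitions student_ptn;

-- Drop one partition. student_ptn is an external table,
-- so the partition's files stay on HDFS.
alter table student_ptn drop if exists partition (grade=1307);

-- Register partition directories that were created directly on HDFS
-- under the table directory, for example .../student_ptn/grade=1309.
msck repair table student_ptn;
```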

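1.2.5 Emptying a table or partition:

truncate table tablename;  empties the table
truncate table tablename partition(name=value);  empties a single partition
Note: on most Hive versions truncate is accepted only for internal (managed) tables.

1.2.6 Dropping a table: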

drop table if exists tablename;

Source: https://www.cnblogs.com/pbd2020/p/12513787.html
