Sqoop Operations: Importing from Oracle into Hive

Posted: 2014-08-06 21:47:42

Importing all columns of a table

sqoop import --connect jdbc:oracle:thin:@192.168.1.107:1521:ORCL \
--username SCOTT --password tiger \
--table EMP \
--hive-import  --create-hive-table --hive-table emp  -m 1;

If you get an error like this:

ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory EMP already exists

First delete that directory from HDFS: hadoop fs -rmr /user/hadoop/EMP (on Hadoop 2 and later, -rmr is deprecated in favor of hadoop fs -rm -r).
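
The cleanup step above can be made safe to run before every import. The following is only a minimal sketch and not part of the original post: `clean_target_dir` and the `HADOOP_BIN` override are hypothetical names introduced here, and it assumes a Hadoop version that accepts `hadoop fs -rm -r`.

```shell
#!/bin/sh
# Hypothetical helper: delete a leftover Sqoop output directory only if it exists.
# HADOOP_BIN is an assumed override; it defaults to the real `hadoop` command.
clean_target_dir() {
    dir="$1"
    hadoop_bin="${HADOOP_BIN:-hadoop}"
    # `hadoop fs -test -d` exits 0 when the path is an existing directory.
    if "$hadoop_bin" fs -test -d "$dir"; then
        "$hadoop_bin" fs -rm -r "$dir"
    else
        echo "nothing to remove: $dir"
    fi
}
```

Calling `clean_target_dir /user/hadoop/EMP` before `sqoop import` avoids the FileAlreadyExistsException on re-runs. Recent Sqoop 1 releases also provide a `--delete-target-dir` option that serves the same purpose.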

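If you get an error like this:

FAILED: Error in metadata: AlreadyExistsException(message:Table emp already exists)

Explanation: the Hive table already exists. With --create-hive-table the job fails when the target table is already there, so drop the table in Hive first (for example, DROP TABLE emp;) and then re-run the import.

If you get an error like this:

hive.HiveImport: Exception in thread "main" java.lang.NoSuchMethodError: org.apache.thrift.EncodingUtils.setBit(BIZ)B

This happens when Hive and HBase are installed under the same path and their lib directories contain different versions of thrift.
HBase ships libthrift-0.x.0.jar and Hive ships libthrift-0.x.0.jar. Delete the 0.x.0 jar under HBase and replace it with the 0.x.0 version (the one Hive uses).
PS: I don't know why HBase is involved at all when Sqoop imports data into Hive; a likely reason is that Sqoop's startup script adds the HBase jars to its classpath when HBase is installed.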
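
To confirm the thrift mismatch described above before deleting anything, it helps to list the libthrift jars on both sides. This is a rough sketch: `list_thrift_jars` is a hypothetical helper name, and the usage below assumes that `HBASE_HOME` and `HIVE_HOME` are set in your environment.

```shell
#!/bin/sh
# Hypothetical helper: list every libthrift jar directly under the given
# lib directories so that mismatched versions are easy to spot.
list_thrift_jars() {
    for lib_dir in "$@"; do
        # Skip directories that do not exist on this machine.
        [ -d "$lib_dir" ] || continue
        find "$lib_dir" -maxdepth 1 -name 'libthrift-*.jar' | sort
    done
}
```

For example, `list_thrift_jars "$HBASE_HOME/lib" "$HIVE_HOME/lib"` prints one jar path per line; if the version numbers in the file names differ, you have found the conflict.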

Check the result:

desc emp;
empno   double
ename   string
job     string
mgr     double
hiredate        string
sal     double
comm    double
deptno  double


select * from emp;
7369.0  SMITH   CLERK   7902.0  1980-12-17 00:00:00.0   800.0   NULL    20.0
7499.0  ALLEN   SALESMAN  7698.0  1981-02-20 00:00:00.0   1600.0  300.0   30.0
7521.0  WARD    SALESMAN 7698.0  1981-02-22 00:00:00.0   1250.0  500.0   30.0
7566.0  JONES   MANAGER 7839.0  1981-04-02 00:00:00.0   2975.0  NULL    20.0
7654.0  MARTIN  SALESMAN  7698.0  1981-09-28 00:00:00.0   1250.0  1400.0  30.0
……

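Note: in practice we usually do not use --create-hive-table to create the table, because the column types it generates do not meet our needs (as shown above, the numeric Oracle columns all become double and the date column becomes string).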
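
If the column types are the only objection to the auto-created table, Sqoop's `--map-column-hive` option can override the Hive type of individual columns. The sketch below is an illustration under assumptions, not the author's method: the chosen types, the `emp_typed` table name, the `import_emp_typed` function and the `SQOOP_BIN` override are all hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: import EMP and let Sqoop create the Hive table,
# but force INT for the key-like columns instead of the default double.
# SQOOP_BIN is an assumed override (set it to `echo` to preview the command).
import_emp_typed() {
    "${SQOOP_BIN:-sqoop}" import \
        --connect jdbc:oracle:thin:@192.168.1.107:1521:ORCL \
        --username SCOTT --password tiger \
        --table EMP \
        --map-column-hive EMPNO=INT,MGR=INT,DEPTNO=INT \
        --hive-import --create-hive-table --hive-table emp_typed \
        -m 1
}
```

Running `SQOOP_BIN=echo import_emp_typed` prints the assembled command without contacting Oracle or Hive. Creating the table by hand, as in the next section, remains the option that gives full control over the schema.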

Importing selected columns of a table

Create the Hive table by hand:

create table emp_column(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
)
row format delimited fields terminated by '\t' lines terminated by '\n'
stored as textfile;

sqoop import --connect jdbc:oracle:thin:@192.168.1.107:1521:ORCL \
--username SCOTT --password tiger \
--table EMP --columns "EMPNO,ENAME,JOB,SAL,COMM" \
--fields-terminated-by '\t' --lines-terminated-by '\n' \
--hive-drop-import-delims --hive-import  --hive-table emp_column \
-m 3;

Note: Hive reads delimited text files by position. The five imported fields (EMPNO, ENAME, JOB, SAL, COMM) therefore fill the first five columns of emp_column (empno, ename, job, mgr, hiredate), and the remaining columns are NULL. For sal and comm to land in the right place, the Hive table should declare the same columns, in the same order, as --columns.

Note: every time this import is re-run, the same rows are loaded into the Hive table again, so the data is duplicated.

sqoop import --connect jdbc:oracle:thin:@192.168.1.107:1521:ORCL \
--username SCOTT --password tiger \
--table EMP --columns "EMPNO,ENAME,JOB,SAL,COMM" \
--fields-terminated-by '\t' --lines-terminated-by '\n' \
--hive-drop-import-delims --hive-overwrite --hive-import --hive-table emp_column \
-m 3;

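Note: --hive-overwrite replaces the data that already exists in the table. In about 99% of cases you want overwrite, so that re-running a job does not produce duplicate data.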

Importing selected columns into a partitioned Hive table

Create the partitioned Hive table:

create table emp_partition(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
)
partitioned by (pt string)
row format delimited fields terminated by '\t' lines terminated by '\n'
stored as textfile;

Import into pt='2013-08-01':

sqoop import --connect jdbc:oracle:thin:@192.168.1.107:1521:ORCL \
--username SCOTT --password tiger \
--table EMP --columns "EMPNO,ENAME,JOB,SAL,COMM" \
--hive-overwrite --hive-import  --hive-table emp_partition \
--fields-terminated-by '\t' --lines-terminated-by '\n' \
--hive-drop-import-delims --hive-partition-key 'pt' --hive-partition-value '2013-08-01' \
-m 3;

Import into pt='2013-08-02':

sqoop import --connect jdbc:oracle:thin:@192.168.1.107:1521:ORCL \
--username SCOTT --password tiger \
--table EMP --columns "EMPNO,ENAME,JOB,SAL,COMM" \
--hive-overwrite --hive-import  --hive-table emp_partition \
--fields-terminated-by '\t' --lines-terminated-by '\n' \
--hive-drop-import-delims --hive-partition-key 'pt' --hive-partition-value '2013-08-02' \
-m 3;
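
The two imports above differ only in the partition value, so they can be driven from a loop. This is a sketch under assumptions rather than part of the original post: `import_emp_partitions` and the `SQOOP_BIN` override are hypothetical names, and the Sqoop options are copied unchanged from the commands above.

```shell
#!/bin/sh
# Hypothetical helper: run the partitioned import once per partition value.
# SQOOP_BIN is an assumed override; point it at a stub for a dry run.
import_emp_partitions() {
    for pt in "$@"; do
        # Stop at the first failed import so later partitions are not loaded blindly.
        "${SQOOP_BIN:-sqoop}" import \
            --connect jdbc:oracle:thin:@192.168.1.107:1521:ORCL \
            --username SCOTT --password tiger \
            --table EMP --columns "EMPNO,ENAME,JOB,SAL,COMM" \
            --hive-overwrite --hive-import --hive-table emp_partition \
            --fields-terminated-by '\t' --lines-terminated-by '\n' \
            --hive-drop-import-delims \
            --hive-partition-key pt --hive-partition-value "$pt" \
            -m 3 || return 1
    done
}
```

For example, `import_emp_partitions 2013-08-01 2013-08-02` runs the same two imports as above, one per partition value.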

Query:

select * from emp_partition where pt='2013-08-01';
select * from emp_partition where pt='2013-08-02';

Original: http://www.cnblogs.com/luogankun/p/3895290.html
