Purpose:
Use Sqoop to import data from Oracle into HBase and have the composite row key generated automatically.
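For orientation: the Sqoop user guide describes a composite HBase row key as the values of the listed columns joined with an underscore. The snippet below is only a shell sketch of that concatenation, not Sqoop's own code. The function name composite_row_key and the sample values are made up for illustration, and the real string form of each value depends on how Sqoop renders the Oracle column type.

```shell
# Join the given values with "_", mirroring how the Sqoop user guide
# describes composite row keys. Hypothetical helper, for illustration only.
composite_row_key() {
  key=$1
  shift
  for value in "$@"; do
    key="${key}_${value}"
  done
  printf '%s\n' "$key"
}

# Sample values standing in for CREATE_TIME, PUBLISH_TIME and TITLE.
composite_row_key "2014-06-25" "2014-06-26" "Sample title"
```

Because the key is a plain concatenation, the order of the columns in --hbase-row-key determines how rows sort in HBase.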
Environment:
Hadoop 2.2.0
HBase 0.96
sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz
Oracle 11g
JDK 1.7
Ubuntu 14 Server
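One gripe about the environment:
The latest Sqoop release, 1.99.3, is too limited: it can only import data into HDFS and offers no other options at all, which is pretty crude. (If you disagree, feel free to discuss and share a solution.)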
Command:
sqoop import \
  --connect jdbc:oracle:thin:@192.168.0.147:1521:ORCLGBK \
  --username ZHAOBIAO -P \
  --table CMS_NEWS_0625 \
  --hbase-create-table --hbase-table 147patents \
  --column-family patentinfo \
  --split-by CREATE_TIME \
  --hbase-row-key "CREATE_TIME,PUBLISH_TIME,TITLE"
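Points to note:
1. The Oracle table name must be uppercase (--table CMS_NEWS_0625).
2. The username must be uppercase (--username ZHAOBIAO).
3. All field names in the composite row key parameter must be uppercase (--hbase-row-key "CREATE_TIME,PUBLISH_TIME,TITLE").
4. None of the fields that make up the composite row key may contain NULL values, otherwise the import fails. Verify this before running the command.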
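Point 4 can be checked before the import is started. A minimal sketch, assuming a POSIX shell: the helper null_check_sql (a hypothetical name) builds a COUNT query over the row-key columns, which can then be passed to Sqoop's eval tool. A non-zero count means some rows would trigger the error.

```shell
# Build "SELECT COUNT(*) FROM T WHERE A IS NULL OR B IS NULL"
# from a table name and a comma-separated column list.
null_check_sql() {
  table=$1
  cols=$2
  # Turn "A,B,C" into "A IS NULL OR B IS NULL OR C"; the final " IS NULL" is added below.
  cond=$(printf '%s' "$cols" | sed 's/,/ IS NULL OR /g')
  printf 'SELECT COUNT(*) FROM %s WHERE %s IS NULL\n' "$table" "$cond"
}

# Print the query for the table and row-key columns used in this post.
null_check_sql CMS_NEWS_0625 "CREATE_TIME,PUBLISH_TIME,TITLE"

# To run it against Oracle with the same connection settings as the import:
# sqoop eval --connect jdbc:oracle:thin:@192.168.0.147:1521:ORCLGBK \
#   --username ZHAOBIAO -P \
#   --query "$(null_check_sql CMS_NEWS_0625 "CREATE_TIME,PUBLISH_TIME,TITLE")"
```

Note that Oracle treats an empty VARCHAR2 string as NULL, so rows with an empty TITLE are counted as well.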
Troubleshooting
The following error came up during the import:
Error: java.io.IOException: Could not insert row with null value for row-key column: OPERATE_TIME
at org.apache.sqoop.hbase.ToStringPutTransformer.getPutCommand(ToStringPutTransformer.java:125)
at org.apache.sqoop.hbase.HBasePutProcessor.accept(HBasePutProcessor.java:142)
at org.apache.sqoop.mapreduce.DelegatingOutputFormat$DelegatingRecordWriter.write(DelegatingOutputFormat.java:128)
at org.apache.sqoop.mapreduce.DelegatingOutputFormat$DelegatingRecordWriter.write(DelegatingOutputFormat.java:92)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:634)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at org.apache.sqoop.mapreduce.HBaseImportMapper.map(HBaseImportMapper.java:38)
at org.apache.sqoop.mapreduce.HBaseImportMapper.map(HBaseImportMapper.java:31)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
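Cause 1:
--hbase-row-key "create_time,publish_time,operate_time,title"
The field names were written in lowercase and need to be changed to uppercase (the corrected key also leaves out OPERATE_TIME):
--hbase-row-key "CREATE_TIME,PUBLISH_TIME,TITLE"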
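If the column list comes from a script or a config file, it can be normalized to uppercase before it is handed to Sqoop. A small sketch, where upper_cols is a hypothetical helper name:

```shell
# Uppercase a comma-separated column list for use with --hbase-row-key.
upper_cols() {
  printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'
}

# Prints CREATE_TIME,PUBLISH_TIME,TITLE
upper_cols "create_time,publish_time,title"
```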
Cause 2:
The field named in the error, OPERATE_TIME, really does contain NULL values in the source table.
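One way around cause 2 while keeping the column in the row key is to leave the offending rows out of the import with Sqoop's --where option. This is a sketch of an alternative, not what the import in this post did. The helper not_null_where is a hypothetical name, and skipping rows is only acceptable if those records are not needed in HBase.

```shell
# Build "A IS NOT NULL AND B IS NOT NULL" from a comma-separated column list.
not_null_where() {
  # Turn "A,B" into "A IS NOT NULL AND B"; the final " IS NOT NULL" is added below.
  cond=$(printf '%s' "$1" | sed 's/,/ IS NOT NULL AND /g')
  printf '%s IS NOT NULL\n' "$cond"
}

# Print the filter for a row key that still includes OPERATE_TIME.
not_null_where "CREATE_TIME,PUBLISH_TIME,OPERATE_TIME,TITLE"

# Hypothetical use, with the connection options from the import command in this post:
# sqoop import --connect jdbc:oracle:thin:@192.168.0.147:1521:ORCLGBK \
#   --username ZHAOBIAO -P --table CMS_NEWS_0625 \
#   --where "$(not_null_where "CREATE_TIME,PUBLISH_TIME,OPERATE_TIME,TITLE")" \
#   --hbase-create-table --hbase-table 147patents --column-family patentinfo \
#   --split-by CREATE_TIME \
#   --hbase-row-key "CREATE_TIME,PUBLISH_TIME,OPERATE_TIME,TITLE"
```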
【甘道夫】 Using Sqoop-1.4.4 on Hadoop 2.2.0 to import Oracle 11g data into HBase 0.96 with an automatically generated composite row key
Original article: http://blog.csdn.net/u010967382/article/details/36397353