Summary: Importing Oracle Data into Hive with Sqoop

Posted by hyj, 2013-12-25

1. Environment preparation
After the demo environment is ready, start Hadoop. If jps shows the following processes, the startup was successful:
[hadoop@domain bin]$ jps
2658 NameNode
3064 TaskTracker
5082 Jps
2760 DataNode
2866 SecondaryNameNode
2939 JobTracker
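Before importing, it is worth confirming that Sqoop can reach the Oracle schema at all (a minimal sketch, reusing the JDBC URL and scott/tiger credentials from the import below):

sqoop list-tables -connect jdbc:oracle:thin:@192.168.1.2:1521:YONYOU -username SCOTT -password tiger

If EMP, DEPT, and the other scott tables come back, connectivity and credentials are in order.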
With the Oracle (scott/tiger) schema visible to Sqoop, use the sqoop import command to load a table owned by the scott user into Hive. Note in particular that SCOTT must be written in uppercase in the command; otherwise the import fails with: ERROR tool.ImportTool: Imported Failed: Attempted to generate class with no columns! The command and its output look like this:
[hadoop@domain bin]$ sqoop import -connect jdbc:oracle:thin:@192.168.1.2:1521:YONYOU -username SCOTT -password tiger --table EMP --hive-import -m 1
13/03/30 00:40:23 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/03/30 00:40:23 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
13/03/30 00:40:23 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
13/03/30 00:40:23 INFO manager.SqlManager: Using default fetchSize of 1000
13/03/30 00:40:24 INFO tool.CodeGenTool: Beginning code generation
13/03/30 00:40:25 INFO manager.OracleManager: Time zone has been set to GMT
13/03/30 00:40:25 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM EMP t WHERE 1=0
13/03/30 00:40:25 INFO orm.CompilationManager: HADOOP_HOME is /home/hadoop/hadoop-0.20.2
13/03/30 00:40:28 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/57b0c91f97a25b025cfcf0729f316f39/EMP.jar
13/03/30 00:40:28 INFO manager.OracleManager: Time zone has been set to GMT
13/03/30 00:40:28 INFO manager.OracleManager: Time zone has been set to GMT
13/03/30 00:40:28 INFO mapreduce.ImportJobBase: Beginning import of EMP
13/03/30 00:40:28 INFO manager.OracleManager: Time zone has been set to GMT
13/03/30 00:40:32 INFO mapred.JobClient: Running job: job_201303292217_0004
13/03/30 00:40:33 INFO mapred.JobClient: map 0% reduce 0%
13/03/30 00:40:59 INFO mapred.JobClient: map 100% reduce 0%
13/03/30 00:41:02 INFO mapred.JobClient: Job complete: job_201303292217_0004
13/03/30 00:41:02 INFO mapred.JobClient: Counters: 16
13/03/30 00:41:02 INFO mapred.JobClient: Job Counters
13/03/30 00:41:02 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=27161
13/03/30 00:41:02 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/03/30 00:41:02 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/03/30 00:41:02 INFO mapred.JobClient: Launched map tasks=1
13/03/30 00:41:02 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/03/30 00:41:02 INFO mapred.JobClient: FileSystemCounters
13/03/30 00:41:02 INFO mapred.JobClient: HDFS_BYTES_READ=87
13/03/30 00:41:02 INFO mapred.JobClient: FILE_BYTES_WRITTEN=65258
13/03/30 00:41:02 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=701
13/03/30 00:41:02 INFO mapred.JobClient: Map-Reduce Framework
13/03/30 00:41:02 INFO mapred.JobClient: Map input records=12
13/03/30 00:41:02 INFO mapred.JobClient: Physical memory (bytes) snapshot=80519168
13/03/30 00:41:02 INFO mapred.JobClient: Spilled Records=0
13/03/30 00:41:02 INFO mapred.JobClient: CPU time spent (ms)=2070
13/03/30 00:41:02 INFO mapred.JobClient: Total committed heap usage (bytes)=15794176
13/03/30 00:41:02 INFO mapred.JobClient: Virtual memory (bytes) snapshot=712142848
13/03/30 00:41:02 INFO mapred.JobClient: Map output records=12
13/03/30 00:41:02 INFO mapred.JobClient: SPLIT_RAW_BYTES=87
13/03/30 00:41:02 INFO mapreduce.ImportJobBase: Transferred 701 bytes in 33.7146 seconds (20.7922 bytes/sec)
13/03/30 00:41:02 INFO mapreduce.ImportJobBase: Retrieved 12 records.
13/03/30 00:41:02 INFO manager.OracleManager: Time zone has been set to GMT
13/03/30 00:41:02 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM EMP t WHERE 1=0
13/03/30 00:41:02 WARN hive.TableDefWriter: Column EMPNO had to be cast to a less precise type in Hive
13/03/30 00:41:02 WARN hive.TableDefWriter: Column MGR had to be cast to a less precise type in Hive
13/03/30 00:41:02 WARN hive.TableDefWriter: Column HIREDATE had to be cast to a less precise type in Hive
13/03/30 00:41:02 WARN hive.TableDefWriter: Column SAL had to be cast to a less precise type in Hive
13/03/30 00:41:02 WARN hive.TableDefWriter: Column COMM had to be cast to a less precise type in Hive
13/03/30 00:41:02 WARN hive.TableDefWriter: Column DEPTNO had to be cast to a less precise type in Hive
13/03/30 00:41:02 INFO hive.HiveImport: Removing temporary files from import process: hdfs://localhost:9000/user/hadoop/EMP/_logs
13/03/30 00:41:02 INFO hive.HiveImport: Loading uploaded data into Hive
13/03/30 00:41:08 INFO hive.HiveImport: Hive history file=/tmp/hadoop/hive_job_log_hadoop_201303300041_854717120.txt
13/03/30 00:41:19 INFO hive.HiveImport: OK
13/03/30 00:41:19 INFO hive.HiveImport: Time taken: 10.866 seconds
13/03/30 00:41:19 INFO hive.HiveImport: Loading data to table default.emp
13/03/30 00:41:20 INFO hive.HiveImport: OK
13/03/30 00:41:20 INFO hive.HiveImport: Time taken: 0.507 seconds
13/03/30 00:41:20 INFO hive.HiveImport: Hive import complete.
13/03/30 00:41:20 INFO hive.HiveImport: Export directory is empty, removing it.
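Before querying Hive, you can peek at the raw files the import produced (a sketch assuming the default Hive warehouse location /user/hive/warehouse and the single output file that a one-mapper job writes; fields are separated by Hive's default non-printing \001 delimiter, per the "Using Hive-specific delimiters" lines above):

hadoop fs -cat /user/hive/warehouse/emp/part-m-00000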
Then check the corresponding data from the Hive side:
[hadoop@domain bin]$ hive
Hive history file=/tmp/hadoop/hive_job_log_hadoop_201303300045_1940247672.txt
hive> show tables;
OK
emp
hive_test
invites
pokes
Time taken: 6.632 seconds
hive> select * from emp;
OK
7369.0 SMITH CLERK 7902.0 1980-12-17 00:00:00.0 800.0 NULL 20.0
7499.0 ALLEN SALESMAN 7698.0 1981-02-20 00:00:00.0 1600.0 300.0 30.0
7521.0 WARD SALESMAN 7698.0 1981-02-22 00:00:00.0 1250.0 500.0 30.0
7566.0 JONES MANAGER 7839.0 1981-04-02 00:00:00.0 2975.0 NULL 20.0
7654.0 MARTIN SALESMAN 7698.0 1981-09-28 00:00:00.0 1250.0 1400.0 30.0
7698.0 BLAKE MANAGER 7839.0 1981-05-01 00:00:00.0 2850.0 NULL 30.0
7782.0 CLARK MANAGER 7839.0 1981-06-09 00:00:00.0 2450.0 NULL 10.0
7839.0 KING PRESIDENT NULL 1981-11-17 00:00:00.0 5000.0 NULL 10.0
7844.0 TURNER SALESMAN 7698.0 1981-09-08 00:00:00.0 1500.0 0.0 30.0
7900.0 JAMES CLERK 7698.0 1981-12-03 00:00:00.0 950.0 NULL 30.0
7902.0 FORD ANALYST 7566.0 1981-12-03 00:00:00.0 3000.0 NULL 20.0
7934.0 MILLER CLERK 7782.0 1982-01-23 00:00:00.0 1300.0 NULL 10.0
Time taken: 0.777 seconds
hive>
This matches the original data in Oracle (blank fields in Oracle appear as NULL in Hive). The numeric columns show up as doubles in Hive (7369.0 rather than 7369) because of the TableDefWriter warnings above: those Oracle NUMBER columns were cast to a less precise Hive type.
EMPNO  ENAME   JOB        MGR   HIREDATE          SAL      COMM     DEPTNO
7369   SMITH   CLERK      7902  1980-12-17 (Wed)  800.00            20
7499   ALLEN   SALESMAN   7698  1981-02-20 (Fri)  1600.00  300.00   30
7521   WARD    SALESMAN   7698  1981-02-22 (Sun)  1250.00  500.00   30
7566   JONES   MANAGER    7839  1981-04-02 (Thu)  2975.00           20
7654   MARTIN  SALESMAN   7698  1981-09-28 (Mon)  1250.00  1400.00  30
7698   BLAKE   MANAGER    7839  1981-05-01 (Fri)  2850.00           30
7782   CLARK   MANAGER    7839  1981-06-09 (Tue)  2450.00           10
7839   KING    PRESIDENT        1981-11-17 (Tue)  5000.00           10
7844   TURNER  SALESMAN   7698  1981-09-08 (Tue)  1500.00  0.00     30
7900   JAMES   CLERK      7698  1981-12-03 (Thu)  950.00            30
7902   FORD    ANALYST    7566  1981-12-03 (Thu)  3000.00           20
7934   MILLER  CLERK      7782  1982-01-23 (Sat)  1300.00           10


At this point, the Sqoop import from Oracle into Hive has succeeded.
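If you would rather keep EMPNO, MGR, and DEPTNO as integers in Hive instead of the doubles seen above, one option is Sqoop's --map-column-hive override (a sketch only; availability depends on your Sqoop version, and it was not part of the run shown above):

sqoop import -connect jdbc:oracle:thin:@192.168.1.2:1521:YONYOU -username SCOTT -password tiger --table EMP --hive-import -m 1 --map-column-hive EMPNO=INT,MGR=INT,DEPTNO=INT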
2. In case the above is still unclear to some readers, here is the command made more explicit. (Note this one goes the other direction: sqoop export pushes data from HDFS back into Oracle.)
Shell code:
sqoop export --table tableName -connect jdbc:oracle:thin:@xx.xx.xx.xx:port:xxx --username userNameA --password pwd --export-dir sourceFile --input-fields-terminated-by '\001' --input-lines-terminated-by '\n'
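As a concrete (hypothetical) instance, this would export the emp data imported above into an Oracle table named EMP_COPY, which must already exist in Oracle:

sqoop export --table EMP_COPY -connect jdbc:oracle:thin:@192.168.1.2:1521:YONYOU --username SCOTT --password tiger --export-dir /user/hive/warehouse/emp --input-fields-terminated-by '\001' --input-lines-terminated-by '\n'

The '\001' matters: --hive-import wrote the files with Hive's default field delimiter, so the export has to read them back the same way.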
Summary of common errors:
Exporting data to Oracle with sqoop export fails with the following exception:
java.lang.IllegalArgumentException: Attempted to generate class with no columns!
    at org.apache.sqoop.orm.ClassWriter.generateClassForColumns(ClassWriter.java:1295)
    at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1176)
    at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
    at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:64)
    at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
*****************
2013-08-29 12:18:17,217 INFO org.apache.sqoop.manager.SqlManager: Executing SQL statement: SELECT t.* FROM TMP t WHERE 1=0
2013-08-29 12:18:17,235 ERROR org.apache.sqoop.Sqoop: Got exception running Sqoop: java.lang.IllegalArgumentException: Attempted to generate class with no columns!
Reading the Sqoop source turns up the following snippet. The exception is thrown when no columns can be found for the target table:

Java code:
private StringBuilder generateClassForColumns(
    Map<String, Integer> columnTypes,
    String [] colNames, String [] dbWriteColNames) {
  if (colNames.length == 0) {
    throw new IllegalArgumentException("Attempted to generate class with "
        + "no columns!");
  }
Tracing further into why the column list comes back empty, the columns are fetched by the following code:

Java code:
String [] colNames = options.getColumns();
if (null == colNames) {
  if (null != tableName) {
    // Table-based import. Read column names from table.
    colNames = connManager.getColumnNames(tableName);
  } else if (options.getCall() != null) {
    // Read procedure arguments from metadata
    colNames = connManager.getColumnNamesForProcedure(
        this.options.getCall());
  } else {
    // Infer/assign column names for arbitrary query.
    colNames = connManager.getColumnNamesForQuery(
        this.options.getSqlQuery());
  }
}
So Sqoop first looks for an explicit --columns value and falls back to querying the table only when none is given.

The table lookup queries by the table's owner; see the getColumnNames() method in org.apache.sqoop.manager.OracleManager:

Java code:
conn = getConnection();
pStmt = conn.prepareStatement(QUERY_COLUMNS_FOR_TABLE,
    ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
pStmt.setString(1, tableOwner);
pStmt.setString(2, shortTableName);
rset = pStmt.executeQuery();
The query it runs is:
Java code:
public static final String QUERY_COLUMNS_FOR_TABLE =
    "SELECT COLUMN_NAME FROM ALL_TAB_COLUMNS WHERE "
    + "OWNER = ? AND TABLE_NAME = ? ORDER BY COLUMN_ID";
My Sqoop command did not specify --columns, so Sqoop naturally went off to look up the table's columns. But my table was actually a synonym, and its owning schema was not userNameA, so the owner-qualified ALL_TAB_COLUMNS query returned no rows.
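You can check for this situation from the shell with sqoop eval (a diagnostic sketch; TMP is the table name from the error log above, and ALL_SYNONYMS is Oracle's standard data-dictionary view for synonyms):

sqoop eval -connect jdbc:oracle:thin:@xx.xx.xx.xx:port:xxx --username userNameA --password pwd --query "SELECT TABLE_OWNER, TABLE_NAME FROM ALL_SYNONYMS WHERE SYNONYM_NAME = 'TMP'"

If TABLE_OWNER is not the login user, the owner-qualified lookup above will find no columns.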

The fix is now obvious, and there are two options (see the sketch after this list):
1. Connect as the user that actually owns the table (or otherwise target the owning schema).
2. Add the --columns option and specify the column list explicitly.
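For instance, option 2 applied to the failing export above (COL1, COL2, COL3 stand in for TMP's real column names):

sqoop export --table TMP -connect jdbc:oracle:thin:@xx.xx.xx.xx:port:xxx --username userNameA --password pwd --columns "COL1,COL2,COL3" --export-dir sourceFile --input-fields-terminated-by '\001' --input-lines-terminated-by '\n'

With --columns set, the code path shown earlier takes options.getColumns() and never queries ALL_TAB_COLUMNS, so the owner mismatch no longer matters.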





