My recent work involved migrating data from Hive tables to DRDS, and I ran a few simple tests recently. The table creation statements are as follows:

Hive single (non-partitioned) table:

CREATE TABLE item_zc_test1(
  num_id string,
  dp_id string,
  approve_status string,
  title string,
  price string,
  nick string,
  cid string,
  pic_url string,
  props string,
  list_time string,
  modified string,
  delist_time string,
  date_id string,
  day_id string
)
row format delimited fields terminated by '\001'
stored as textfile;

Hive partitioned table (two partition levels):

CREATE TABLE item_zc_test_partition(
  num_id string,
  dp_id string,
  approve_status string,
  title string,
  price string,
  nick string,
  cid string,
  pic_url string,
  props string,
  list_time string,
  modified string,
  delist_time string
)
partitioned by (date_id string, day_id string)
row format delimited fields terminated by '\001'
stored as textfile;

DRDS table creation statements:

1) Single table:

create table item_zc_test1 (
num_id varchar(100),
dp_id varchar(100),
approve_status varchar(100),
title varchar(100),
price varchar(100),
nick varchar(100),
cid varchar(100),
pic_url varchar(200),
props varchar(400),
list_time varchar(100),
modified varchar(100),
delist_time varchar(100),
date_id varchar(50),
day_id varchar(50),
primary key(num_id)
);

2) Database sharding only (no table sharding):

create table item_zc_test (
num_id varchar(100),
dp_id varchar(100),
approve_status varchar(100),
title varchar(100),
price varchar(100),
nick varchar(100),
cid varchar(100),
pic_url varchar(200),
props varchar(400),
list_time varchar(100),
modified varchar(100),
delist_time varchar(100),
date_id varchar(50),
day_id varchar(50),
primary key(num_id)
) dbpartition by hash(num_id);

3) Database and table sharding:

create table item_zc_test2 (
num_id varchar(100),
dp_id varchar(100),
approve_status varchar(100),
title varchar(100),
price varchar(100),
nick varchar(100),
cid varchar(100),
pic_url varchar(200),
props varchar(400),
list_time varchar(100),
modified varchar(100),
delist_time varchar(100),
date_id varchar(50),
day_id varchar(50),
primary key(num_id)
) dbpartition by hash(num_id) tbpartition by hash(dp_id) tbpartitions 3;

1、Hive single table ---> DRDS single table

Sqoop script:

sqoop export \
 --connect jdbc:mysql://xxxx/xx \
 --username xxxx \
 --password xx \
 --table item_zc_test1 \
 --export-dir /user/hive/warehouse/item_zc_test1 \
 --input-fields-terminated-by '\001' \
 --num-mappers 1

Result: Sqoop loads the data from the HDFS files into DRDS without any problems.

2、Hive single table ---> DRDS with database sharding only

Sqoop script:

sqoop export \
 --connect jdbc:mysql://xxxx/xx \
 --username xxxx \
 --password xx \
 --table item_zc_test \
 --export-dir /user/hive/warehouse/item_zc_test1 \
 --input-fields-terminated-by '\001' \
 --num-mappers 1

Result: Sqoop loads the data from the HDFS files into DRDS without any problems.

3、Hive single table ---> DRDS with database and table sharding

Note: when the DRDS table is sharded across both databases and tables, running the scripts above as-is fails with an error. Modify the script as follows (listing the columns explicitly):

sqoop export \
 --connect jdbc:mysql://xxxx/xx \
 --username xxxx \
 --password xx \
 --table item_zc_test2 \
 --columns "num_id,dp_id,approve_status,title,price,nick,cid,pic_url,props,list_time,modified,delist_time,date_id,day_id" \
 --export-dir /user/hive/warehouse/item_zc_test1 \
 --input-fields-terminated-by '\001' \
 --num-mappers 5

4、Exporting data from a Hive partitioned table to DRDS

According to the official Sqoop parameter documentation, exporting Hive partitioned data only supports non-partitioned tables and single-level partitioned tables (script below). Here --hive-partition-key specifies the first-level partition column and --hive-partition-value specifies the partition value:

sqoop export \
 --connect jdbc:mysql://xxxx/xx \
 --username xxxx \
 --password xx \
 --table table \
 --hive-partition-key pk \
 --hive-partition-value 'pv' \
 --export-dir /user/hive/warehouse/table/pk=pv/ \
 --input-fields-terminated-by '\001' \
 --input-lines-terminated-by '\n'

For a Hive table with multi-level partitions, the recommended Sqoop export approach is to stage the data through an intermediate temporary table, as sketched below.

Reference: https://stackoverflow.com/questions/33100799/sqoop-export-the-partitioned-hive-table
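As a concrete illustration of the intermediate-table approach, here is a minimal sketch, assuming the two-level partitioned table item_zc_test_partition should land in the sharded DRDS table item_zc_test2. The staging table name item_zc_test_export and its warehouse path are hypothetical; adjust them to your own cluster layout.

# Hypothetical sketch: flatten the two-level partitioned table into a plain
# staging table, then export that staging table with Sqoop as in section 3.
hive -e "
CREATE TABLE IF NOT EXISTS item_zc_test_export (
  num_id string, dp_id string, approve_status string, title string,
  price string, nick string, cid string, pic_url string, props string,
  list_time string, modified string, delist_time string,
  date_id string, day_id string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' STORED AS TEXTFILE;

-- Flatten all partitions: date_id/day_id become ordinary columns here
INSERT OVERWRITE TABLE item_zc_test_export
SELECT num_id, dp_id, approve_status, title, price, nick, cid, pic_url, props,
       list_time, modified, delist_time, date_id, day_id
FROM item_zc_test_partition;
"

# Export the flattened staging directory to DRDS (same pattern as section 3)
sqoop export \
 --connect jdbc:mysql://xxxx/xx \
 --username xxxx \
 --password xx \
 --table item_zc_test2 \
 --columns "num_id,dp_id,approve_status,title,price,nick,cid,pic_url,props,list_time,modified,delist_time,date_id,day_id" \
 --export-dir /user/hive/warehouse/item_zc_test_export \
 --input-fields-terminated-by '\001' \
 --num-mappers 5

Once the export finishes, the staging table can be dropped. If only certain partitions need to be exported, a WHERE clause on date_id/day_id can be added to the INSERT statement.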