About云-梭伦科技

pig2's personal space: https://www.aboutyun.com/?61


How to create an index on a complex data type in Hive

1151 views, posted 2014-7-15 23:47

Guiding questions:
1. How do you create an index on a complex data type?
2. Can Hive index an individual field inside one?






This post extends the following one:
Using Hive complex data types: array, map, and struct


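We know Hive can create indexes. But what about a complex data type that bundles several fields together, as in the following example?
In the student table, the info column is a struct containing name and age: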

hive> create table student_test(id INT, info struct<name:STRING, age:INT>)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    > COLLECTION ITEMS TERMINATED BY ':';
OK
Time taken: 0.446 seconds
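As a rough illustration of how the delimiters in this DDL map one line of the underlying text file onto the id column and the info struct, here is a minimal Python sketch. It assumes Hive's default delimited text format, in which struct members are separated by the COLLECTION ITEMS delimiter. The sample line `4,li:80` is hypothetical, inferred from the query result shown later, since the post does not show the data file; the function name is also hypothetical.

```python
from typing import Any, Dict, Tuple

FIELD_DELIM = ","       # FIELDS TERMINATED BY ','
COLLECTION_DELIM = ":"  # COLLECTION ITEMS TERMINATED BY ':' (also separates struct members)

# Struct layout from the DDL: info struct<name:STRING, age:INT>
INFO_SCHEMA = (("name", str), ("age", int))


def parse_student_row(line: str) -> Tuple[int, Dict[str, Any]]:
    """Hypothetical helper: split one text line into (id, info) per the table definition."""
    id_text, info_text = line.rstrip("\n").split(FIELD_DELIM, 1)
    members = info_text.split(COLLECTION_DELIM)
    # Pair each struct member with its declared name and convert it to the declared type.
    info = {name: cast(value) for (name, cast), value in zip(INFO_SCHEMA, members)}
    return int(id_text), info


if __name__ == "__main__":
    # Hypothetical data line; the post does not show the actual file contents.
    print(parse_student_row("4,li:80"))  # (4, {'name': 'li', 'age': 80})
```

The result matches the row `4 {"name":"li","age":80}` that the queries below return.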

Suppose we run the following query:

select * from student_test where info.age=80 ;

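To speed this up, we would like to create an index on age. Is that possible?
No, it is not.
So what can we do?
Hive does support indexes on complex-type columns, but the index is built on the column as a whole (here, all of info), not on an individual field inside it.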
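To see why an index on the whole info column does not turn into a direct lookup on info.age, here is a conceptual Python model of a compact index. It is only a sketch: to our understanding, CompactIndexHandler records, for each distinct value of the indexed column, the data file (`_bucketname`) and the offsets (`_offsets`) where that value occurs. The dictionary layout, the function names, and the file names and offsets in the usage example are hypothetical and are not Hive's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A struct value used as an index key: (name, age), mirroring info struct<name, age>.
Info = Tuple[str, int]
# Hypothetical compact index: key -> {file name -> list of offsets}.
CompactIndex = Dict[Info, Dict[str, List[int]]]


def build_compact_index(rows: List[Tuple[str, int, Info]]) -> CompactIndex:
    """rows are (file_name, offset, info) triples for every row of the base table."""
    index: CompactIndex = defaultdict(lambda: defaultdict(list))
    for file_name, offset, info in rows:
        # The key is the WHOLE struct value, not one of its fields.
        index[info][file_name].append(offset)
    return index


def lookup_whole_struct(index: CompactIndex, key: Info) -> Dict[str, List[int]]:
    """Predicate on the whole indexed value: a single direct key lookup."""
    return dict(index.get(key, {}))


def lookup_by_age(index: CompactIndex, age: int) -> Dict[str, List[int]]:
    """Predicate on one field (info.age = age): since keys are whole structs,
    every index entry has to be inspected."""
    hits: Dict[str, List[int]] = defaultdict(list)
    for (_name, key_age), locations in index.items():
        if key_age == age:
            for file_name, offsets in locations.items():
                hits[file_name].extend(offsets)
    return dict(hits)
```

Usage, with made-up files and offsets: after `idx = build_compact_index([("f0", 0, ("zhang", 18)), ("f0", 12, ("li", 80)), ("f1", 0, ("wang", 80))])`, `lookup_whole_struct(idx, ("li", 80))` finds `{"f0": [12]}` with one key access, while `lookup_by_age(idx, 80)` must walk all three keys to return `{"f0": [12], "f1": [0]}`.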

Below is the query as run before creating the index.

1. Before creating the index
The query took 91.105 seconds:

hive> select * from student_test where info.age=80 ;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1405418310771_0001, Tracking URL = http://master:8088/proxy/application_1405418310771_0001/
Kill Command = /usr/hadoop/bin/hadoop job  -kill job_1405418310771_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-07-15 23:03:29,706 Stage-1 map = 0%,  reduce = 0%
2014-07-15 23:04:07,900 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.97 sec
2014-07-15 23:04:08,977 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.97 sec
MapReduce Total cumulative CPU time: 1 seconds 970 msec
Ended Job = job_1405418310771_0001
MapReduce Jobs Launched: 
Job 0: Map: 1   Cumulative CPU: 1.97 sec   HDFS Read: 275 HDFS Write: 8 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 970 msec
OK
4        {"name":"li","age":80}
Time taken: 91.105 seconds, Fetched: 1 row(s)

2. Creating the index

hive> create index student_test_index on table student_test(info)
    > as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
    > with deferred rebuild
    > IN TABLE student_test_index_table;

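Because of WITH DEFERRED REBUILD, the index starts out empty. Rebuilding it on the base table student_test populates it:

hive> alter index student_test_index on student_test rebuild;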

3. Comparing the query

This time the query took 26.167 seconds, roughly 3.5 times faster than the original 91.105 seconds:

hive> select * from student_test where info.age=80;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1405418310771_0003, Tracking URL = http://master:8088/proxy/application_1405418310771_0003/
Kill Command = /usr/hadoop/bin/hadoop job  -kill job_1405418310771_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-07-15 23:19:05,465 Stage-1 map = 0%,  reduce = 0%
2014-07-15 23:19:17,093 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.61 sec
2014-07-15 23:19:18,148 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.61 sec
MapReduce Total cumulative CPU time: 1 seconds 610 msec
Ended Job = job_1405418310771_0003
MapReduce Jobs Launched: 
Job 0: Map: 1   Cumulative CPU: 1.61 sec   HDFS Read: 275 HDFS Write: 8 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 610 msec
OK
4        {"name":"li","age":80}
Time taken: 26.167 seconds, Fetched: 1 row(s)
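The comparison can be reproduced from the two "Time taken" lines with a short Python sketch. It only does arithmetic on the numbers printed above; note that these are wall-clock times for the whole job from a single run each, and the logs also show both runs reading 275 bytes from HDFS with one mapper, at 1.97 s versus 1.61 s of cumulative CPU. The regular expression and function names are our own, not part of Hive.

```python
import re
from typing import Dict

# "Time taken" lines copied from the two runs above.
BEFORE_LOG = "Time taken: 91.105 seconds, Fetched: 1 row(s)"
AFTER_LOG = "Time taken: 26.167 seconds, Fetched: 1 row(s)"

TIME_TAKEN = re.compile(r"Time taken: ([0-9.]+) seconds")


def time_taken(log_line: str) -> float:
    """Extract the wall-clock seconds reported by the Hive CLI."""
    match = TIME_TAKEN.search(log_line)
    if match is None:
        raise ValueError(f"no 'Time taken' in: {log_line!r}")
    return float(match.group(1))


def compare(before: str, after: str) -> Dict[str, float]:
    """Return both timings, the seconds saved, and the speedup ratio."""
    t_before, t_after = time_taken(before), time_taken(after)
    return {
        "before_s": t_before,
        "after_s": t_after,
        "saved_s": round(t_before - t_after, 3),
        "speedup": round(t_before / t_after, 2),
    }


if __name__ == "__main__":
    print(compare(BEFORE_LOG, AFTER_LOG))
    # {'before_s': 91.105, 'after_s': 26.167, 'saved_s': 64.938, 'speedup': 3.48}
```

That is about 64.9 seconds saved, or a 3.48x wall-clock speedup for this one pair of runs.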
