How to create an index on a complex data type in Hive

Last edited by pig2 on 2015-3-18 16:05

Guiding questions:
1. How do you create an index on a complex data type?
2. Can Hive index an individual attribute inside one?






This post extends the following one:
Using the Hive complex data types array, map, and struct


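We know Hive can create indexes. What about a complex data structure that contains several attributes, like the one below?
In the student table, the info column contains name and age.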

hive> create table student_test(id INT, info struct<name:STRING, age:INT>)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    > COLLECTION ITEMS TERMINATED BY ':';
OK
Time taken: 0.446 seconds
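With the delimiters declared above, each line of a data file holds the id, a comma, and then the struct fields joined by colons. A minimal sketch of loading such a file follows. The path /tmp/student_test.txt is a placeholder I made up, and the only row shown is the one that appears in the query output later in this post.

```sql
-- Assumed contents of the local file /tmp/student_test.txt (hypothetical path).
-- Only this row is taken from the query output in this post:
--   4,li:80
LOAD DATA LOCAL INPATH '/tmp/student_test.txt' INTO TABLE student_test;

-- Struct fields are addressed with dot notation.
SELECT id, info.name, info.age FROM student_test;
```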
Suppose we run the following query:

select * from student_test where info.age=80 ;

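We want to speed this up by creating an index on age. Does that work?
The answer is no.
So what can we do instead?
Hive does support indexing a complex data type, but only by indexing the whole column, not a single attribute inside it.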

Here is the query run before any index was created.

1. Before creating the index
The query took 91.105 s:



hive> select * from student_test where info.age=80 ;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1405418310771_0001, Tracking URL = http://master:8088/proxy/application_1405418310771_0001/
Kill Command = /usr/hadoop/bin/hadoop job  -kill job_1405418310771_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-07-15 23:03:29,706 Stage-1 map = 0%,  reduce = 0%
2014-07-15 23:04:07,900 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.97 sec
2014-07-15 23:04:08,977 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.97 sec
MapReduce Total cumulative CPU time: 1 seconds 970 msec
Ended Job = job_1405418310771_0001
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.97 sec   HDFS Read: 275 HDFS Write: 8 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 970 msec
OK
4        {"name":"li","age":80}
Time taken: 91.105 seconds, Fetched: 1 row(s)

2. Create the index

hive> create index student_test_index on table student_test(info)
    > as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
    > with deferred rebuild
    > IN TABLE student_test_index_table;
hive> alter index student_test_index on student_test rebuild;
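After the rebuild it can help to confirm that the index exists and to look at the table that backs it. The sketch below reuses the names from the statements above. It applies to Hive releases before 3.0, where index support was removed. For a compact index I would expect the backing table to hold the indexed column plus bucket-file and offset columns, but treat that layout as an assumption and check the DESCRIBE output on your own version.

```sql
-- List the indexes defined on the base table.
SHOW FORMATTED INDEX ON student_test;

-- The index is stored as an ordinary table, so it can be described like one.
DESCRIBE student_test_index_table;
```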

3. Compare the queries

The same query now takes 26.167 s, much faster than the original 91.105 s:


hive> select * from student_test where info.age=80;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1405418310771_0003, Tracking URL = http://master:8088/proxy/application_1405418310771_0003/
Kill Command = /usr/hadoop/bin/hadoop job  -kill job_1405418310771_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-07-15 23:19:05,465 Stage-1 map = 0%,  reduce = 0%
2014-07-15 23:19:17,093 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.61 sec
2014-07-15 23:19:18,148 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.61 sec
MapReduce Total cumulative CPU time: 1 seconds 610 msec
Ended Job = job_1405418310771_0003
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.61 sec   HDFS Read: 275 HDFS Write: 8 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 610 msec
OK
4        {"name":"li","age":80}
Time taken: 26.167 seconds, Fetched: 1 row(s)
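Both runs above report a single mapper and HDFS Read: 275, so on a table this small it may be worth confirming that the index is actually consulted and that the difference is not just job startup time. A minimal sketch for checking, assuming a Hive release before 3.0: the two settings are real Hive configuration properties, but whether a predicate on info.age can be served from an index on the whole info column is something to verify in the plan, not something this sketch guarantees.

```sql
-- Let the optimizer use indexes for filter predicates (off by default).
SET hive.optimize.index.filter=true;
-- Compact indexes are skipped for small inputs unless this threshold is lowered.
SET hive.optimize.index.filter.compact.minsize=0;

-- Inspect the plan instead of relying on wall-clock time alone.
EXPLAIN SELECT * FROM student_test WHERE info.age = 80;
```

If the plan references student_test_index_table, the index is in play. If it only scans student_test, the timing difference has another cause.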

Link to this post: http://www.aboutyun.com/thread-8456-1-1.html



3 comments

june_fu posted on 2015-3-18 11:46:31
Please fix "符合数据类型"; it should be "复合数据类型" (complex data type). That would make the post perfect.

feng01301218 posted on 2015-3-18 16:00:57
Please fix "符合数据类型".

ainubis posted on 2015-3-29 13:17:38
Good stuff, thanks to the OP for sharing.