1. Create a table:

hive> create table user(id int, name string)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '\t'
    > STORED AS TEXTFILE;
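Because of the `FIELDS TERMINATED BY '\t'` clause, the table expects one record per line with the two columns separated by a tab. A data file in that layout might look like this (the rows shown are assumed samples, not the original data):

```
1	wyp1
2	wyp2
3	wyp3
```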
2. Load the data:
hive> load data local inpath '/export1/tmp/wyp/row.txt'
    > overwrite into table user;
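A quick sanity check before timing anything is to confirm the load succeeded (a sketch; the resulting count is not shown in the original run):

```sql
hive> select count(*) from user;
```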
3. Test before creating the index
hive> select * from user where id = 500000;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Cannot run job locally: Input Size (= 356888890) is larger than
hive.exec.mode.local.auto.inputbytes.max (= 134217728)
Starting Job = job_1384246387966_0247, Tracking URL =
http://l-datalogm1.data.cn1:9981/proxy/application_1384246387966_0247/
Kill Command = /home/q/hadoop/bin/hadoop job -kill job_1384246387966_0247
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2013-11-13 15:09:53,336 Stage-1 map = 0%, reduce = 0%
2013-11-13 15:09:59,500 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 2.0 sec
2013-11-13 15:10:00,531 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.63 sec
2013-11-13 15:10:01,560 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.63 sec
MapReduce Total cumulative CPU time: 5 seconds 630 msec
Ended Job = job_1384246387966_0247
MapReduce Jobs Launched:
Job 0: Map: 2   Cumulative CPU: 5.63 sec
HDFS Read: 361084006 HDFS Write: 357 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 630 msec
OK
500000  wyp.
Time taken: 14.107 seconds, Fetched: 1 row(s)
The query took 14.107 seconds in total.
4. Create an index on the user table
hive> create index user_index on table user(id)
    > as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
    > with deferred rebuild
    > IN TABLE user_index_table;
hive> alter index user_index on user rebuild;
hive> select * from user_index_table limit 5;
0   hdfs://mycluster/user/hive/warehouse/table02/000000_0   [0]
1   hdfs://mycluster/user/hive/warehouse/table02/000000_0   [352]
2   hdfs://mycluster/user/hive/warehouse/table02/000000_0   [704]
3   hdfs://mycluster/user/hive/warehouse/table02/000000_0   [1056]
4   hdfs://mycluster/user/hive/warehouse/table02/000000_0   [1408]
Time taken: 0.244 seconds, Fetched: 5 row(s)
This creates an index on the user table. As the sample rows show, each row of user_index_table maps an id value to the HDFS file that contains it and the byte offsets of the matching records, which is what allows a query to skip irrelevant blocks.
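You can also list the indexes defined on a table directly (a sketch; the exact output layout varies by Hive version):

```sql
hive> SHOW FORMATTED INDEX ON user;
```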
5. Test the user table again after creating the index
hive> select * from user where id = 500000;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Cannot run job locally: Input Size (= 356888890) is larger than
hive.exec.mode.local.auto.inputbytes.max (= 134217728)
Starting Job = job_1384246387966_0247, Tracking URL =
http://l-datalogm1.data.cn1:9981/proxy/application_1384246387966_0247/
Kill Command = /home/q/hadoop/bin/hadoop job -kill job_1384246387966_0247
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2013-11-13 15:23:12,336 Stage-1 map = 0%, reduce = 0%
2013-11-13 15:23:53,240 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 2.0 sec
2013-11-13 15:24:00,253 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.27 sec
2013-11-13 15:24:01,650 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.27 sec
MapReduce Total cumulative CPU time: 5 seconds 630 msec
Ended Job = job_1384246387966_0247
MapReduce Jobs Launched:
Job 0: Map: 2   Cumulative CPU: 5.63 sec
HDFS Read: 361084006 HDFS Write: 357 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 630 msec
OK
500000  wyp.
Time taken: 13.042 seconds, Fetched: 1 row(s)
This took 13.042 seconds, almost the same as without the index: the query still launched a full two-mapper scan of the table.
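The times are so close because building a compact index only materializes the index table; in these Hive versions the optimizer does not consult it unless index-based filtering is switched on. A hedged sketch of enabling it before rerunning the query (property names taken from Hive's index optimizer settings; verify them against your Hive version, and note the minsize override is only there because this test table is small):

```sql
-- Assumed settings for automatic index usage; check your Hive version's defaults
hive> SET hive.optimize.index.filter=true;               -- let the optimizer rewrite filters to use indexes
hive> SET hive.optimize.index.filter.compact.minsize=0;  -- apply the index even to small inputs
hive> select * from user where id = 500000;
```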
Note that index creation also has a known bug in some Hive versions: the statement below fails even though country is a valid column of the employees table.

hive> CREATE INDEX employees_index
    > ON TABLE employees (country)
    > AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
    > WITH DEFERRED REBUILD
    > IDXPROPERTIES ('creator' = 'me', 'created_at' = 'some_time')
    > IN TABLE employees_index_table
    > COMMENT 'Employees indexed by country and name.';
FAILED: Error in metadata: java.lang.RuntimeException: \
Check the index columns, they should appear in the table being indexed.
FAILED: Execution Error, return code 1 from \
org.apache.hadoop.hive.ql.exec.DDLTask
This bug exists in Hive 0.10.0, 0.10.1, and 0.11.0, and was fixed in Hive 0.12.0; for details see https://issues.apache.org/jira/browse/HIVE-4251