Methods for creating a secondary index

1.1 MapReduce
Build the HBase index with a MapReduce job. The main flow is:

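(1) Scan the input table with a mapper that extends HBase's TableMapper
(2) Read the rowkey together with the name and value of the specified column
(3) Create a Put instance with value = rowkey and rowkey = columnName + "_" + columnValue
(4) Write the data to the index table with IdentityTableReducer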
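
The four steps above can be modelled without a cluster. The following is a minimal sketch in plain JDK Java, not the project's actual mapper: the class name IndexFlowSketch and the in-memory maps standing in for the input and index tables are assumptions made for illustration. Only the key layout (columnName + "_" + columnValue as the index rowkey, the source rowkey as the value) comes from the text.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Plain-JDK model of the index-building flow; it does not call the HBase API.
final class IndexFlowSketch {

    // Step 3: index rowkey = columnName + "_" + columnValue.
    static String indexRowKey(String columnName, String columnValue) {
        return columnName + "_" + columnValue;
    }

    // Steps 1-4 over an in-memory "table": source rowkey -> (column -> value).
    // Returns index rowkey -> source rowkey. In this model, two source rows
    // with the same column value map to the same index key and the later
    // row overwrites the earlier one.
    static Map<String, String> buildIndex(Map<String, Map<String, String>> table,
                                          String columnName) {
        Map<String, String> index = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            String value = row.getValue().get(columnName);
            if (value == null) {
                continue; // this row has no such column, so nothing to index
            }
            index.put(indexRowKey(columnName, value), row.getKey());
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> table = new LinkedHashMap<>();
        table.put("row1", Map.of("cf1:mid", "1001"));
        table.put("row2", Map.of("cf1:mid", "1002", "cf1:age", "30"));
        System.out.println(buildIndex(table, "cf1:mid"));
    }
}
```

A lookup by column value then becomes a direct read of the index key, followed by a read of the source row whose rowkey is stored there.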

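Supported index types:

A single-column index
Several single-column indexes at once
A combined index (at most 3 columns at a time)
JSON column: a single-column index, several single-column indexes, or a combined index (at most 3 JSON fields)
An index built from the rowkey only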

Index creation commands

1. Single-column index

hadoop jar hbase-secondary-index-0.1.jar net.hbase.secondaryindex.mapred.Main -i demo_table -o demo_table_index -c cf1:mid
hadoop jar hbase-secondary-index-0.1.jar net.hbase.secondaryindex.mapred.Main -i demo_table -o demo_table_index -c cf1:mid -s 20130101 -e 20130120 -v 1

2. Several single-column indexes at once

hadoop jar hbase-secondary-index-0.1.jar net.hbase.secondaryindex.mapred.Main -i demo_table -o demo_table_index -c cf1:mid,cf1:age,cf2:msg
hadoop jar hbase-secondary-index-0.1.jar net.hbase.secondaryindex.mapred.Main -i demo_table -o demo_table_index -c cf1:mid,cf1:age,cf2:msg -s 20130101 -e 20130120 -v 3

3. Combined index

hadoop jar hbase-secondary-index-0.1.jar net.hbase.secondaryindex.mapred.Main -i demo_table -o demo_table_index -c cf1:mid,cf1:age,cf2:msg -si false
hadoop jar hbase-secondary-index-0.1.jar net.hbase.secondaryindex.mapred.Main -i demo_table -o demo_table_index -c cf1:mid,cf1:age,cf2:msg -si false -s 20130101 -e 20130120 -v 1
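
The difference between the default single-index mode and -si false can be sketched as follows. This is an illustration only: the class name CombinedIndexSketch is hypothetical, and the layout of the combined key (column names joined by ",", then values joined by "_") is an assumption, since the text only specifies the single-column layout columnName + "_" + columnValue. The limit of 3 columns is the one stated for combined indexes.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-JDK illustration of single vs. combined index keys; not the tool's code.
final class CombinedIndexSketch {

    static final int MAX_COMBINED_COLUMNS = 3; // limit stated for combined indexes

    // Single-index mode (-si true, the default): one index key per column.
    static List<String> singleKeys(List<String> columns, List<String> values) {
        requireSameSize(columns, values);
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < columns.size(); i++) {
            keys.add(columns.get(i) + "_" + values.get(i));
        }
        return keys;
    }

    // Combined mode (-si false): one key covering all listed columns.
    // ASSUMPTION: this key layout is invented for the sketch.
    static String combinedKey(List<String> columns, List<String> values) {
        requireSameSize(columns, values);
        if (columns.size() > MAX_COMBINED_COLUMNS) {
            throw new IllegalArgumentException(
                "a combined index supports at most " + MAX_COMBINED_COLUMNS + " columns");
        }
        return String.join(",", columns) + "_" + String.join("_", values);
    }

    private static void requireSameSize(List<String> columns, List<String> values) {
        if (columns.size() != values.size()) {
            throw new IllegalArgumentException("columns and values must have the same size");
        }
    }

    public static void main(String[] args) {
        List<String> columns = List.of("cf1:mid", "cf1:age");
        List<String> values = List.of("1001", "30");
        System.out.println(singleKeys(columns, values));
        System.out.println(combinedKey(columns, values));
    }
}
```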

4. JSON column index (single-column or combined)

hadoop jar hbase-secondary-index-0.1.jar net.hbase.secondaryindex.mapred.Main -i demo_table -o demo_table_index -c cf1:msg -j area,type,category
hadoop jar hbase-secondary-index-0.1.jar net.hbase.secondaryindex.mapred.Main -i demo_table -o demo_table_index -c cf1:msg -j area,type,category -s 20130101 -e 20130120 -v 1
hadoop jar hbase-secondary-index-0.1.jar net.hbase.secondaryindex.mapred.Main -i demo_table -o demo_table_index -c cf1:msg -j area,type,category -si false
hadoop jar hbase-secondary-index-0.1.jar net.hbase.secondaryindex.mapred.Main -i demo_table -o demo_table_index -c cf1:msg -j area,type,category -si false -s 20130101 -e 20130120 -v 1

5. Index built from the rowkey only

hadoop jar hbase-secondary-index-0.1.jar net.hbase.secondaryindex.mapred.Main -i demo_table -o demo_table_index -c rowkey -r uid:1,mid:2,isrowkey:1
hadoop jar hbase-secondary-index-0.1.jar net.hbase.secondaryindex.mapred.Main -i demo_table -o demo_table_index -c rowkey:cf1:content -r uid:1,mid:2,isrowkey:1
hadoop jar hbase-secondary-index-0.1.jar net.hbase.secondaryindex.mapred.Main -i demo_table -o demo_table_index -c rowkey:cf1:content -r uid:1,mid:2,isrowkey:1 -s 20130101 -e 20130120
hadoop jar hbase-secondary-index-0.1.jar net.hbase.secondaryindex.mapred.Main -i demo_table -o demo_table_index -c rowkey:cf1:content -r uid:1,mid:2,isrowkey:1 -s 20130101 -e 20130120 -v 1

1.2 ITHBASE
Refactored code, based on hbase-0.94.0 and hadoop-1.0.4. This approach is not recommended because it is too invasive.
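$HBASE_HOME/conf/hbase-site.xml:

<property>
  <name>hbase.hlog.splitter.impl</name>
  <value>org.apache.hadoop.hbase.regionserver.transactional.THLogSplitter</value>
</property>
<property>
  <name>hbase.regionserver.class</name>
  <value>org.apache.hadoop.hbase.ipc.IndexedRegionInterface</value>
</property>
<property>
  <name>hbase.regionserver.impl</name>
  <value>org.apache.hadoop.hbase.regionserver.tableindexed.IndexedRegionServer</value>
</property>
<property>
  <name>hbase.hregion.impl</name>
  <value>org.apache.hadoop.hbase.regionserver.tableindexed.IndexedRegion</value>
</property>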
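
1.3 IHBASE
The original author's code is missing many classes and does not compile. This approach is not recommended because it is too invasive.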
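
1.4 Coprocessor
A demo implementation. This way of implementing secondary indexes became available from HBase 0.92.0 onward, but it is not yet mature. Characteristics:

Each index needs its own custom code, so reuse is poor.
Loading it with the alter command requires disabling the table every time, which is disruptive for a live service. It is best applied when the table is first created.
Apart from that, the approach is friendly to online services and considerably better than the more invasive approaches.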
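
The idea behind the coprocessor approach, and why each index ends up with its own code, can be sketched in plain JDK Java. This is a model, not HBase code: the PutObserver interface, the Table class, and the "rowkey" column used in the index table are all hypothetical stand-ins for a region observer hook and real tables. The sketch also ignores removing stale index entries when a value is overwritten.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-JDK model of index maintenance through a write hook; not the HBase API.
final class ObserverIndexSketch {

    // Hypothetical stand-in for a coprocessor hook invoked after each write.
    interface PutObserver {
        void postPut(String rowKey, String column, String value);
    }

    // Minimal in-memory table: rowkey -> (column -> value), with attached observers.
    static final class Table {
        final Map<String, Map<String, String>> rows = new TreeMap<>();
        final List<PutObserver> observers = new ArrayList<>();

        void put(String rowKey, String column, String value) {
            rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
            for (PutObserver observer : observers) {
                observer.postPut(rowKey, column, value);
            }
        }
    }

    // An observer hard-wired to a single column: every additional index
    // needs another observer like this one, which is the reuse problem.
    static PutObserver indexOn(String indexedColumn, Table indexTable) {
        return (rowKey, column, value) -> {
            if (indexedColumn.equals(column)) {
                // "rowkey" is a hypothetical column name for the stored source rowkey.
                indexTable.put(indexedColumn + "_" + value, "rowkey", rowKey);
            }
        };
    }

    public static void main(String[] args) {
        Table index = new Table();
        Table data = new Table();
        data.observers.add(indexOn("cf1:mid", index));
        data.put("row1", "cf1:mid", "1001");
        data.put("row1", "cf1:age", "30"); // not indexed, so the index is unchanged
        System.out.println(index.rows);
    }
}
```

Because the index is written as part of each put, no separate batch job is needed for new data, which is what makes this style attractive for online services.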

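Using the MapReduce approach
2.1 Building from source
Download the source code from the code page and package it with Maven 2. After downloading, unpack the archive, enter the project folder, and run: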
mvn install
Note: Maven >= 2.2.1 must be installed.
2.2 Using the prebuilt jar
The project root contains the latest compiled jar, hbase-secondary-index-0.1.jar, which can be used directly.
2.3 Creating an index
The file buildindex.sh in 'src/main/resources' gives a usage example:
hadoop jar hbase-secondary-index-0.1.jar net.hbase.secondaryindex.mapred.Main -i user_behavior_attribute_noregistered -o user_behavior_attribute_noregistered_index -c bhvr:vvmid -s 20130101 -e 20130120 -v 3 -si false

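hbase-secondary-index-0.1.jar: name of the jar
net.hbase.secondaryindex.mapred.Main: main class for the MapReduce run
user_behavior_attribute_noregistered: input table name (-i)
user_behavior_attribute_noregistered_index: index table name (-o)
bhvr:vvmid: column name (-c)
20130101: start date (-s)
20130120: end date (-e)
3: take at most 3 versions of each cell (-v)
false: build a combined index (-si)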

Parameters and their meaning
usage: Build-Secondary-Index -c <family:qualifier> [-d] [-e ] -i -o [-s ] [-si ] [-v ]
-c,--column <family:qualifier> column to store row data into (must exist)
-d,--debug switch on DEBUG log level
-e,--edate the end date of data to build index(default is today), such as: 20130120
-i,--input the directory or file to read from (must exist)
-o,--output table to import into (must exist)
-s,--sdate the start date of data to build index(default is 19700101), such as: 20130101
-si,--sindex if use single index. true means 'single index', false means 'combined index'(default is true). If build combined index, the max number of columns is 3.
-v,--versions the versions of each cell to build index(default is Integer.MAX_VALUE)
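
As an illustration of the documented defaults, the -s, -e and -v values could be turned into a time window and a version limit roughly as follows. This is a sketch under assumptions, not the tool's code: the class name DateWindowSketch, the use of UTC, and treating the end date as inclusive of the whole day are all choices made here, because the usage text does not say how the dates are converted.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Plain-JDK illustration of the -s/-e/-v defaults; not the tool's actual parsing.
final class DateWindowSketch {

    // Dates are given as yyyyMMdd, e.g. 20130101.
    static final DateTimeFormatter YYYYMMDD = DateTimeFormatter.BASIC_ISO_DATE;

    // -s: start of the window; defaults to 19700101 when absent.
    static long startMillis(String sdate) {
        LocalDate day = (sdate == null)
                ? LocalDate.of(1970, 1, 1)
                : LocalDate.parse(sdate, YYYYMMDD);
        return day.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    // -e: end of the window; defaults to today when absent.
    // ASSUMPTION: the end date covers the whole day, so the exclusive
    // upper bound is midnight UTC of the following day.
    static long endMillisExclusive(String edate) {
        LocalDate day = (edate == null)
                ? LocalDate.now(ZoneOffset.UTC)
                : LocalDate.parse(edate, YYYYMMDD);
        return day.plusDays(1).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    // -v: versions per cell; defaults to Integer.MAX_VALUE when absent.
    static int versions(String v) {
        return (v == null) ? Integer.MAX_VALUE : Integer.parseInt(v);
    }

    public static void main(String[] args) {
        System.out.println(startMillis("20130101"));
        System.out.println(endMillisExclusive("20130120"));
        System.out.println(versions(null));
    }
}
```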
