Guiding questions
1. Why define a custom Writable class?
2. How do you define a custom Writable class?
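Hadoop ships with a set of Writable implementations that cover most needs, but in some cases you need to build a new one to suit your own requirements. With a custom Writable you have full control over the binary representation and the sort order.
To show how to create a custom Writable type, we will write an implementation that represents a pair of strings: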
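- import java.io.DataInput;
- import java.io.DataOutput;
- import java.io.IOException;
-
- import org.apache.hadoop.io.Text;
- import org.apache.hadoop.io.WritableComparable;
-
- // A Writable holding two Text values; sorts by first, then by second.
- public class TextPair implements WritableComparable<TextPair> {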
- private Text first;
- private Text second;
-
- public TextPair() {
- set(new Text(), new Text());
- }
-
- public TextPair(String first, String second) {
- set(new Text(first), new Text(second));
- }
-
- public TextPair(Text first, Text second) {
- set(first, second);
- }
-
- public void set(Text first, Text second) {
- this.first = first;
- this.second = second;
- }
-
- public Text getFirst() {
- return first;
- }
-
- public Text getSecond() {
- return second;
- }
-
- public void write(DataOutput out) throws IOException {
- first.write(out);
- second.write(out);
- }
-
- public void readFields(DataInput in) throws IOException {
- first.readFields(in);
- second.readFields(in);
- }
-
- public int hashCode() {
- return first.hashCode() * 163 + second.hashCode();
- }
-
- public boolean equals(Object o) {
- if(o instanceof TextPair) {
- TextPair tp = (TextPair)o;
- return first.equals(tp.first) && second.equals(tp.second);
- }
- return false;
- }
-
- public String toString() {
- return first + "\t" + second;
- }
-
- public int compareTo(TextPair tp) {
- int cmp = first.compareTo(tp.first);
- if(cmp != 0) {
- return cmp;
- }
- return second.compareTo(tp.second);
- }
- }
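To see the write/readFields contract in action without a Hadoop installation, here is a minimal sketch of a serialization round trip. StringPair, toBytes and fromBytes are hypothetical names made up for this illustration, and the fields are plain Strings written with DataOutput.writeUTF rather than Hadoop Text objects, so the byte layout is not the one TextPair produces.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for TextPair that needs no Hadoop jars.
// Assumption: fields are Strings written with writeUTF, not Hadoop Text.
class StringPair implements Comparable<StringPair> {
    private String first = "";
    private String second = "";

    StringPair() { }

    StringPair(String first, String second) {
        this.first = first;
        this.second = second;
    }

    // Same shape as Writable.write: emit the fields in a fixed order.
    void write(DataOutput out) throws IOException {
        out.writeUTF(first);
        out.writeUTF(second);
    }

    // Same shape as Writable.readFields: read the fields back in that order.
    void readFields(DataInput in) throws IOException {
        first = in.readUTF();
        second = in.readUTF();
    }

    @Override
    public int compareTo(StringPair other) {
        int cmp = first.compareTo(other.first);
        return cmp != 0 ? cmp : second.compareTo(other.second);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof StringPair
                && first.equals(((StringPair) o).first)
                && second.equals(((StringPair) o).second);
    }

    @Override
    public int hashCode() {
        return first.hashCode() * 163 + second.hashCode();
    }

    // Serialize to a byte array, wrapping the checked IOException.
    static byte[] toBytes(StringPair p) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            p.write(new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Populate a fresh instance from bytes produced by toBytes.
    static StringPair fromBytes(byte[] bytes) {
        try {
            StringPair p = new StringPair();
            p.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringPair original = new StringPair("hadoop", "writable");
        StringPair copy = fromBytes(toBytes(original));
        System.out.println(original.equals(copy));
    }
}
```

The point to take away is that readFields must consume the fields in exactly the order write produced them, which is also why the no-argument constructor has to leave the object in a state that readFields can fill in.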
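Implementing a RawComparator for speed
There is room for further optimization. When TextPair is used as a key in MapReduce, the keys that need to be compared are already serialized. By default, comparing them means deserializing each one into an object and then calling compareTo(), which is inefficient. Can we compare the serialized bytes directly instead? Yes, we can.
All we need to do is split the serialized TextPair into its two member fields and compare those fields byte by byte: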
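- // Compares serialized TextPair keys directly on their bytes.
- // Requires org.apache.hadoop.io.WritableComparator and org.apache.hadoop.io.WritableUtils.
- class Comparator extends WritableComparator {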
- private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();
- public Comparator() {
- super(TextPair.class);
- }
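- @Override
- public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
- try {
- // Length of the first field = size of its vint length prefix + the byte count that prefix encodes.
- int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
- int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
- // Compare the first fields; fall through to the second fields only on a tie.
- int cmp = TEXT_COMPARATOR.compare(b1, s1, firstL1, b2, s2, firstL2);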
- if(cmp != 0) {
- return cmp;
- }
- return TEXT_COMPARATOR.compare(b1, s1 + firstL1, l1 - firstL1, b2, s2 + firstL2, l2 - firstL2);
- } catch(IOException e) {
- throw new IllegalArgumentException(e);
- }
- }
- }
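The comparator above depends on how Text is serialized (a variable-length integer prefix followed by the UTF-8 bytes). As a rough, Hadoop-free sketch of the same idea, the example below compares two serialized pairs field by field without creating any String objects. RawPairComparator and its methods are hypothetical names, and the layout is an assumption made for the illustration: each field is a 2-byte big-endian length followed by its bytes, as written by DataOutput.writeUTF, not Text's vint-prefixed layout.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of raw comparison; not Hadoop's WritableComparator.
// Assumption: each field is a 2-byte big-endian length followed by its bytes
// (the DataOutput.writeUTF layout), not Text's vint-prefixed layout.
class RawPairComparator {

    // Serialize a pair of strings as two length-prefixed fields.
    static byte[] encode(String first, String second) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeUTF(first);
            out.writeUTF(second);
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Total size of the field starting at offset s: 2-byte prefix plus payload.
    static int fieldLength(byte[] b, int s) {
        return 2 + (((b[s] & 0xff) << 8) | (b[s + 1] & 0xff));
    }

    // Compare the payloads of two fields lexicographically as unsigned bytes.
    // l1 and l2 are the full field sizes, including the 2-byte prefix.
    static int compareField(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 2; i < n; i++) {   // start at 2 to skip the length prefix
            int x = b1[s1 + i] & 0xff;
            int y = b2[s2 + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return l1 - l2;                 // on a common prefix, the shorter field sorts first
    }

    // Compare two serialized pairs: first fields, then second fields on a tie.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int firstL1 = fieldLength(b1, s1);
        int firstL2 = fieldLength(b2, s2);
        int cmp = compareField(b1, s1, firstL1, b2, s2, firstL2);
        if (cmp != 0) {
            return cmp;
        }
        return compareField(b1, s1 + firstL1, l1 - firstL1,
                            b2, s2 + firstL2, l2 - firstL2);
    }

    public static void main(String[] args) {
        byte[] a = encode("apple", "pie");
        byte[] b = encode("apple", "tart");
        // Negative sign: ("apple", "pie") sorts before ("apple", "tart").
        System.out.println(Integer.signum(compare(a, 0, a.length, b, 0, b.length)));
    }
}
```

The structure mirrors the Hadoop version: work out where the first field ends from its length prefix, compare the first fields, and only look at the remaining bytes when the first fields tie.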
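Custom comparators
Sometimes, besides the default comparator, you may need additional custom comparators that produce a different sort order. The following example compares TextPair keys by their first field only: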
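- // Orders TextPair keys by their first field only.
- // FirstComparator is an illustrative name for the enclosing class.
- class FirstComparator extends WritableComparator {
- private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();
-
- public FirstComparator() {
- super(TextPair.class);
- }
-
- @Override
- public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
- try {
- // Locate the end of the first field in each key and ignore the rest.
- int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
- int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
- return TEXT_COMPARATOR.compare(b1, s1, firstL1, b2, s2, firstL2);
- } catch (IOException e) {
- throw new IllegalArgumentException(e);
- }
- }
-
- @Override
- public int compare(WritableComparable a, WritableComparable b) {
- if (a instanceof TextPair && b instanceof TextPair) {
- return ((TextPair) a).getFirst().compareTo(((TextPair) b).getFirst());
- }
- return super.compare(a, b);
- }
- }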
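To make the effect of a first-field-only comparator concrete, here is a small sketch in plain Java with no Hadoop classes involved. FirstFieldOrdering, Pair, FULL and FIRST_ONLY are hypothetical names chosen for this illustration: FULL mimics the default TextPair ordering, and FIRST_ONLY mimics the custom comparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration with plain Java objects; no Hadoop classes involved.
class FirstFieldOrdering {

    // Minimal pair type standing in for TextPair.
    static final class Pair {
        final String first;
        final String second;

        Pair(String first, String second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public String toString() {
            return first + "\t" + second;
        }
    }

    // Default-style ordering: first field, then second field.
    static final Comparator<Pair> FULL = (x, y) -> {
        int cmp = x.first.compareTo(y.first);
        return cmp != 0 ? cmp : x.second.compareTo(y.second);
    };

    // Custom ordering: first field only, so pairs sharing it compare as equal.
    static final Comparator<Pair> FIRST_ONLY = (x, y) -> x.first.compareTo(y.first);

    // Return a sorted copy, leaving the input list untouched.
    static List<Pair> sorted(List<Pair> pairs, Comparator<Pair> order) {
        List<Pair> copy = new ArrayList<>(pairs);
        copy.sort(order);   // stable: elements that compare equal keep their input order
        return copy;
    }

    public static void main(String[] args) {
        List<Pair> pairs = Arrays.asList(
                new Pair("b", "2"), new Pair("a", "9"), new Pair("a", "1"));
        System.out.println(sorted(pairs, FULL));
        System.out.println(sorted(pairs, FIRST_ONLY));
    }
}
```

Under FULL the two "a" pairs are ordered by their second field, while under FIRST_ONLY they compare as equal and simply keep their input order. That is the difference between the default comparator and the first-field comparator.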
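Author: archimedes
Source: http://www.cnblogs.com/archimedes/
Copyright of this article is shared by the author and cnblogs (博客园). Reposting is welcome, but unless the author agrees otherwise, this notice must be retained and a clearly visible link to the original article must be provided on the page; otherwise the author reserves the right to pursue legal liability.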