Problem: In a CDH cluster, inserting data via a SQL statement into a Hive table mapped to HBase fails with the following error:
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat cannot be cast to org.apache.hadoop.hive.ql.io.HiveOutputFormat
at org.apache.spark.sql.hive.SparkHiveWriterContainer.outputFormat$lzycompute(hiveWriterContainers.scala:74)
at org.apache.spark.sql.hive.SparkHiveWriterContainer.outputFormat(hiveWriterContainers.scala:73)
at org.apache.spark.sql.hive.SparkHiveWriterContainer.getOutputName(hiveWriterContainers.scala:93)
at org.apache.spark.sql.hive.SparkHiveWriterContainer.initWriters(hiveWriterContainers.scala:119)
at org.apache.spark.sql.hive.SparkHiveWriterContainer.executorSideSetup(hiveWriterContainers.scala:86)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:102)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:84)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:84)
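For context, the failing statement is typically an INSERT into a Hive table declared with the HBase storage handler, along the lines of the following sketch (table, column family, and column names are hypothetical):

```sql
-- Hypothetical Hive table mapped onto an HBase table via the HBase storage handler
CREATE TABLE hive_hbase_demo (
  rowkey STRING,
  val    STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:val")
TBLPROPERTIES ("hbase.table.name" = "hive_hbase_demo");

-- An insert like this triggers the ClassCastException when run through Spark SQL,
-- because HiveHBaseTableOutputFormat does not implement HiveOutputFormat
INSERT INTO TABLE hive_hbase_demo VALUES ('r1', 'v1');
```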
Analysis:
At first glance this looks like a simple type-cast problem, but the underlying mechanism needs to be considered carefully.
1. If the SQL is executed through Spark SQL, the `Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat cannot be cast to org.apache.hadoop.hive.ql.io.HiveOutputFormat` points to a problem with Spark's jars; the jars need to be recompiled for the matching versions.
2. If the SQL is executed through Hive, the cause is more likely HBase's internal compaction mechanism. Compactions are divided into major and minor compactions, and both need a certain amount of store space; if the store is too small, errors can also occur, so the relevant store parameters can be increased.
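As a sketch of where these knobs live, the compaction- and store-related parameters mentioned above correspond to properties in hbase-site.xml. The values shown are common defaults; verify them against your HBase version:

```xml
<!-- hbase-site.xml: compaction/store tuning knobs (sketch; defaults vary by version) -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>604800000</value> <!-- major compaction interval in ms; default is 7 days -->
</property>
<property>
  <name>hbase.hstore.compactionThreshold</name>
  <value>3</value> <!-- a minor compaction starts once a store has this many StoreFiles -->
</property>
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>134217728</value> <!-- memstore flush threshold in bytes (128 MB) -->
</property>
```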
Solution: In the CDH console, edit the HBase configuration item "HBase HRegion Major Compaction" (hbase.hregion.majorcompaction) and change it from 7 days to 0.
(This resolves the error above, though it produces a warning; so far, no new problems have been observed in operation.)
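Outside of CDH, the same change can be made cluster-wide in hbase-site.xml, or per table from the HBase shell. A minimal sketch, assuming a mapped table named hive_hbase_demo (the table name is hypothetical):

```shell
# Cluster-wide: set in hbase-site.xml and restart the RegionServers
#   hbase.hregion.majorcompaction = 0   (disables time-based major compactions)

# Per-table alternative via the HBase shell:
echo "alter 'hive_hbase_demo', CONFIGURATION => {'hbase.hregion.majorcompaction' => '0'}" | hbase shell
```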
HBase parameter tuning reference: