
Installing Shark Locally, and Problems You May Hit


Questions this post addresses:
1. How do you configure Shark for a local installation?
2. How do you fix common errors that come up?

1. Download Scala

wget http://www.scala-lang.org/files/archive/scala-2.9.3.tgz
# the newest version in the archive is scala-2.10.2.tgz
tar xvfz scala-2.9.3.tgz
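
You can sanity-check the unpacked Scala before wiring it into Shark (this assumes the tarball extracted into ./scala-2.9.3 in the current directory):

  ./scala-2.9.3/bin/scala -version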


2. Download the Shark and Hive tarball
wget http://spark-project.org/download/shark-0.7.0-hadoop1-bin.tgz (cdh3)
tar xvfz shark-0.7.0-*-bin.tgz


3. Configure environment variables

  cd shark-0.7.0/conf
  cp shark-env.sh.template shark-env.sh
  vi shark-env.sh
  # add these two lines to shark-env.sh:
  export HIVE_HOME=/path/to/hive-0.9.0-bin
  export SCALA_HOME=/path/to/scala-2.9.3
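With HIVE_HOME and SCALA_HOME set, you can start the Shark CLI from the launcher script that ships in the tarball's bin directory (a minimal sketch; the path assumes you unpacked into ./shark-0.7.0):

  cd shark-0.7.0
  ./bin/shark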





4. Test with some data

  CREATE TABLE src(key INT, value STRING);
  LOAD DATA LOCAL INPATH '${env:HIVE_HOME}/examples/files/kv1.txt' INTO TABLE src;
  SELECT COUNT(1) FROM src;
  OK
  500
  Time taken: 2.149 seconds

With no Hive MapReduce jobs involved, this is already noticeably faster than Hive. Shark keeps tables whose names end in _cached in memory, so the cached copy is faster still:

  CREATE TABLE src_cached AS SELECT * FROM SRC;
  SELECT COUNT(1) FROM src_cached;




Problems that may come up during installation, and how to fix them


1. CREATE TABLE src(key INT, value STRING); fails with an RPC version mismatch

  FAILED: Error in metadata: MetaException(message:Got exception: org.apache.hadoop.ipc.RPC$VersionMismatch Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client = 61, server = 63))
  FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
  ERROR exec.Task: FAILED: Error in metadata: MetaException(message:Got exception: org.apache.hadoop.ipc.RPC$VersionMismatch Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client = 61, server = 63))
  org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: org.apache.hadoop.ipc.RPC$VersionMismatch Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client = 61, server = 63))
          at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:544)
          at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3313)
          at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:242)
          at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:134)
          at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
          at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1312)
          at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1104)
          at org.apache.hadoop.hive.ql.Driver.run(Driver.java:937)
          at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:288)
          at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
          at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341)
          at shark.SharkCliDriver$.main(SharkCliDriver.scala:203)
          at shark.SharkCliDriver.main(SharkCliDriver.scala)


Reason: the Hadoop version on your cluster does not match the hadoop-core jar bundled with Shark.
Fix: copy ${HADOOP_HOME}/hadoop-core-*.jar into ${SHARK_HOME}/lib_managed/jars/org.apache.hadoop/hadoop-core/ and delete the hadoop-core-*.jar that was originally there.
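
A minimal shell sketch of that jar swap, assuming HADOOP_HOME and SHARK_HOME are set to your actual install directories:

  cd ${SHARK_HOME}/lib_managed/jars/org.apache.hadoop/hadoop-core/
  rm hadoop-core-*.jar                   # drop the jar Shark shipped with
  cp ${HADOOP_HOME}/hadoop-core-*.jar .  # use the cluster's own hadoop-core jar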

Restart the Shark CLI.

2. java.lang.NoClassDefFoundError is thrown

The error appears even after the hadoop-core jar under /app/hadoop/shark/shark-0.7.0/lib_managed/jars/org.apache.hadoop/hadoop-core/ has been replaced:

  java.lang.NoClassDefFoundError: org/apache/hadoop/thirdparty/guava/common/collect/LinkedListMultimap
          at org.apache.hadoop.hdfs.SocketCache.<init>(SocketCache.java:48)
          at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:253)
          at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:220)
          at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
          at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1611)
          at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:68)
          at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1645)
          at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1627)
          at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
          at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
          at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:238)
          at org.apache.hadoop.fs.Path.getFileSystem(Path.java:183)
          at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:104)
          at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:136)
          at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:151)
          at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getDefaultDatabasePath(HiveMetaStore.java:475)
          at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:353)
          at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:371)
          at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:278)
          at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:248)
          at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:114)
          at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2092)
          at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2102)
          at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:538)
          at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3313)
          at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:242)
          at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:134)
          at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
          at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1312)
          at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1104)
          at org.apache.hadoop.hive.ql.Driver.run(Driver.java:937)
          at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:288)
          at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
          at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341)
          at shark.SharkCliDriver$.main(SharkCliDriver.scala:203)
          at shark.SharkCliDriver.main(SharkCliDriver.scala)
  Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.thirdparty.guava.common.collect.LinkedListMultimap
          at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
          at java.security.AccessController.doPrivileged(Native Method)
          at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
          at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
          ... 36 more


Reason: the CDH build needs a third-party jar, guava-*.jar, that is missing from Shark's classpath.

Fix: create the directory ${SHARK_HOME}/lib_managed/jars/org.apache.hadoop/thirdparty and copy ${HADOOP_HOME}/lib/guava-r09-jarjar.jar into it.
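
In shell form, under the same HADOOP_HOME and SHARK_HOME assumptions as above:

  mkdir -p ${SHARK_HOME}/lib_managed/jars/org.apache.hadoop/thirdparty
  cp ${HADOOP_HOME}/lib/guava-r09-jarjar.jar ${SHARK_HOME}/lib_managed/jars/org.apache.hadoop/thirdparty/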


Restart the Shark CLI.


3. SHOW TABLES fails
  Failed with exception java.io.IOException:java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.TextInputFormat as specified in mapredWork!

Reason: hadoop-lzo-*.jar is missing from Shark's classpath.

Fix: create the directory ${SHARK_HOME}/lib_managed/jars/org.apache.hadoop/lib and copy ${HADOOP_HOME}/lib/hadoop-lzo-*.jar into it.
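
Again as a shell sketch, with HADOOP_HOME and SHARK_HOME set appropriately:

  mkdir -p ${SHARK_HOME}/lib_managed/jars/org.apache.hadoop/lib
  cp ${HADOOP_HOME}/lib/hadoop-lzo-*.jar ${SHARK_HOME}/lib_managed/jars/org.apache.hadoop/lib/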

Restart the Shark CLI.



4. SELECT COUNT(1) FROM src_cached fails

  spark.SparkException: Job failed: ShuffleMapTask(6, 0) failed: ExceptionFailure(java.lang.NoSuchMethodError: sun.misc.Unsafe.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V)
          at spark.scheduler.DAGScheduler$anonfun$abortStage$1.apply(DAGScheduler.scala:642)
          at spark.scheduler.DAGScheduler$anonfun$abortStage$1.apply(DAGScheduler.scala:640)
          at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
          at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
          at spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:640)
          at spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:601)
          at spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:300)
          at spark.scheduler.DAGScheduler.spark$scheduler$DAGScheduler$run(DAGScheduler.scala:364)
          at spark.scheduler.DAGScheduler$anon$1.run(DAGScheduler.scala:107)
  FAILED: Execution Error, return code -101 from shark.execution.SparkTask



Reason: Java 1.6 is too old; the sun.misc.Unsafe.copyMemory overload being called here only exists in JDK 7.
Fix: install JDK 7 and point JAVA_HOME at the new install:


  tar xvfz jdk-7u25-linux-x64.tar.gz -C /usr/java/
  export JAVA_HOME=/usr/java/jdk1.7.0_25
  export CLASSPATH=/usr/java/jdk1.7.0_25/lib
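A quick check that the shell now resolves the new JDK (the version string should mention 1.7.0_25):

  ${JAVA_HOME}/bin/java -version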

Restart the Shark CLI.






Source: http://blog.csdn.net/johnny_lee/article/details/18364473

