Debugging Spark in IDEA: Spark SQL

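When debugging locally, I want to query a table's data directly from the server. Querying it through Hive works. I pointed setMaster at the server, but the log says the table does not exist: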
val sc = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val www = sqlContext.sql("select * from tw_stock_d where date = '20160801' order by date").show
The same query runs without problems in spark-shell. How should the IDEA project be configured? Error log:
16/11/12 06:50:37 INFO DAGScheduler: ResultStage 0 (take at t_stock_d.scala:18) finished in 2.674 s
16/11/12 06:50:37 INFO DAGScheduler: Job 0 finished: take at t_stock_d.scala:18, took 2.734438 s
16/11/12 06:50:38 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 192.168.1.105:49647 in memory (size: 1863.0 B, free: 1697.6 MB)
16/11/12 06:50:38 INFO BlockManagerInfo: Removed broadcast_1_piece0 on slave1:48159 in memory (size: 1863.0 B, free: 511.5 MB)
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:130)
Caused by: org.apache.spark.sql.AnalysisException: Table not found: tw_stock_d;
    at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:306)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$9.applyOrElse(Analyzer.scala:315)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$9.applyOrElse(Analyzer.scala:310)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)

玉溪 · posted on 2016-11-12 09:26:41

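When developing and debugging locally in IDEA, it seems you have to use
val sc = new SparkContext(conf)
val hiveContext = new HiveContext(sc)
so that Spark SQL can be executed through JDBC.
Is that right? Does anyone have a better approach?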

langke93 · posted on 2016-11-12 09:54:43

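This problem is probably caused by Spark not finding the Hive metastore location. When you use spark-sql, the spark-sql script knows to look for conf/hive-site.xml in the Spark installation directory, but when the program is run from IDEA, nothing tells it where to find hive-site.xml.

To know how to "hand" this file to IDEA, you first need to understand how Spark uses hive-site.xml.
In the HiveContext.scala source: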

[mw_shl_code=scala,true]def newTemporaryConfiguration(): Map[String, String] = {
  val tempDir = Utils.createTempDir()
  val localMetastore = new File(tempDir, "metastore").getAbsolutePath
  val propMap: HashMap[String, String] = HashMap()
  // We have to mask all properties in hive-site.xml that relates to metastore data source
  // as we used a local metastore here.
  HiveConf.ConfVars.values().foreach { confvar =>
    if (confvar.varname.contains("datanucleus") || confvar.varname.contains("jdo")) {
      propMap.put(confvar.varname, confvar.defaultVal)
    }
  }
  // ... (rest of the method omitted in this excerpt)
}[/mw_shl_code]

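Spark SQL reuses Hive's HiveConf class to read hive-site.xml, and HiveConf locates hive-site.xml by searching the classpath. So the fix is to add that directory to the classpath explicitly in IDEA:
File -> Project Structure -> Modules -> Dependencies -> Add -> Jars or Directories, then select the conf directory under the Spark installation directory.

Choose Classes, then change the Scope to "Runtime".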
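One way to confirm that this setting actually took effect is to ask the class loader for hive-site.xml at runtime, from the same IDEA run configuration. Below is a minimal stdlib-only Scala sketch; the object name HiveSiteCheck is made up for illustration, and the lookup order (thread context class loader first, then the class's own loader) is an assumption that only approximates how HiveConf searches the classpath, not a copy of Hive's code.

```scala
import java.net.URL

// Hypothetical helper object, for illustration only
object HiveSiteCheck {
  // Look a resource up on the classpath: try the thread's context class loader
  // first, then fall back to the loader that loaded this object.
  def locate(resource: String): Option[URL] = {
    val loader = Option(Thread.currentThread().getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    Option(loader.getResource(resource))
  }

  def main(args: Array[String]): Unit = {
    locate("hive-site.xml") match {
      case Some(url) => println(s"hive-site.xml found at $url")
      case None      => println("hive-site.xml is NOT on the runtime classpath")
    }
  }
}
```

If this prints the "NOT on the runtime classpath" message when run from IDEA, the conf directory was not added with Runtime scope (or was added to a different module).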

Reference: 美伊小公主的奶爸

玉溪 · posted on 2016-11-12 11:46:24

When developing with Spark, aren't there two approaches?
val sc = new SparkContext(conf)
val hiveContext = new HiveContext(sc)
This approach works fine.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
With this approach, how can I run SQL directly in IDEA?

langke93 · posted on 2016-11-12 16:17:24

玉溪 posted on 2016-11-12 11:46
When developing with Spark, aren't there two approaches?
val sc = new SparkContext(conf)
val hiveContext = new HiveContext(sc ...

OP, take a look at this:

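[mw_shl_code=scala,true]import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

case class Person(name: String, value: String)

object SparkSQLTest {

  def main(args: Array[String]): Unit = {
    // Run locally inside IDEA; hive-site.xml must be on the runtime classpath
    val conf = new SparkConf()
      .setMaster("local")
      .setAppName("SparkSQLTest")

    val sc = new SparkContext(conf)

    // HiveContext (unlike a plain SQLContext) reads hive-site.xml and queries the Hive metastore
    val sqlContext = new HiveContext(sc)
    import sqlContext.implicits._

    sqlContext.sql("select * from ewaplog where value in ('yes', null)").collect().foreach(println)

    sc.stop()
  }
}[/mw_shl_code]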
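On the plain SQLContext question above: in Spark 1.x, SQLContext keeps its own in-session catalog and does not read hive-site.xml or connect to the Hive metastore, so it can only query tables registered in the same session (for example with registerTempTable). Tables that live in Hive, such as tw_stock_d, need HiveContext. This is also likely why the original query worked in spark-shell: in Spark 1.x builds with Hive support, the shell's predefined sqlContext is a HiveContext. A minimal sketch using the Spark 1.x API from this thread; StockRow, SqlContextSketch and the sample rows are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}

// Hypothetical schema, for illustration only
case class StockRow(code: String, date: String)

object SqlContextSketch {
  // A plain SQLContext only sees tables registered in this session,
  // so we register a temp table before SQL can refer to it by name.
  def run(sqlContext: SQLContext): Array[Row] = {
    import sqlContext.implicits._
    // Made-up sample rows, not real data
    val df = Seq(StockRow("A", "20160801"), StockRow("B", "20160802")).toDF()
    df.registerTempTable("tw_stock_d")
    sqlContext.sql("select * from tw_stock_d where date = '20160801'").collect()
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("SqlContextSketch"))
    try run(new SQLContext(sc)).foreach(println)
    finally sc.stop()
  }
}
```

This does not reach the server's Hive tables; it only shows what a plain SQLContext can see. To query the existing tables, use HiveContext with hive-site.xml on the classpath.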

langke93 · posted on 2016-11-12 16:42:54

langke93 posted on 2016-11-12 16:17
OP, take a look at this:

case class Person(name: String, value: String)

For more, see:
Debugging Spark SQL in the development environment [IDEA] and solutions to problems encountered
http://www.aboutyun.com/forum.php?mod=viewthread&tid=20266

玉溪 · posted on 2016-11-13 03:55:30

Last edited by 玉溪 on 2016-11-13 08:04

langke93, thank you very much for your answers.
