
Nutch 2.0 single-machine crawl fails with "connection failure"

sunny6142496 posted on 2013-10-16 13:41:02
sunny@sunny-HP-Compaq-dx7408-Microtower:~/apache-nutch-2.0-src/runtime/local$ bin/nutch crawl urls -dir crawl -depth 3 -topN 5
Exception in thread "main" org.apache.gora.util.GoraException: java.io.IOException: java.sql.SQLTransientConnectionException: connection exception: connection failure: java.io.EOFException
        at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
        at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
        at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:69)
        at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:243)
        at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
        at org.apache.nutch.crawl.Crawler.run(Crawler.java:136)
        at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
Caused by: java.io.IOException: java.sql.SQLTransientConnectionException: connection exception: connection failure: java.io.EOFException
        at org.apache.gora.sql.store.SqlStore.getConnection(SqlStore.java:747)
        at org.apache.gora.sql.store.SqlStore.initialize(SqlStore.java:160)
        at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
        at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
        ... 8 more
Caused by: java.sql.SQLTransientConnectionException: connection exception: connection failure: java.io.EOFException
        at org.hsqldb.jdbc.Util.sqlException(Unknown Source)
        at org.hsqldb.jdbc.Util.sqlException(Unknown Source)
        at org.hsqldb.jdbc.JDBCConnection.<init>(Unknown Source)
        at org.hsqldb.jdbc.JDBCDriver.getConnection(Unknown Source)
        at org.hsqldb.jdbc.JDBCDriver.connect(Unknown Source)
        at java.sql.DriverManager.getConnection(DriverManager.java:582)
        at java.sql.DriverManager.getConnection(DriverManager.java:185)
        at org.apache.gora.sql.store.SqlStore.getConnection(SqlStore.java:739)
        ... 11 more
Caused by: org.hsqldb.HsqlException: connection exception: connection failure: java.io.EOFException
        at org.hsqldb.error.Error.error(Unknown Source)
        at org.hsqldb.error.Error.error(Unknown Source)
        at org.hsqldb.ClientConnection.execute(Unknown Source)
        at org.hsqldb.ClientConnection.<init>(Unknown Source)
        ... 17 more
My Nutch configuration is as follows.
nutch-site.xml:
     <property>
         <name>http.agent.name</name>
         <value>My Nutch Spider</value>
     </property>
In regex-urlfilter.txt I added:
# accept anything else
+^http://([a-z0-9]*\.)*apache.org/
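
A rough sketch of what the rule above accepts, written in Python rather than Nutch's own Java. It assumes the filter tries rules in order, lets the first matching rule decide (`+` accepts, `-` rejects), and drops URLs that match no rule. The `accepts` helper and the sample URLs are made up for illustration. Note that the `.` in `apache.org` is unescaped, so it matches any character.

```python
import re

# Rules in the order they would appear in regex-urlfilter.txt.
# "+" accepts a URL, "-" rejects it; the first matching rule decides.
RULES = [
    ("+", r"^http://([a-z0-9]*\.)*apache.org/"),
]


def accepts(url, rules=RULES):
    """Return True if the first rule that matches the URL is a '+' rule."""
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    # No rule matched: treat the URL as filtered out.
    return False


if __name__ == "__main__":
    for url in [
        "http://nutch.apache.org/",
        "http://apache.org/index.html",
        "https://nutch.apache.org/",  # https is not covered by the rule
        "http://www.example.com/",
    ]:
        print(url, accepts(url))
```

Writing the pattern as `apache\.org` would restrict the match to a literal dot.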
              
               
               

6 replies

amuseme_lu posted on 2013-10-16 13:41:40

            connection exception: connection failure
That means the database could not be reached. Check your configuration, and check whether the database is actually running.
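
One quick way to check "is the database running" is to test whether anything is listening on the database port. The trace goes through org.hsqldb.ClientConnection, which suggests the JDBC URL points at an HSQLDB server over the network. The sketch below assumes that server is on localhost at port 9001 (HSQLDB's default server port); the real host and port come from the JDBC URL in your gora.properties. `port_is_open` is a hypothetical helper name.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumed values: adjust host and port to match your JDBC URL.
    print("HSQLDB port reachable:", port_is_open("localhost", 9001))
```

An open port only shows that something is listening. It does not confirm that it is an HSQLDB server of a compatible version.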
        

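amuseme_lu posted on 2013-10-16 13:42:11

            connection exception: connection failure
That means the database could not be reached. Check your configuration, and check whether the database is actually running.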
        

sunny6142496 posted on 2013-10-16 13:42:55

            I see. So Nutch 2.0 only works when it is connected to a database. Got it.
        

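sunny6142496 posted on 2013-10-16 13:43:44

            Quoting reply #2: connection exception: connection failure
That means the database could not be reached. Check your configuration, and check whether the database is actually running.

I read the articles on your blog. Your analysis of Nutch is excellent and has helped me a lot. I have one more question:
Can Nutch 2.0 store multiple versions of a URL, or does it keep only the latest one?
I saw a definition like this in gora-hbase-mapping.xml:

I'm not sure whether maxVersions refers to the number of versions. Would changing this setting make it store multiple versions?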
        

sunny6142496 posted on 2013-10-16 13:44:38

            sunny@sunny-HP-Compaq-dx7408-Microtower:~/nutch-2.0/runtime/deploy$ bin/nutch inject urls
12/09/06 14:48:48 INFO crawl.InjectorJob: InjectorJob: starting
12/09/06 14:48:48 INFO crawl.InjectorJob: InjectorJob: urlDir: urls
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/zookeeper/KeeperException
        at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:183)
        at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:99)
        at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:110)
        at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
        at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
        at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
        at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:69)
        at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:243)
        at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:268)
        at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:288)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:298)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException: org.apache.zookeeper.KeeperException
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        ... 17 more
Why is this happening?
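
A NoClassDefFoundError like the one above means the JVM could not find org.apache.zookeeper.KeeperException on the classpath at run time. One way to narrow it down is to check which jars, if any, contain that class. The sketch below is only a diagnostic aid: `jars_containing` is a hypothetical helper, and the `lib` directory is a placeholder for wherever your Nutch, Hadoop or HBase jars live.

```python
import os
import zipfile


def jars_containing(class_name, lib_dir):
    """Return the paths of jar files under lib_dir that contain class_name."""
    # A class a.b.C is stored inside a jar as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for root, _dirs, files in os.walk(lib_dir):
        for name in sorted(files):
            if not name.endswith(".jar"):
                continue
            path = os.path.join(root, name)
            try:
                with zipfile.ZipFile(path) as jar:
                    if entry in jar.namelist():
                        hits.append(path)
            except zipfile.BadZipFile:
                # Skip files that are not valid archives.
                continue
    return hits


if __name__ == "__main__":
    # "lib" is a placeholder; point it at the directory holding your jars.
    print(jars_containing("org.apache.zookeeper.KeeperException", "lib"))
```

An empty result for the directories on the job's classpath would be consistent with the ZooKeeper jar being missing there.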
        

hb308102796 posted on 2013-10-16 13:45:16

            I ran into this problem too. How did you solve it?