hadoop-2.2 + hbase-0.96 cluster + nutch1.8: fetch fails with an error

Jeelon posted on 2016-5-29 11:19:54 in Exceptions and Errors
Could someone please help me work out what is causing this error? It has been bothering me for a long time!

The hadoop cluster starts up normally.
I run the following command:
[root@CentOS641 deploy]# bin/crawl /nutch_workspace/urls/urls /nutch_workspace/data  http://192.168.159.120:8983/solr 10
It runs until the fetch step, where errors start. The log is as follows:
16/05/28 20:15:52 WARN fetcher.Fetcher: Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
16/05/28 20:15:52 INFO fetcher.Fetcher: Fetcher: starting at 2016-05-28 20:15:52
16/05/28 20:15:52 INFO fetcher.Fetcher: Fetcher: segment: /nutch_workspace/data/segments
16/05/28 20:15:52 INFO fetcher.Fetcher: Fetcher Timelimit set for : 1464502552307
16/05/28 20:15:56 INFO client.RMProxy: Connecting to ResourceManager at CentOS641/192.168.159.120:8032
16/05/28 20:15:57 INFO client.RMProxy: Connecting to ResourceManager at CentOS641/192.168.159.120:8032
16/05/28 20:15:59 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/root/.staging/job_1464324668489_0044
16/05/28 20:15:59 ERROR security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://CentOS641:8020/nutch_workspace/data/segments/crawl_generate
16/05/28 20:15:59 ERROR security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://CentOS641:8020/nutch_workspace/data/segments/crawl_generate
16/05/28 20:15:59 ERROR fetcher.Fetcher: Fetcher: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://CentOS641:8020/nutch_workspace/data/segments/crawl_generate
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:251)
        at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:45)
        at org.apache.nutch.fetcher.Fetcher$InputFormat.getSplits(Fetcher.java:106)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:518)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:510)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:392)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:833)
        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1340)
        at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1376)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1349)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

When I first saw this error I assumed the path really did not exist, so I ran
/usr/hadoop/hadoop/bin/hadoop fs -lsr /nutch_workspace/
which returned:
[root@CentOS641 ~]# /usr/hadoop/hadoop-2.2.0/bin/hadoop fs -lsr /nutch_workspace/
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxrwxrwx - root supergroup 0 2016-05-27 17:10 /nutch_workspace/data
drwxrwxrwx - root supergroup 0 2016-05-28 20:15 /nutch_workspace/data/crawldb
drwxr-xr-x - root supergroup 0 2016-05-28 20:13 /nutch_workspace/data/crawldb/current
drwxr-xr-x - root supergroup 0 2016-05-28 20:13 /nutch_workspace/data/crawldb/current/part-00000
-rw-r--r-- 1 root supergroup 145 2016-05-28 20:13 /nutch_workspace/data/crawldb/current/part-00000/data
-rw-r--r-- 1 root supergroup 214 2016-05-28 20:13 /nutch_workspace/data/crawldb/current/part-00000/index
drwxrwxrwx - root supergroup 0 2016-05-28 20:00 /nutch_workspace/data/crawldb/old
drwxrwxrwx - root supergroup 0 2016-05-28 20:00 /nutch_workspace/data/crawldb/old/part-00000
-rwxrwxrwx 1 root supergroup 145 2016-05-28 20:00 /nutch_workspace/data/crawldb/old/part-00000/data
-rwxrwxrwx 1 root supergroup 214 2016-05-28 20:00 /nutch_workspace/data/crawldb/old/part-00000/index
drwxrwxrwx - root supergroup 0 2016-05-28 20:15 /nutch_workspace/data/segments
drwxrwxrwx - root supergroup 0 2016-05-27 17:10 /nutch_workspace/data/segments/20160527170951
drwxrwxrwx - root supergroup 0 2016-05-27 17:10 /nutch_workspace/data/segments/20160527170951/crawl_generate
-rwxrwxrwx 1 root supergroup 166 2016-05-27 17:10 /nutch_workspace/data/segments/20160527170951/crawl_generate/part-00000
drwxrwxrwx - root supergroup 0 2016-05-28 08:54 /nutch_workspace/data/segments/20160528085433
drwxrwxrwx - root supergroup 0 2016-05-28 08:55 /nutch_workspace/data/segments/20160528085433/crawl_generate
-rwxrwxrwx 1 root supergroup 166 2016-05-28 08:55 /nutch_workspace/data/segments/20160528085433/crawl_generate/part-00000
drwxrwxrwx - root supergroup 0 2016-05-28 19:39 /nutch_workspace/data/segments/20160528193918
drwxrwxrwx - root supergroup 0 2016-05-28 19:40 /nutch_workspace/data/segments/20160528193918/crawl_generate
-rwxrwxrwx 1 root supergroup 166 2016-05-28 19:40 /nutch_workspace/data/segments/20160528193918/crawl_generate/part-00000
drwxr-xr-x - root supergroup 0 2016-05-28 20:02 /nutch_workspace/data/segments/20160528200204
drwxr-xr-x - root supergroup 0 2016-05-28 20:02 /nutch_workspace/data/segments/20160528200204/crawl_generate
-rw-r--r-- 1 root supergroup 166 2016-05-28 20:02 /nutch_workspace/data/segments/20160528200204/crawl_generate/part-00000
drwxr-xr-x - root supergroup 0 2016-05-28 20:15 /nutch_workspace/data/segments/20160528201454
drwxr-xr-x - root supergroup 0 2016-05-28 20:15 /nutch_workspace/data/segments/20160528201454/crawl_generate
-rw-r--r-- 1 root supergroup 166 2016-05-28 20:15 /nutch_workspace/data/segments/20160528201454/crawl_generate/part-00000
drwxrwxrwx - root supergroup 0 2016-05-27 17:06 /nutch_workspace/urls
-rwxrwxrwx 1 root supergroup 22 2016-05-27 17:06 /nutch_workspace/urls/urls
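
Set against the log, this listing shows every `crawl_generate` directory sitting inside a timestamped segment directory, while the Fetcher log line reads `segment: /nutch_workspace/data/segments` with no timestamp at all. Below is a minimal sketch of pulling the newest segment name out of such a listing. `latest_segment` is a hypothetical helper name used only for illustration; it is not part of Nutch or Hadoop.

```shell
# Hypothetical helper (not part of Nutch): read `hadoop fs -ls` style lines on
# stdin and print the newest 14-digit segment name found under .../segments/.
latest_segment() {
  # Keep only the timestamp component, de-duplicate, and take the largest one.
  sed -n 's#.*/segments/\([0-9]\{14\}\).*#\1#p' | sort -u | tail -n 1
}

# Example with two lines copied from the listing above.
printf '%s\n' \
  'drwxr-xr-x - root supergroup 0 2016-05-28 20:02 /nutch_workspace/data/segments/20160528200204' \
  'drwxr-xr-x - root supergroup 0 2016-05-28 20:15 /nutch_workspace/data/segments/20160528201454' \
  | latest_segment
# prints 20160528201454
```

If the function prints nothing, no timestamped segment was found in the input, which would be worth checking before the fetch step is started.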

Then I suspected hbase was the cause, so I stopped hbase, but the error persisted.

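Please help me with this. I have searched online and cannot find anything about it. I have tried many versions of the cluster and never got it to work, which is extremely frustrating. These version problems are maddening...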





2 comments

qcbb001 posted on 2016-5-29 14:02:04


This path really does not exist:
hdfs://CentOS641:8020/nutch_workspace/data/segments/crawl_generate

Jeelon posted on 2016-5-30 09:03:53
qcbb001 posted on 2016-5-29 14:02
This path really does not exist:
hdfs://CentOS641:8020/nutch_workspace/data/segments/crawl_generate

The thing is, I did not write this path by hand. The fetch step derives it internally. Could this be caused by a version mismatch?
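
The log line `Fetcher: segment: /nutch_workspace/data/segments` suggests the fetch step was handed the `segments` directory itself rather than one timestamped segment, which would explain why `crawl_generate` was looked up directly under it. One possible cause, offered purely as an assumption that nothing in this thread confirms, is that whatever works out the segment name came back with an empty string. The sketch below shows a guard that refuses such a name before a path is built. `segment_path` is a hypothetical function for illustration and is not part of Nutch.

```shell
# Hypothetical guard (not part of Nutch): build the path to hand to the fetch
# step, refusing an empty or non-timestamp segment name.
segment_path() {
  crawl_dir=$1
  segment=$2
  # A segment name is expected to be a 14-digit timestamp, e.g. 20160528201454.
  if printf '%s\n' "$segment" | grep -Eq '^[0-9]{14}$'; then
    printf '%s/segments/%s\n' "$crawl_dir" "$segment"
  else
    echo "segment name is empty or malformed: '$segment'" >&2
    return 1
  fi
}
```

For example, `segment_path /nutch_workspace/data 20160528201454` yields `/nutch_workspace/data/segments/20160528201454`, which could then be passed to `bin/nutch fetch`, whereas an empty second argument stops with an error instead of producing a path that ends in `segments/`.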