Nutch 1.4 cannot continue the crawl task

RnD_Alex posted on 2013-10-25 10:42:52
[root@m141 deploy]# bin/nutch crawl hdfs://192.168.19.141:9000/user/root/urls -dir crawl -depth 200 -threads 20 -topN 100
Warning: $HADOOP_HOME is deprecated.
12/04/11 19:29:32 WARN crawl.Crawl: solrUrl is not set, indexing will be skipped...
12/04/11 19:29:32 INFO crawl.Crawl: crawl started in: crawl
12/04/11 19:29:32 INFO crawl.Crawl: rootUrlDir = hdfs://192.168.19.141:9000/user/root/urls
12/04/11 19:29:32 INFO crawl.Crawl: threads = 20
12/04/11 19:29:32 INFO crawl.Crawl: depth = 200
12/04/11 19:29:32 INFO crawl.Crawl: solrUrl=null
12/04/11 19:29:32 INFO crawl.Crawl: topN = 100
12/04/11 19:29:32 INFO crawl.Injector: Injector: starting at 2012-04-11 19:29:32
12/04/11 19:29:32 INFO crawl.Injector: Injector: crawlDb: crawl/crawldb
12/04/11 19:29:32 INFO crawl.Injector: Injector: urlDir: hdfs://192.168.19.141:9000/user/root/urls
12/04/11 19:29:32 INFO crawl.Injector: Injector: Converting injected urls to crawl db entries.
Execution reaches the step where the injected URL list is converted into the database of pages to fetch, but the crawl goes no further and the crawl output directory is never created.
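
To narrow down where the Injector stalls, one approach is to verify the seed directory on HDFS and then run the inject phase on its own, so the failure is separated from the later generate/fetch steps. A rough sketch, run from the same deploy directory (the file name seed.txt is only a placeholder for whatever is actually inside the urls directory):

# Confirm the seed directory and its contents really exist on HDFS
bin/hadoop fs -ls hdfs://192.168.19.141:9000/user/root/urls
bin/hadoop fs -cat hdfs://192.168.19.141:9000/user/root/urls/seed.txt

# Run only the inject phase into the crawldb, separate from the full crawl
bin/nutch inject crawl/crawldb hdfs://192.168.19.141:9000/user/root/urls

# Check whether the Injector MapReduce job was actually submitted and is progressing
bin/hadoop job -list

If bin/hadoop job -list shows no job at all, the submission itself is probably failing (for example a wrong JobTracker address in mapred-site.xml); if the job is listed but never advances, the TaskTracker logs for the Injector map tasks are the next place to look.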
