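Scheduled crawling with Nutch 1.4: combined with a Linux scheduled task (cron), Nutch can crawl automatically on a schedule. For setting up Linux scheduled tasks, see
Linux crontab: configuring a scheduled MapReduce job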
http://www.aboutyun.com/thread-6289-1-1.html
(Source: about云开发)
The steps are as follows:
1. First, check the current user's crontab by running:
crontab -l
Output:
no crontab for ***
This means no crontab has been defined for the current user.
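The post does not show how the crontab entry itself is created. One way to do it is sketched below, under assumptions: the script path /path/to/nutch-cron.sh, the log file /tmp/nutch-cron.log and the daily 02:00 schedule are hypothetical placeholders. The script prints the current entries plus the new one and only replaces the real crontab when given --install; `crontab -e` is the interactive alternative.

```shell
#!/bin/sh
# Sketch: add one Nutch crawl entry to the current user's crontab without
# discarding existing entries (a non-interactive alternative to `crontab -e`).
# The schedule, script path and log path are hypothetical placeholders.
CRON_LINE='0 2 * * * /path/to/nutch-cron.sh >> /tmp/nutch-cron.log 2>&1'

# Print the new crontab: the current entries minus any identical copy of
# CRON_LINE, followed by CRON_LINE itself, so re-running does not duplicate it.
build_crontab() {
  # With no crontab, `crontab -l` prints "no crontab for <user>" to stderr
  # and exits non-zero; discard that and start from an empty list.
  crontab -l 2>/dev/null | grep -vxF "$CRON_LINE"
  echo "$CRON_LINE"
}

# Only replace the real crontab when explicitly asked to.
if [ "${1:-}" = "--install" ]; then
  build_crontab | crontab -
  crontab -l
fi
```

For example, `sh add-nutch-cron.sh --install` (file name hypothetical) installs the entry and lists the resulting crontab. The five fields `0 2 * * *` are minute, hour, day of month, month and day of week, i.e. every day at 02:00.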
Check the result. Restarting cron may not succeed; if it does not, stop it and start it again as follows:
15:40:34^O^bin$ sudo /etc/init.d/cron stop
[sudo] password for sniffer:
Rather than invoking init scripts through /etc/init.d, use the service(8)
utility, e.g. service cron stop
Since the script you are attempting to invoke has been converted to an
Upstart job, you may also use the stop(8) utility, e.g. stop cron
cron stop/waiting
15:40:49^O^bin$ ps -A | grep cron
15:40:54^O^bin$ sudo /etc/int.d/cron start
sudo: /etc/int.d/cron: command not found
15:41:11^O^bin$ sudo /etc/init.d/cron start
Rather than invoking init scripts through /etc/init.d, use the service(8)
utility, e.g. service cron start
Since the script you are attempting to invoke has been converted to an
Upstart job, you may also use the start(8) utility, e.g. start cron
cron start/running, process 14362
15:41:19^O^bin$ ps -A | grep cron
14362 ? 00:00:00 cron
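The warnings in the transcript point to service(8) (or Upstart's stop/start) instead of calling /etc/init.d directly. A minimal sketch of the same stop/start sequence through service, assuming a Debian/Ubuntu-style system where the service is named cron; it only touches the service when called with --restart.

```shell
#!/bin/sh
# Sketch: restart cron via service(8), as the warnings in the transcript
# recommend, then check that a cron process is running.
# Assumes the service is named "cron" (Debian/Ubuntu naming).

# is_running NAME: succeed if some process's command name is exactly NAME.
# An exact match avoids counting e.g. "anacron", which `ps -A | grep cron`
# would also match.
is_running() {
  ps -A -o comm= | grep -qxF "$1"
}

restart_cron() {
  # Mirror the transcript: stop (ignore failure if already stopped), then start.
  sudo service cron stop || true
  sudo service cron start || return 1
  is_running cron
}

# Only touch the system service when explicitly asked to.
if [ "${1:-}" = "--restart" ]; then
  if restart_cron; then
    echo "cron is running"
  else
    echo "cron is not running" >&2
    exit 1
  fi
fi
```

The exact-name check matters because `ps -A | grep cron` also matches processes such as anacron, so a non-empty result does not by itself prove that cron is running.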
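Note: the nutch script may fail because it cannot find JAVA_HOME (this commonly happens under cron, which runs jobs with a minimal environment). It can be fixed by changing the following part of the script: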
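if [ "$JAVA_HOME" = "" ]; then
  # Original behaviour: print an error and exit.
  #echo "Error: JAVA_HOME is not set."
  #exit 1
  # Fall back to a fixed JDK path instead (replace *** with your JDK directory).
  JAVA_HOME="***"
fi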
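Instead of editing the nutch script, JAVA_HOME can also be exported by a small wrapper that cron calls. A sketch under assumptions: the file name nutch-cron.sh, the JDK and Nutch paths, the seed directory urls and the crawl parameters are hypothetical. The `crawl` command with -dir/-depth/-topN is the Nutch 1.x one-step crawler, but check the usage printed by `bin/nutch crawl` in your own installation.

```shell
#!/bin/sh
# Hypothetical cron wrapper for Nutch 1.4 (save as nutch-cron.sh).
# cron starts jobs with a minimal environment, so a JAVA_HOME exported in
# ~/.bashrc or ~/.profile is usually not visible to the nutch script.
# Exporting it here is an alternative to editing the nutch script.
# Both paths are placeholders; adjust them for your installation.
JAVA_HOME="${JAVA_HOME:-/path/to/jdk}"
NUTCH_HOME="${NUTCH_HOME:-/path/to/nutch}"
export JAVA_HOME

# Fail early with a clear message if JAVA_HOME does not contain a java binary.
check_java_home() {
  if [ -z "$JAVA_HOME" ] || [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "JAVA_HOME='$JAVA_HOME' has no executable bin/java" >&2
    return 1
  fi
}

main() {
  check_java_home || exit 1
  cd "$NUTCH_HOME" || exit 1
  # One-step crawl; seed dir "urls", output dir "crawl", depth and topN are
  # example values only.
  bin/nutch crawl urls -dir crawl -depth 3 -topN 50
}

# Run the crawl only when invoked under the name nutch-cron.sh (e.g. by cron).
case "$0" in
  *nutch-cron.sh) main ;;
esac
```

Because the wrapper logs a clear error when bin/java is missing, a misconfigured path shows up in the cron log instead of as a silent failure.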