
Getting a job's execution progress in Hadoop 2

This post was last edited by pig2 on 2014-6-5 16:30.
How can I check a job's execution progress in Hadoop 2.2?
Is there a development API for this?

4 replies

hyj posted on 2014-6-5 11:46:34

You can refer to the material below; for the full picture, see the Hadoop source code and its execution flow.
JobInProgress
After JobClient submits a job, the JobTracker creates a JobInProgress to track and schedule it, and adds it to the job queue. Based on the input data set defined in the submitted job jar (already broken into FileSplits), the JobInProgress creates a matching batch of TaskInProgress objects to monitor and schedule the MapTasks, and also creates the configured number of TaskInProgress objects to monitor and schedule the ReduceTasks (one ReduceTask by default).
TaskInProgress
Each TaskInProgress represents one map or one reduce; it wraps the job's Map or Reduce operation. The TaskInProgress created on the master side only initializes its bookkeeping and does not invoke the execution method itself; it waits for an RPC call from the TaskTracker side.
The above only gives you a rough picture; for the details you need to read the source code. The interface certainly exists, but you have to know the source before you can build against it.
If you want to read the source code, you can refer to the post below:

Hadoop source code analysis roundup and documentation download
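
For the original question, though, you do not necessarily have to dig into JobInProgress itself: the client-side classes that talk to it already expose the progress numbers. Below is a minimal sketch using the older org.apache.hadoop.mapred client API, which matches the JobTracker-era classes described above; the class name and the job id string are placeholders for illustration only.

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;

public class OldApiProgress {
  public static void main(String[] args) throws Exception {
    // JobConf() picks up the cluster configuration from the classpath.
    JobClient client = new JobClient(new JobConf());
    // Hypothetical job id; in practice pass in the id of the job you submitted.
    RunningJob job = client.getJob(JobID.forName("job_201406050001_0001"));
    if (job == null) {
      System.err.println("job not found");
      return;
    }
    while (!job.isComplete()) {
      // mapProgress()/reduceProgress() each return a float between 0 and 1
      System.out.printf("map %.1f%%  reduce %.1f%%%n",
          job.mapProgress() * 100, job.reduceProgress() * 100);
      Thread.sleep(5000);
    }
    System.out.println("job " + (job.isSuccessful() ? "succeeded" : "failed"));
  }
}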


pig2 posted on 2014-6-5 12:52:00

Below is the shell script behind the various hadoop commands; it maps each command onto the Java class that actually gets run. The key part is:
elif [ "$COMMAND" = "jar" ] ; then
CLASS=org.apache.hadoop.util.RunJar
In other words, if you want to get a Hadoop job's progress, you need to look into org.apache.hadoop.util.RunJar; the relevant interfaces should be reachable from there. The full bin/hadoop script follows:




bin=`which $0`
bin=`dirname ${bin}`
bin=`cd "$bin"; pwd`

DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hadoop-config.sh

function print_usage(){
  echo "Usage: hadoop [--config confdir] COMMAND"
  echo "       where COMMAND is one of:"
  echo "  fs                    run a generic filesystem user client"
  echo "  version               print the version"
  echo "  jar <jar>             run a jar file"
  echo "  checknative [-a|-h]   check native hadoop and compression libraries availability"
  echo "  distcp <srcurl> <desturl> copy file or directories recursively"
  echo "  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive"
  echo "  classpath             prints the class path needed to get the"
  echo "                        Hadoop jar and the required libraries"
  echo "  daemonlog             get/set the log level for each daemon"
  echo " or"
  echo "  CLASSNAME             run the class named CLASSNAME"
  echo ""
  echo "Most commands print help when invoked w/o parameters."
}

if [ $# = 0 ]; then
  print_usage
  exit
fi

COMMAND=$1
case $COMMAND in
  # usage flags
  --help|-help|-h)
    print_usage
    exit
    ;;

  #hdfs commands
  namenode|secondarynamenode|datanode|dfs|dfsadmin|fsck|balancer|fetchdt|oiv|dfsgroups|portmap|nfs3)
    echo "DEPRECATED: Use of this script to execute hdfs command is deprecated." 1>&2
    echo "Instead use the hdfs command for it." 1>&2
    echo "" 1>&2
    #try to locate hdfs and if present, delegate to it.
    shift
    if [ -f "${HADOOP_HDFS_HOME}"/bin/hdfs ]; then
      exec "${HADOOP_HDFS_HOME}"/bin/hdfs ${COMMAND/dfsgroups/groups} "$@"
    elif [ -f "${HADOOP_PREFIX}"/bin/hdfs ]; then
      exec "${HADOOP_PREFIX}"/bin/hdfs ${COMMAND/dfsgroups/groups} "$@"
    else
      echo "HADOOP_HDFS_HOME not found!"
      exit 1
    fi
    ;;

  #mapred commands for backwards compatibility
  pipes|job|queue|mrgroups|mradmin|jobtracker|tasktracker)
    echo "DEPRECATED: Use of this script to execute mapred command is deprecated." 1>&2
    echo "Instead use the mapred command for it." 1>&2
    echo "" 1>&2
    #try to locate mapred and if present, delegate to it.
    shift
    if [ -f "${HADOOP_MAPRED_HOME}"/bin/mapred ]; then
      exec "${HADOOP_MAPRED_HOME}"/bin/mapred ${COMMAND/mrgroups/groups} "$@"
    elif [ -f "${HADOOP_PREFIX}"/bin/mapred ]; then
      exec "${HADOOP_PREFIX}"/bin/mapred ${COMMAND/mrgroups/groups} "$@"
    else
      echo "HADOOP_MAPRED_HOME not found!"
      exit 1
    fi
    ;;

  classpath)
    echo $CLASSPATH
    exit
    ;;

  #core commands
  *)
    # the core commands
    if [ "$COMMAND" = "fs" ] ; then
      CLASS=org.apache.hadoop.fs.FsShell
    elif [ "$COMMAND" = "version" ] ; then
      CLASS=org.apache.hadoop.util.VersionInfo
    elif [ "$COMMAND" = "jar" ] ; then
      CLASS=org.apache.hadoop.util.RunJar
    elif [ "$COMMAND" = "checknative" ] ; then
      CLASS=org.apache.hadoop.util.NativeLibraryChecker
    elif [ "$COMMAND" = "distcp" ] ; then
      CLASS=org.apache.hadoop.tools.DistCp
      CLASSPATH=${CLASSPATH}:${TOOL_PATH}
    elif [ "$COMMAND" = "daemonlog" ] ; then
      CLASS=org.apache.hadoop.log.LogLevel
    elif [ "$COMMAND" = "archive" ] ; then
      CLASS=org.apache.hadoop.tools.HadoopArchives
      CLASSPATH=${CLASSPATH}:${TOOL_PATH}
    elif [[ "$COMMAND" = -* ]] ; then
      # class and package names cannot begin with a -
      echo "Error: No command named \`$COMMAND' was found. Perhaps you meant \`hadoop ${COMMAND#-}'"
      exit 1
    else
      CLASS=$COMMAND
    fi
    shift

    # Always respect HADOOP_OPTS and HADOOP_CLIENT_OPTS
    HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"

    #make sure security appender is turned off
    HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,NullAppender}"

    export CLASSPATH=$CLASSPATH
    exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
    ;;

esac
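
As the quoted lines show, `hadoop jar` simply hands your jar to org.apache.hadoop.util.RunJar, which unpacks it and invokes its main class, so the progress reporting ultimately happens in the job's own driver code. A minimal driver sketch follows (the class name and the input/output arguments are placeholders, and the default identity mapper and reducer are used just to keep it self-contained): with waitForCompletion(true), the client itself polls the job and prints map/reduce percentages while it runs.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ProgressDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "progress-demo");
    job.setJarByClass(ProgressDriver.class);
    // A real driver would set its own mapper/reducer classes here; the
    // defaults (identity map and reduce) are enough to show progress output.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // verbose = true makes the client poll the job and print
    // "map X% reduce Y%" lines to stdout while it runs.
    boolean ok = job.waitForCompletion(true);
    System.exit(ok ? 0 : 1);
  }
}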




pig2 posted on 2014-6-24 00:21:12
You can also refer to another thread, "[Urgent, help needed] Percentage progress of a Hadoop job":


org.apache.hadoop.mapreduce.Job.mapProgress()
org.apache.hadoop.mapreduce.Job.reduceProgress()
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html
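
A minimal sketch of how those two methods are typically used; it assumes a Job that has already been fully configured, and the helper class and method names are illustration only. Unlike waitForCompletion(true), submit() returns immediately, so the caller can poll the progress itself:

import org.apache.hadoop.mapreduce.Job;

public class ProgressPoller {
  /** Submits an already-configured job and prints its progress until it finishes. */
  public static boolean runAndReport(Job job) throws Exception {
    job.submit();                      // non-blocking submit
    while (!job.isComplete()) {
      System.out.printf("map %.1f%%  reduce %.1f%%%n",
          job.mapProgress() * 100,     // both getters return a float in [0, 1]
          job.reduceProgress() * 100);
      Thread.sleep(5000);
    }
    return job.isSuccessful();
  }
}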




sstutu posted on 2014-6-28 07:31:27
A job is split into a map phase and a reduce phase, and Hadoop provides an API for the progress of each:


[Attachment: progress.png]
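
The APIs in question are the same two getters on org.apache.hadoop.mapreduce.Job mentioned above. If you want to watch a job that is already running from a separate process rather than from its driver, a hedged sketch along these lines should work on Hadoop 2 (the class name and argument handling are my own illustration, not taken from the screenshot):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Cluster;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobID;

public class ExternalMonitor {
  public static void main(String[] args) throws Exception {
    // Picks up mapred-site.xml / yarn-site.xml from the classpath.
    Cluster cluster = new Cluster(new Configuration());
    // Pass the job id on the command line, e.g. job_1401876543210_0007 (placeholder).
    Job job = cluster.getJob(JobID.forName(args[0]));
    if (job == null) {
      System.err.println("job not found: " + args[0]);
      return;
    }
    while (!job.isComplete()) {
      System.out.printf("map %.1f%%  reduce %.1f%%%n",
          job.mapProgress() * 100, job.reduceProgress() * 100);
      Thread.sleep(5000);
    }
  }
}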

