
Chukwa: A Hadoop-Based Log Collection Framework

Last edited by sunshine_junge on 2014-12-28 19:17

Questions this post answers:

1. How do you install and configure Chukwa?
2. How does Chukwa collect and process logs?






Overview

Chukwa is a Hadoop-based solution for collecting incremental logs from every node of a cluster. It has four main components:

  • Agents run on each client node and are responsible for sending data.
  • Collectors receive the data sent by agents and write it to stable storage.
  • MapReduce jobs parse and archive the data.
  • HICC, the data center component: a web interface for displaying the data.

Its system architecture is shown in the figure below.
[Figure 1-1: Chukwa system architecture]

Installation

This walkthrough uses a single-node deployment as the example and assumes Hadoop (1.0.4) is already installed.

Deploy

tar -zxvf chukwa-incubating-0.5.0.tar.gz -C /usr/local/cloud/src/
cd /usr/local/cloud/
ln -s -f /usr/local/cloud/src/chukwa-incubating-0.5.0 chukwa

Create the log and PID directories

mkdir /data/logs/chukwa
mkdir /data/pids/chukwa




Configuration

System environment

export CHUKWA_HOME=/usr/local/cloud/chukwa
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$CHUKWA_HOME/bin:$PATH

Agent configuration

Agent addresses

localhost

Agent parameters

<!-- Polling interval (ms) for detecting changes in file content -->
<property>
    <name>chukwaAgent.adaptor.context.switch.time</name>
    <value>5000</value>
</property>
<!-- Maximum number of bytes of incremental file content read at a time -->
<property>
    <name>chukwaAgent.fileTailingAdaptor.maxReadSize</name>
    <value>2097152</value>
</property>
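If you want to confirm which values the agent will actually pick up, properties can be pulled out of this style of XML with plain sed. A minimal sketch; the inline XML below stands in for the real config file, whose path (`$CHUKWA_HOME/etc/chukwa/chukwa-agent-conf.xml`) is an assumption about your layout:

```shell
# Sketch: extract a property's value from Chukwa-style XML config with sed.
# The heredoc-style variable mirrors the two properties above; on a real
# install you would read the agent conf file instead (path assumed).
get_prop() {
    # $1 = property name; XML arrives on stdin; prints the <value> that
    # follows the matching <name> line
    sed -n "/<name>$1<\/name>/{n;s/.*<value>\(.*\)<\/value>.*/\1/p;}"
}

conf='<property>
    <name>chukwaAgent.adaptor.context.switch.time</name>
    <value>5000</value>
</property>
<property>
    <name>chukwaAgent.fileTailingAdaptor.maxReadSize</name>
    <value>2097152</value>
</property>'

echo "$conf" | get_prop chukwaAgent.adaptor.context.switch.time     # 5000
echo "$conf" | get_prop chukwaAgent.fileTailingAdaptor.maxReadSize  # 2097152
```

This only relies on each <value> sitting on the line directly after its <name>, which matches how these files are normally laid out.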

Collector configuration

Collector addresses

# same as the agent list for a single-node deployment
localhost


Collector parameters

<!-- Chukwa 0.5 added an HBase writer implementation; if you do not need it, restore the default -->
<!-- Sequence File Writer parameters -->
<property>
    <name>chukwaCollector.pipeline</name>
    <value>org.apache.hadoop.chukwa.datacollection.writer.SocketTeeWriter,org.apache.hadoop.chukwa.datacollection.writer.SeqFileWriter</value>
</property>
<!-- HDFS filesystem (namenode) address -->
<property>
    <name>writer.hdfs.filesystem</name>
    <value>hdfs://hadooptest:9000</value>
</property>


Global configuration

export JAVA_HOME=/usr/java/default
export CLASSPATH=.:$JAVA_HOME/lib
export HADOOP_HOME=/usr/local/cloud/hadoop
export CHUKWA_HOME=/usr/local/cloud/chukwa
export CHUKWA_CONF_DIR=${CHUKWA_HOME}/etc/chukwa
export CHUKWA_PID_DIR=/data/pids/chukwa
export CHUKWA_LOG_DIR=/data/logs/chukwa


Monitored files

# Log files to monitor can be listed in $CHUKWA_HOME/etc/chukwa/initial_adaptors, though adaptors are usually added by telnetting to the agent instead
# Format: add [name =] <adaptor_class_name> <datatype> <adaptor specific params> <initial offset>
# In order: adaptor implementation class, data type, start offset, log file, bytes already collected
add filetailer.CharFileTailingAdaptorUTF8 typeone 0 /data/logs/web/typeone.log 0
add filetailer.CharFileTailingAdaptorUTF8 typetwo 0 /data/logs/web/typetwo.log 0


Starting the services

Start the collectors

cd $CHUKWA_HOME/
sbin/start-collectors.sh

Start the agents

sbin/start-agents.sh

Start the data processors

sbin/start-data-processors.sh


Typical output after starting the services:

[hadoop@hadooptest chukua]$ sbin/start-collectors.sh
localhost: starting collector, logging to /data/logs/chukwa/chukwa-hadoop-collector-hadooptest.out
localhost: WARN: option chukwa.data.dir may not exist; val = /chukwa
localhost: Guesses:
localhost:  chukwaRootDir null
localhost:  fs.default.name URI
localhost:  nullWriter.dataRate Time
localhost: WARN: option chukwa.tmp.data.dir may not exist; val = /chukwa/temp
localhost: Guesses:
localhost:  chukwaRootDir null
localhost:  nullWriter.dataRate Time
localhost:  chukwaCollector.tee.port Integral
[hadoop@hadooptest chukua]$ sbin/start-agents.sh
localhost: starting agent, logging to /data/logs/chukwa/chukwa-hadoop-agent-hadooptest.out
localhost: OK chukwaAgent.adaptor.context.switch.time [Time] = 5000
localhost: OK chukwaAgent.checkpoint.dir [File] = /data/logs/chukwa/
localhost: OK chukwaAgent.checkpoint.interval [Time] = 5000
localhost: WARN: option chukwaAgent.collector.retries may not exist; val = 144000
localhost: Guesses:
localhost:  chukwaAgent.connector.retryRate Time
localhost:  chukwaAgent.sender.retries Integral
localhost:  chukwaAgent.control.remote Boolean
localhost: WARN: option chukwaAgent.collector.retryInterval may not exist; val = 20000
localhost: Guesses:
[hadoop@hadooptest chukua]$ sbin/start-data-processors.sh
starting archive, logging to /data/logs/chukwa/chukwa-hadoop-archive-hadooptest.out
starting demux, logging to /data/logs/chukwa/chukwa-hadoop-demux-hadooptest.out
starting dp, logging to /data/logs/chukwa/chukwa-hadoop-dp-hadooptest.out
[hadoop@hadooptest chukua]$


Collection test

Create the test data

# Write the following test log lines to /data/logs/web/webone
- 10.0.0.10 [17/Oct/2011:23:20:40 +0800] GET /img/chukwa0.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 10.0.0.11 [17/Oct/2011:23:20:41 +0800] GET /img/chukwa1.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 10.0.0.12 [17/Oct/2011:23:20:42 +0800] GET /img/chukwa2.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 10.0.0.13 [17/Oct/2011:23:20:43 +0800] GET /img/chukwa3.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 10.0.0.14 [17/Oct/2011:23:20:44 +0800] GET /img/chukwa4.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 10.0.0.15 [17/Oct/2011:23:20:45 +0800] GET /img/chukwa5.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 10.0.0.16 [17/Oct/2011:23:20:46 +0800] GET /img/chukwa6.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 10.0.0.17 [17/Oct/2011:23:20:47 +0800] GET /img/chukwa7.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 10.0.0.18 [17/Oct/2011:23:20:48 +0800] GET /img/chukwa8.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 10.0.0.19 [17/Oct/2011:23:20:49 +0800] GET /img/chukwa9.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
# Write the following test log lines to /data/logs/web/webtwo
- 192.168.0.10 [17/Oct/2011:23:20:40 +0800] GET /img/chukwa0.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 192.168.0.11 [17/Oct/2011:23:21:40 +0800] GET /img/chukwa1.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 192.168.0.12 [17/Oct/2011:23:22:40 +0800] GET /img/chukwa2.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 192.168.0.13 [17/Oct/2011:23:23:40 +0800] GET /img/chukwa3.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 192.168.0.14 [17/Oct/2011:23:24:40 +0800] GET /img/chukwa4.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 192.168.0.15 [17/Oct/2011:23:25:40 +0800] GET /img/chukwa5.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 192.168.0.16 [17/Oct/2011:23:26:40 +0800] GET /img/chukwa6.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 192.168.0.17 [17/Oct/2011:23:27:40 +0800] GET /img/chukwa7.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 192.168.0.18 [17/Oct/2011:23:28:40 +0800] GET /img/chukwa8.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"
- 192.168.0.19 [17/Oct/2011:23:29:40 +0800] GET /img/chukwa9.jpg HTTP/1.0 "404" "16" "Mozilla/5.0 (MSIE 9.0; Windows NT 6.1;)"

Simulate web log traffic

# Save the following as /data/logs/web/weblogadd.sh
#!/bin/bash
cat /data/logs/web/webone >> /data/logs/web/typeone.log
cat /data/logs/web/webtwo >> /data/logs/web/typetwo.log
# Make the script executable
chmod +x weblogadd.sh
# Add a crontab entry to /etc/crontab to simulate ongoing web log generation
*/1 * * * * hadoop /data/logs/web/weblogadd.sh
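Before installing the cron entry, the append step can be rehearsed in a throwaway directory. This is only a local sanity check and touches none of the real paths above:

```shell
# Dry run of weblogadd.sh's append logic in a temp directory, so the cron
# job can be verified without touching the real log files.
tmp=$(mktemp -d)
printf 'line1\nline2\n' > "$tmp/webone"
printf 'lineA\n'        > "$tmp/webtwo"

# Same cat >> step the script performs, run twice to simulate two cron ticks:
cat "$tmp/webone" >> "$tmp/typeone.log"
cat "$tmp/webtwo" >> "$tmp/typetwo.log"
cat "$tmp/webone" >> "$tmp/typeone.log"

wc -l < "$tmp/typeone.log"   # 4 lines: the 2-line source appended twice
```

Because the script appends rather than overwrites, each cron tick makes the tailed files grow, which is exactly the incremental change the file-tailing adaptors pick up.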


Register the log monitors

# Connect to the agent's telnet control port
telnet hadooptest 9093
add org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8 typeone 0 /data/logs/web/typeone.log 0
add org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8 typetwo 0 /data/logs/web/typetwo.log 0
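The interactive telnet session can also be scripted. The sketch below builds the same two add commands and prints them for review; sending them to the agent with nc is shown commented out, since nc availability and a running agent on port 9093 are assumptions here:

```shell
# Sketch: build the adaptor-registration commands non-interactively instead
# of typing them into telnet. Printing first lets you review before sending.
build_adaptor_cmds() {
    for t in typeone typetwo; do
        echo "add org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8 $t 0 /data/logs/web/$t.log 0"
    done
}

build_adaptor_cmds
# On a live agent, pipe the same stream to the control port:
# build_adaptor_cmds | nc hadooptest 9093
```

Scripting the registration this way makes it repeatable after an agent restart, instead of retyping the long class name by hand.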


Processing flow


Directory structure (under the Chukwa data dir in HDFS)

/chukwa/
    archivesProcessing/
    dataSinkArchives/
    demuxProcessing/
    finalArchives/
    logs/
    postProcess/
    repos/
    rolling/
    temp/


Flow chart
[Figure 1-2: Chukwa data processing flow]











