
Databases and Monitoring Data Storage for Cloudera Manager and Managed Services

Posted by howtodown on 2015-1-6 13:37:16
Last edited by howtodown on 2015-1-6 13:37
Guiding questions

1. How do you configure external databases?
2. Why does Cloudera Manager 5 use LevelDB?
3. Where does the Host Monitor store its data by default?







Background


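Our business needs required the big data platform to use Spark for machine learning, data mining, and real-time computing, so we decided to use Cloudera Manager 5.2.0 and CDH5.
I had previously set up Cloudera Manager 4.8.2 with CDH4. While setting up Cloudera Manager 5.2.0, I found that the Host Monitor and Service Monitor roles could not be configured to use an external database. At first I assumed I had made a configuration mistake. Later I realized that the new version of Cloudera Manager had changed how this data is stored. After reading a lot of documentation, I confirmed it: in the new version, the Host Monitor and Service Monitor do not need a database. They use built-in storage by default, and this cannot be changed.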

Overview
Cloudera Manager uses databases to store information about the Cloudera Manager configuration, as well as information such as the health of the system or task progress. For quick, simple installations, Cloudera Manager can install and configure an embedded PostgreSQL database as part of the Cloudera Manager installation process. In addition, some CDH services use databases and are automatically configured to use a default database. If you plan to use the embedded and default databases provided during the Cloudera Manager installation, see Installation Path A - Automated Installation by Cloudera Manager.

Although the embedded database is useful for getting started quickly, you can also use your own PostgreSQL, MySQL, or Oracle database for the Cloudera Manager Server and services that use databases.

Required databases

The Cloudera Manager Server, Activity Monitor, Reports Manager, Hive Metastore, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server all require databases. The type of data contained in the databases and their estimated sizes are as follows:
  • Cloudera Manager - Contains all the information about services you have configured and their role assignments, all configuration history, commands, users, and running processes. This relatively small database (<100 MB) is the most important to back up.
  • Activity Monitor - Contains information about past activities. In large clusters, this database can grow large. Configuring an Activity Monitor database is only necessary if a MapReduce service is deployed.
  • Reports Manager - Tracks disk utilization and processing activities over time. Medium-sized.
  • Hive Metastore - Contains Hive metadata. Relatively small.
  • Sentry Server - Contains authorization metadata. Relatively small.
  • Cloudera Navigator Audit Server - Contains auditing information. In large clusters, this database can grow large.
  • Cloudera Navigator Metadata Server - Contains authorization, policies, and audit report metadata. Relatively small.



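The Cloudera Manager Service Host Monitor and Service Monitor roles have an internal datastore. (Note: this sentence is the key point. In CM5, the Host Monitor and Service Monitor cannot be configured to use an external database and can only use the internal datastore. This is a change from CM4.)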


Cloudera Manager offers three installation paths. Path A is an automated installation; Paths B and C are manual installations using RPM packages or tarballs:
  • Path A automatically installs an embedded PostgreSQL database to meet the requirements of the services. This path reduces the number of installation tasks to complete and choices to make. In Path A you can optionally choose to create external databases for Activity Monitor, Reports Manager, Hive Metastore, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server.
  • Path B and Path C require you to create databases for the Cloudera Manager Server, Activity Monitor, Reports Manager, Hive Metastore, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server.
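The two bullets above amount to a small lookup: which databases you have to create yourself before installing. Below is a minimal sketch that transcribes them. The constant names and the helper databases_to_create are hypothetical and are not part of any Cloudera tool. The Host Monitor and Service Monitor never appear in the result, because they keep their own internal datastore.

```python
# Databases you must create yourself, per installation path, transcribed from
# the two bullets above. All names here are illustrative, not a Cloudera API.
OPTIONAL_EXTERNAL_DATABASES = [
    "Activity Monitor",
    "Reports Manager",
    "Hive Metastore",
    "Sentry Server",
    "Cloudera Navigator Audit Server",
    "Cloudera Navigator Metadata Server",
]

# Roles with an internal datastore: never listed, whatever the path.
INTERNAL_DATASTORE_ROLES = ["Host Monitor", "Service Monitor"]


def databases_to_create(path: str) -> list[str]:
    """Return the databases you must create before installing via `path`."""
    if path == "A":
        # The embedded PostgreSQL covers everything; external ones are optional.
        return []
    if path in ("B", "C"):
        # Manual paths also need a database for the Cloudera Manager Server.
        return ["Cloudera Manager Server"] + OPTIONAL_EXTERNAL_DATABASES
    raise ValueError(f"unknown installation path: {path!r}")


for p in ("A", "B"):
    print(p, databases_to_create(p))
```

With Path A the list is empty. Paths B and C both return the same seven databases.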



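Using external databases takes more input and setup work, but these paths give you more compatibility and flexibility in choosing database types and configurations.
You can mix different database types in one deployment, but doing so introduces many uncertainties, so Cloudera recommends always using the same database type.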

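In many cases you should install each service and its database on the same machine, which reduces network I/O and improves overall efficiency.
You can also install services and databases on separate machines. Large deployments, or your database administrators, may require this; for example, an Oracle DBA may need to manage the databases independently.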

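For detailed database setup steps, see the official documentation:
  • Setting up the Cloudera Manager Server database
  • Setting up external databases for Activity Monitor, Reports Manager, Hive Metastore, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server
  • Setting up external databases for Hue and Oozie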








Overview
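The sections above covered how Cloudera Manager stores its monitoring and central data, and how to configure external databases. This section looks further at how monitoring data is stored, configured, and tuned.
In Cloudera Manager, the Service Monitor and Host Monitor roles store time series, health data, Impala query metadata, and YARN application metadata. Inspecting Cloudera Manager's storage connection code showed that the Service Monitor and Host Monitor use LevelDB classes, from which I infer that the local storage is LevelDB. LevelDB is a widely used embedded key-value store with very high read and write throughput. Cloudera Manager's monitoring workload involves a large volume of frequent reads and writes, which may be why Cloudera Manager 5 replaced the relational database with LevelDB.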


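Migrating monitoring data during a Cloudera Manager upgrade
Cloudera Manager 5 stores Host Monitor and Service Monitor data in a local datastore. If you upgrade from Cloudera Manager 4 to 5 using the automated upgrade, the data is migrated automatically from Cloudera Manager 4's embedded or external database into Cloudera Manager 5's local datastore. The migration runs on its own, and you can follow its progress in the logs.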



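Configuring Service Monitor data storage
The Service Monitor stores time-series and health data, Impala query metadata, and YARN application metadata. By default the data is stored under /var/lib/cloudera-service-monitor/. To change the location, set Service Monitor Storage Directory (firehose.storage.base.directory).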
You can control how much disk space to reserve for the different classes of data the Service Monitor stores by changing the following configuration options:
  • Time-series metrics and health data - Time-Series Storage (firehose_time_series_storage_bytes - 10 GB default)
  • Impala query metadata - Impala Storage (firehose_impala_storage_bytes - 1 GB default)
  • YARN application metadata - YARN Storage (firehose_yarn_storage_bytes - 1 GB default)
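The three limits add up to the Service Monitor's total disk budget. Below is a minimal sketch that computes that total and checks whether a filesystem has room for it. It makes two assumptions:

- "GB" in the defaults above means binary gibibytes.
- The 1.2 headroom factor is an arbitrary example margin, not a Cloudera recommendation.

The helper names are hypothetical.

```python
import shutil

GIB = 1024 ** 3  # assumption: the "GB" defaults above mean binary gibibytes

# Service Monitor defaults from the list above, keyed by property name.
SERVICE_MONITOR_DEFAULTS = {
    "firehose_time_series_storage_bytes": 10 * GIB,
    "firehose_impala_storage_bytes": 1 * GIB,
    "firehose_yarn_storage_bytes": 1 * GIB,
}


def reserved_bytes(settings: dict[str, int]) -> int:
    """Total space the role may use across all of its data classes."""
    return sum(settings.values())


def has_room(directory: str, settings: dict[str, int], headroom: float = 1.2) -> bool:
    """True if the filesystem holding `directory` has enough free space for the
    reservation times `headroom` (1.2 is an arbitrary example margin)."""
    free = shutil.disk_usage(directory).free
    return free >= reserved_bytes(settings) * headroom


print(reserved_bytes(SERVICE_MONITOR_DEFAULTS) / GIB)  # 12.0
```

On the Service Monitor host, calling has_room("/var/lib/cloudera-service-monitor", SERVICE_MONITOR_DEFAULTS) would show whether the default storage directory can hold the full 12 GB budget. The directory must already exist, because shutil.disk_usage raises an error for missing paths.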




Configuring Host Monitor data storage
The Host Monitor stores time-series and health data. By default the data is stored under /var/lib/cloudera-host-monitor/. To change the location, set Host Monitor Storage Directory.
You can control how much disk space to reserve for Host Monitor data by changing the following configuration option:
  • Time-series metrics and health data - Time-Series Storage (firehose_time_series_storage_bytes - 10 GB default)
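The Host Monitor uses the same property name as the Service Monitor, firehose_time_series_storage_bytes, but each role has its own budget. If both roles run on one host and their storage directories share a filesystem, the reservations add up: 12 GB + 10 GB with the defaults.

Below is a minimal sketch that groups reservations by filesystem. It identifies a filesystem by the st_dev device ID returned by os.stat. The function name is hypothetical, and "GB" is again assumed to mean GiB.

```python
import os
from collections import defaultdict
from typing import Callable

GIB = 1024 ** 3  # assumption: the "GB" defaults mean binary gibibytes


def reservations_by_filesystem(
    reservations: dict[str, int],
    device_of: Callable[[str], int] = lambda path: os.stat(path).st_dev,
) -> dict[int, int]:
    """Sum reserved bytes per filesystem, identified by the st_dev device ID."""
    totals: dict[int, int] = defaultdict(int)
    for directory, nbytes in reservations.items():
        totals[device_of(directory)] += nbytes
    return dict(totals)


reservations = {
    "/var/lib/cloudera-service-monitor": 12 * GIB,  # 10 + 1 + 1 GB defaults
    "/var/lib/cloudera-host-monitor": 10 * GIB,     # time-series default
}

# Illustration only: pretend both directories live on device 42 instead of
# calling os.stat on paths that may not exist on this machine.
same_disk = reservations_by_filesystem(reservations, device_of=lambda path: 42)
print({dev: n // GIB for dev, n in same_disk.items()})  # {42: 22}
```

On a real host, leave device_of at its default so that os.stat reports which directories actually share a disk.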




Data granularity and time-series metric data
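The Service Monitor and Host Monitor store time-series metric data in several ways. Data is continually summarized into coarser granularities. For example, every hour the average, minimum, and maximum are rolled up into hour-granularity data, and every six hours into six-hour data; the same happens for each day, each week, and so on. Only metric data is summarized this way. For Impala query and YARN application monitoring data, old data is deleted as the storage limit is approached.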
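Below is a minimal sketch of this roll-up: raw samples are merged into hourly buckets, then hourly buckets into six-hour buckets.

The text above only names the average, minimum, and maximum. This sketch stores a sum and a count instead of the average, which keeps the mean of a coarser bucket exact even when the finer buckets hold different numbers of samples. That choice, the bucket alignment to multiples of the width, and all names are assumptions. The real monitors may keep more statistics and use different boundaries.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """One bucket: min, max, plus sum and count so the mean stays exact."""
    start: int  # bucket start, in seconds
    count: int
    total: float
    minimum: float
    maximum: float

    @property
    def mean(self) -> float:
        return self.total / self.count


def point(ts: int, value: float) -> Summary:
    """A raw sample is just a one-element summary."""
    return Summary(ts, 1, value, value, value)


def roll_up(summaries: list[Summary], width: int) -> list[Summary]:
    """Merge finer summaries into buckets `width` seconds wide."""
    buckets: dict[int, Summary] = {}
    for s in summaries:
        start = s.start - s.start % width
        b = buckets.get(start)
        if b is None:
            buckets[start] = Summary(start, s.count, s.total, s.minimum, s.maximum)
        else:
            b.count += s.count
            b.total += s.total
            b.minimum = min(b.minimum, s.minimum)
            b.maximum = max(b.maximum, s.maximum)
    return [buckets[k] for k in sorted(buckets)]


HOUR = 3600
# Twelve hours of samples every 30 minutes; each value is its hour index.
raw = [point(t, float(t // HOUR)) for t in range(0, 12 * HOUR, 1800)]
hourly = roll_up(raw, HOUR)             # 12 buckets
six_hourly = roll_up(hourly, 6 * HOUR)  # 2 buckets
print([(s.mean, s.minimum, s.maximum) for s in six_hourly])
# [(2.5, 0.0, 5.0), (8.5, 6.0, 11.0)]
```

Because each level is built from the one below it, daily and weekly summaries would be produced the same way, for example with roll_up(six_hourly, 24 * HOUR).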
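When storage approaches its limit, the finest-grained data is deleted first to free space. For example, hour-granularity data is deleted before day-granularity data.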
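Below is a minimal sketch of the eviction order described above: when the store is over its limit, drop the oldest points of the finest level first.

It rests on simplifying assumptions:

- Every stored point costs one unit.
- All levels share a single budget. The real roles may budget each granularity level separately.

The function name is hypothetical.

```python
def evict(levels: dict[str, list[int]], limit: int) -> int:
    """Delete stored points until at most `limit` remain; return how many went.

    `levels` maps a granularity name to its bucket start times, oldest first,
    and must be ordered from finest to coarsest granularity (dicts keep
    insertion order). The oldest points of the finest level go first."""
    total = sum(len(points) for points in levels.values())
    removed = 0
    for points in levels.values():  # finest level first
        while total > limit and points:
            points.pop(0)  # drop the oldest point at this level
            total -= 1
            removed += 1
        if total <= limit:
            break
    return removed


levels = {"hourly": [0, 1, 2, 3, 4, 5], "daily": [0], "weekly": [0]}
removed = evict(levels, limit=5)
print(removed, levels)
# 3 {'hourly': [3, 4, 5], 'daily': [0], 'weekly': [0]}
```

Coarser levels are only touched once the finer ones are empty. The older time ranges therefore stay visible, at lower resolution, for as long as possible.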











Reposted from: http://blog.csdn.net/shifenglov/article/details/41281399
