The role of ZooKeeper in YARN HA

Posted on 2018-1-11 14:39:40
In YARN HA, what data does the ResourceManager write to ZooKeeper? My RM and ZK are reporting errors and I cannot find the cause. Some of the relevant logs are below:
ResourceManager log:
2018-01-07 16:05:11,244 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Retrying operation on ZK. Retry no. 1
2018-01-07 16:05:11,245 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session disconnected. Entering neutral mode...
2018-01-07 16:05:11,869 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server 10.200.185.131/10.200.185.131:2181. Will not attempt to authenticate using SASL (unknown error)
2018-01-07 16:05:11,870 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to 10.200.185.131/10.200.185.131:2181, initiating session
2018-01-07 16:05:11,872 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server 10.200.185.131/10.200.185.131:2181, sessionid = 0x45a8e21311c01ed, negotiated timeout = 10000
2018-01-07 16:05:11,881 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x45a8e21311c01ed, likely server has closed socket, closing socket connection and attempting reconnect
2018-01-07 16:05:11,981 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation.

2018-01-07 16:46:52,350 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Retrying operation on ZK. Retry no. 999
2018-01-07 16:46:53,069 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server 10.200.185.134/10.200.185.134:2181. Will not attempt to authenticate using SASL (unknown error)
2018-01-07 16:46:53,070 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to 10.200.185.134/10.200.185.134:2181, initiating session
2018-01-07 16:46:53,071 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server 10.200.185.134/10.200.185.134:2181, sessionid = 0x45a8e21311c01ed, negotiated timeout = 10000
2018-01-07 16:46:53,086 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x45a8e21311c01ed, likely server has closed socket, closing socket connection and attempting reconnect


ZooKeeper log:
2018-01-07 16:05:11,145 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x25dd0e6d1f300ec, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:744)
2018-01-07 16:05:12,681 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x45a8e21311c01ed due to java.io.IOException: Len error 1530558
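
One detail in the last ZooKeeper line above may be worth following up. The server raises `Len error N` when the declared length of an incoming request is negative or larger than the `jute.maxbuffer` limit, which defaults to 0xfffff (1048575) bytes according to the ZooKeeper administrator's guide. The value logged here, 1530558, is above that default. Since ZKRMStateStore persists ResourceManager state in ZooKeeper, an oversized write from the RM is one plausible source, though the thread does not confirm it. A minimal sketch for pulling these values out of a server log; the function name `len_errors` is hypothetical, and it assumes the default limit is in effect unless another value is passed:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list "Len error" sizes found in a ZooKeeper server log
# (read from stdin) and compare each one to a limit.
# Assumption: jute.maxbuffer is at its default of 0xfffff (1048575) bytes
# unless a different limit is given as the first argument.
len_errors() {
  local limit="${1:-1048575}"
  grep -o 'Len error [0-9][0-9]*' | awk -v limit="$limit" '{
    verdict = ($3 > limit) ? "exceeds" : "within"
    printf "%s bytes %s limit %s\n", $3, verdict, limit
  }'
}

# Usage: len_errors < zookeeper.out
```

If the reported sizes are consistently above the limit, the next thing to look at would be what the client is trying to write, rather than the network.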

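Posted on 2018-1-11 14:50:51
Two questions:
1. What authentication method does the cluster use? SSH or SASL?
2. How is NTP time synchronization set up on the cluster?
Do all nodes sync against external network servers, or does the master sync externally while the other nodes sync from the master? Inconsistent clocks across the cluster can cause session communication to fail.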
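Original poster | Posted on 2018-1-11 14:57:53
sstutu wrote on 2018-1-11 14:50:
Two questions:
1. What authentication method does the cluster use? SSH or SASL?
2. How is NTP time synchronization set up on the cluster?

We are running a stock Hadoop cluster. The authentication method should be SSH (I am not entirely sure about this), and every node in the cluster syncs against network time.
Posted on 2018-1-11 15:53:44
CR_Y wrote on 2018-1-11 14:57:
We are running a stock Hadoop cluster. The authentication method should be SSH (I am not entirely sure about this), and every node in the cluster syncs against network time.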

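How many ZooKeeper nodes are there? It looks like only the node with [myid:1] is having the problem. Look at its log on its own, and also check whether its process is still running.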
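
One way to check each node, sketched under assumptions: ZooKeeper's `srvr` four-letter command reports a `Mode:` line (leader, follower, or standalone) when the server is up and serving. The helper name `zk_mode` and the host names in the usage comment are placeholders, and the sketch assumes `nc` is installed and four-letter commands are allowed on the servers:

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the "Mode:" value from the output of
# ZooKeeper's `srvr` four-letter command, read on stdin.
# Prints "no-response" when no Mode line is present (for example, the
# server is down or is not currently serving requests).
zk_mode() {
  awk -F': ' '/^Mode:/ { print $2; found = 1 }
              END { if (!found) print "no-response" }'
}

# Usage (zk1..zk3 are placeholder host names):
# for h in zk1 zk2 zk3; do
#   printf '%s: ' "$h"; echo srvr | nc "$h" 2181 | zk_mode
# done
```

A healthy ensemble should show exactly one leader and the rest as followers; a node that prints `no-response` is the one to inspect first.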
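Original poster | Posted on 2018-1-11 16:35:22
sstutu wrote on 2018-1-11 15:53:
How many ZooKeeper nodes are there? It looks like only the node with [myid:1] is having the problem. Look at its log on its own, and also check whether its process is still running.

There are seven ZK nodes in total, and the other nodes have similar logs. This is the log from the node with myid 1: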
2018-01-07 16:05:11,145 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x25dd0e6d1f300ec, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:744)

Posted on 2018-1-11 17:08:35
CR_Y wrote on 2018-1-11 16:35:
There are seven ZK nodes in total, and the other nodes have similar logs. This is the log from the node with myid 1:
2018-01-07 16:05:11,145 [myid:1] -  ...

Use the following command to see what is going on:
netstat -antp | grep 2181
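
With seven ZooKeeper nodes and several clients, the raw output of that command can be long. A possible way to condense it is to count connections per TCP state. This is only a sketch: the function name `port_states` is hypothetical, and it assumes the Linux net-tools column layout (local address in column 4, foreign address in column 5, state in column 6):

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize `netstat -ant` style output (stdin) for one
# port, counting connections per TCP state.
# Assumes net-tools columns: $4 = local address, $5 = foreign address, $6 = state.
port_states() {
  local port="${1:-2181}"
  awk -v p=":$port" '
    # keep tcp rows whose local or foreign address ends with :PORT
    $1 ~ /^tcp/ && (substr($4, length($4) - length(p) + 1) == p ||
                    substr($5, length($5) - length(p) + 1) == p) { n[$6]++ }
    END { for (s in n) print n[s], s }
  ' | sort -rn
}

# Usage: netstat -ant | port_states 2181
```

Matching on the address suffix rather than a plain `grep 2181` avoids counting rows where 2181 only appears inside a PID or a longer port number. A large number of connections in states such as CLOSE_WAIT or TIME_WAIT would fit the repeated connect and disconnect pattern in the ResourceManager log.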

