
Security: Hadoop Kerberos authentication keeps failing

ckaiwj1314 posted on 2015-1-5 12:36:28 · 7 replies · 78753 views
I installed a CDH 5.2 Hadoop cluster from RPM packages and secured it with Kerberos. When the NameNode process starts, the log shows Kerberos authentication succeeding, but after exactly one day (24 hours) authentication starts failing. It is very regular: each time I restart the process it works for 24 hours, and after that it reports authentication failures. Could anyone help?


The logs look like this:

2015-01-01 16:37:53,116 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for hdfs/hadoop2@EXAMPLE.COM (auth:KERBEROS)
2015-01-01 16:37:53,119 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for hdfs/hadoop2@EXAMPLE.COM (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.server.protocol.NamenodeProtocol
2015-01-01 16:37:53,119 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 10.80.14.22
2015-01-01 16:37:53,119 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs
2015-01-01 16:37:53,119 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 12668
2015-01-01 16:37:53,119 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 1 SyncTimes(ms): 22 74
2015-01-01 16:37:53,127 WARN org.apache.hadoop.security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
2015-01-01 16:37:53,127 WARN org.apache.hadoop.security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
2015-01-01 16:37:54,764 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 1644ms to send a batch of 1 edits (17 bytes) to remote journal 10.80.14.21:8485
2015-01-01 16:37:56,004 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 2883ms to send a batch of 1 edits (17 bytes) to remote journal 10.80.14.22:8485
2015-01-01 16:37:56,010 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 2908 79
2015-01-01 16:37:56,037 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /hadoopdata/hadoop/name/current/edits_inprogress_0000000000000012668 -> /hadoopdata/hadoop/name/current/edits_0000000000000012668-0000000000000012669
2015-01-01 16:37:56,037 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 12670
2015-01-01 16:37:56,419 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 3299ms to send a batch of 1 edits (17 bytes) to remote journal 10.80.14.26:8485
2015-01-01 16:38:09,685 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2015-01-01 16:38:09,686 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2015-01-01 16:38:19,785 INFO org.apache.hadoop.hdfs.server.namenode.ImageServlet: ImageServlet allowing checkpointer: hdfs/hadoop2@EXAMPLE.COM
2015-01-01 16:38:19,825 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took 0.04s at 100.00 KB/s
2015-01-01 16:38:19,825 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000000012669 size 4704 bytes.
2015-01-01 16:38:19,873 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Going to retain 2 images with txid >= 12609
2015-01-01 16:38:19,873 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Purging old image FSImageFile(file=/hadoopdata/hadoop/name/current/fsimage_0000000000000012549, cpktTxId=0000000000000012549)
2015-01-01 16:38:39,685 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2015-01-01 16:38:39,685 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2015-01-01 16:39:09,685 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2015-01-01 16:39:09,686 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).   <-- right here, quite abruptly and with no ERROR, authentication fails; the process is still running but is no longer usable
2015-01-01 16:39:23,958 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 10.80.14.21:45417:null (GSS initiate failed)
2015-01-01 16:39:23,959 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8020: readAndProcess from client 10.80.14.21 threw exception [javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Invalid argument (400) - Cannot find key of appropriate type to decrypt AP REP - AES256 CTS mode with HMAC SHA1-96)]]
2015-01-01 16:39:24,567 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 10.80.14.21:36313:null (GSS initiate failed)
2015-01-01 16:39:24,567 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8020: readAndProcess from client 10.80.14.21 threw exception [javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Invalid argument (400) - Cannot find key of appropriate type to decrypt AP REP - AES256 CTS mode with HMAC SHA1-96)]]
2015-01-01 16:39:24,704 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 10.80.14.21:34757:null (GSS initiate failed)
2015-01-01 16:39:24,704 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8020: readAndProcess from client 10.80.14.21 threw exception [javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Invalid argument (400) - Cannot find key of appropriate type to decrypt AP REP - AES256 CTS mode with HMAC SHA1-96)]]


7 replies
tntzbzc posted on 2015-1-5 14:39:44


Check whether the local ticket cache is gone.
For details, see:
YARN & HDFS2: installing and configuring Kerberos



ckaiwj1314 posted on 2015-1-5 17:45:46
tntzbzc posted on 2015-1-5 14:39
Check whether the local ticket cache is gone.
For details, see:
YARN & HDFS2: installing and configuring Kerberos

How do I check that cache?

bioger_hit posted on 2015-1-5 18:11:31
ckaiwj1314 posted on 2015-1-5 17:45
How do I check that cache?
You can use the klist command. For example:



[hadoop@dev80 hadoop]$ klist




Explanation of the output:

[hadoop@dev80 hadoop]$ klist  
Ticket cache: FILE:/tmp/krb5cc_500  
Default principal: hadoop@DIANPING.COM  
Valid starting     Expires            Service principal  
09/11/13 15:25:34  09/12/13 15:25:34  krbtgt/DIANPING.COM@DIANPING.COM  
        renew until 09/12/13 15:25:34



Here /tmp/krb5cc_500 is the Kerberos ticket cache. By default a file named "krb5cc_" plus the uid is created under /tmp; the 500 here is the uid of the hadoop account.
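As a small aside (my own sketch, not from the thread), the default cache path can be derived from the current uid, assuming no KRB5CCNAME override is in effect:

```shell
#!/bin/sh
# Derive the default Kerberos ticket cache path for the current user.
# MIT Kerberos uses /tmp/krb5cc_<uid> unless KRB5CCNAME overrides it.
uid=$(id -u)
echo "FILE:/tmp/krb5cc_${uid}"
```

Running this as the hadoop user (uid 500) would print FILE:/tmp/krb5cc_500, matching the klist output above.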




Run the kinit command to obtain a TGT (ticket-granting ticket):

[hadoop@dev80 hadoop]$ kinit -r 24h -k -t /home/hadoop/.keytab hadoop








[hadoop@dev80 hadoop]$ getent passwd hadoop
hadoop:x:500:500::/home/hadoop:/bin/bash

You can also point to a specific ticket cache by setting it in the environment, e.g. export KRB5CCNAME=/tmp/krb5cc_500.


ckaiwj1314 发表于 2015-1-5 18:17:53
因为我是rpm包安装的 所以进程需要root来启动
servicehadoop-hdfs-namenode start
[root@hadoop1 ~]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: hdfs/hadoop1@EXAMPLE.COM

Valid starting     Expires            Service principal
01/05/15 18:00:02  01/06/15 18:00:02  krbtgt/EXAMPLE.COM@EXAMPLE.COM
        renew until 01/12/15 18:00:02

This is my ticket info, and I also set up a crontab to re-authenticate continuously:
[root@hadoop1 ~]# crontab -l
00 * * * * /usr/bin/kinit -k -t /etc/hadoop/conf/hdfs.keytab hdfs/hadoop1
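An observation of my own (an assumption, not something confirmed in the thread): this cron line fetches a brand-new ticket into root's cache every hour, which is a separate thing from renewing an existing renewable TGT. Where the goal is to keep one renewable ticket alive, a renewal-style entry is sometimes used instead; a hypothetical sketch, assuming the original ticket was requested renewable (e.g. with kinit -r 7d ...):

```
# Hypothetical crontab entry: renew the existing TGT in root's default
# cache every 8 hours. kinit -R renews the current ticket rather than
# fetching a new one, and only works within the renewable lifetime.
0 */8 * * * /usr/bin/kinit -R
```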





muyannian posted on 2015-1-5 20:39:46
ckaiwj1314 posted on 2015-1-5 18:17
Since I installed from RPM packages, the process has to be started as root:
service hadoop-hdfs-namenode start
[root@hadoop1 ~]#  ...
Watch out for permission problems; this kind of setup is best avoided in a cluster.

ckaiwj1314 发表于 2015-1-7 12:00:10
我集群能正常起来 启动时 日志显示认证成功的。

墨魂 posted on 2015-2-6 11:55:32
Last edited by 墨魂 on 2015-2-6 12:01
ckaiwj1314 posted on 2015-1-7 12:00
My cluster starts up fine, and the startup logs show authentication succeeding.

1. Check the startup log to see whether the startup environment sets KRB5CCNAME.
2. Check that the principal's renew lifetime is set correctly, and also check that the renewal lifetime in krb5.conf is correct.
3. Check your configuration and make sure authentication at startup uses the default hdfs-site.xml keytab settings, rather than a JAAS configuration passed in through JAVA_OPTS.

P.S. The problem I ran into myself was on HBase. The KDC was a Windows Server with a uniform ticket lifetime of 10 hours (tickets had to be renewed within 10 hours). While testing the configuration I had used -Djava.security.auth.login.config=jaas.conf, i.e. authentication went through Java's Krb5LoginModule (with useTicketCache set to false in the config). As a result the RegionServer neither read the ticket cache nor renewed its ticket, and it died exactly every 10 hours. After I corrected the configuration, it ran normally.
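For reference, a hypothetical jaas.conf along the lines of the fix 墨魂 describes might look like the following (the entry name, keytab path, and principal are illustrative assumptions, not taken from the thread); the key change is allowing the login module to use and renew the ticket cache instead of setting useTicketCache=false:

```
/* Hypothetical JAAS login entry for Krb5LoginModule; names/paths are
   placeholders. useTicketCache=true lets the module read the cache, and
   renewTGT=true lets it renew the TGT instead of expiring with it. */
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/etc/hbase/conf/hbase.keytab"
  principal="hbase/hadoop1@EXAMPLE.COM"
  storeKey=true
  useTicketCache=true
  renewTGT=true;
};
```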
