Last edited by 徐超 on 2015-4-6 21:51
Guiding questions
1. What do you do when OpenStack reports an error?
2. How do you check the OpenStack services?
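Something has gone wrong with OpenStack. Now what?
The usual procedure goes like this:
1) Check whether the service is running properly; if it is not, restart it.
2) Add --debug to the relevant command-line client.
3) Read the logs of the service that reported the error.
4) Check that the service's configuration files are set up correctly.
5) Ask in a QQ group, report it to the community, and so on.
This post mainly covers how to check the OpenStack services, to give you some ideas for later troubleshooting; it is only meant as a starting point. It is also the companion piece to "OpenStack Troubleshooting".
Right, let's save time and get down to business.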
I. Understanding the logs
1. OpenStack log directories
1) Nova logs
Directory:
/var/log/nova/
2) Horizon logs
Directory:
/var/log/httpd (CentOS)
/var/log/apache2 (Ubuntu)
3) Swift logs
Files:
/var/log/syslog (Ubuntu)
/var/log/messages (CentOS)
4) Cinder logs
Directory:
/var/log/cinder
5) Keystone logs
File:
/var/log/keystone/keystone.log
6) Glance logs
Files:
/var/log/glance/*.log
7) Neutron logs
Files:
/var/log/neutron/*.log
8) Logs of other components
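With that many log directories, a quick way to see which files actually contain errors can save some time. Here is a minimal sketch. count_log_errors is a name I made up, and it assumes the log lines carry the level as a separate word (" ERROR "), which is how the default log format usually looks; adjust the pattern if your format differs.

```shell
# count_log_errors DIR (hypothetical helper):
# print "<count> <file>" for every *.log file in DIR that has
# at least one line containing the word ERROR surrounded by spaces.
count_log_errors() {
    dir=$1
    for f in "$dir"/*.log; do
        # Skip the literal pattern when the directory has no *.log files.
        [ -f "$f" ] || continue
        # grep -c prints 0 and returns non-zero when nothing matches.
        n=$(grep -c ' ERROR ' "$f" || true)
        if [ "$n" -gt 0 ]; then
            echo "$n $f"
        fi
    done
}

# Example: count_log_errors /var/log/nova
```

Files without any match are left out, so an empty result means no ERROR lines were found in that directory.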
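2. Changing the log level
Every OpenStack service logs at the WARNING level by default. Since logging is configured in much the same way for each service, Nova is used as the example here.
1) Connect to the host running the Nova services and run: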
- # vim /etc/nova/nova.conf
Change the log level of the service in question to debug, info, or warning.
2) Other services, Glance for example, can be switched to info or debug in the same way.
- # vim /etc/glance/glance-api.conf
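Before and after editing, it helps to see which log options are really in effect rather than commented out. A small sketch, assuming the log level is controlled by the debug and verbose options in the configuration file, as is common in releases of this period; show_log_options is a made-up name, and your release may use different options.

```shell
# show_log_options FILE (hypothetical helper):
# print the debug/verbose settings that are active in FILE,
# ignoring lines that are commented out with '#'.
show_log_options() {
    grep -E '^[[:space:]]*(debug|verbose)[[:space:]]*=' "$1"
}

# Example: show_log_options /etc/nova/nova.conf
```

If it prints nothing, neither option is set and the service is running with its defaults. Restart the service after changing the file so the new level takes effect.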
II. Checking the OpenStack services
1. Checking the Nova services
- # nova-manage service list
- Binary Host Zone Status State Updated_At
- nova-consoleauth localhost.localdomain internal enabled :-) 2015-04-02 17:07:42
- nova-scheduler localhost.localdomain internal enabled :-) 2015-04-02 17:07:48
- nova-conductor localhost.localdomain internal enabled :-) 2015-04-02 17:07:42
- nova-compute localhost.localdomain nova enabled :-) 2015-04-02 17:07:50
- nova-cert localhost.localdomain internal enabled :-) 2015-04-02 17:07:41
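From this output you can tell which services are fine and which are not: a State of :-) means the service is checking in, while XXX marks one that is not.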
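On a bigger deployment the list gets long, so it can be handy to print only the rows that are not healthy. A rough sketch: list_down_services is a hypothetical helper that reads the nova-manage service list output on stdin and assumes the column layout shown above (State in the fifth column).

```shell
# list_down_services (hypothetical helper):
# read `nova-manage service list` output on stdin and print
# "binary host" for every service whose State is not ":-)".
list_down_services() {
    awk '$1 != "Binary" && NF >= 5 && $5 != ":-)" { print $1, $2 }'
}

# Example: nova-manage service list | list_down_services
```

No output means every listed service reported in.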
2. Checking the Glance service
- # ps -ef | grep glance
- glance 24624 1 0 Apr01 ? 00:00:00 /usr/bin/python /usr/bin/glance-registry
- glance 24641 24624 0 Apr01 ? 00:00:02 /usr/bin/python /usr/bin/glance-registry
- glance 24642 24624 0 Apr01 ? 00:00:02 /usr/bin/python /usr/bin/glance-registry
- glance 24667 1 1 Apr01 ? 00:26:51 /usr/bin/python /usr/bin/glance-api
- glance 24674 24667 0 Apr01 ? 00:00:01 /usr/bin/python /usr/bin/glance-api
- glance 24675 24667 0 Apr01 ? 00:00:01 /usr/bin/python /usr/bin/glance-api
- root 111408 55355 0 13:59 pts/1 00:00:00 grep --color=auto glance
Check that the default port 9292 is in the LISTEN state:
- # netstat -ant | grep 9292.*LISTEN
- tcp 0 0 0.0.0.0:9292 0.0.0.0:* LISTEN
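A plain grep on the port number also matches longer port numbers and the remote side of a connection, so a slightly stricter check may be worth having. A minimal sketch: port_is_listening is a made-up helper that reads netstat -ant style output on stdin and only looks at the local address of sockets in the LISTEN state.

```shell
# port_is_listening PORT (hypothetical helper):
# read `netstat -ant` style output on stdin; succeed (exit 0) only if
# some socket in LISTEN state has PORT as its local port.
port_is_listening() {
    awk -v port="$1" '
        $6 == "LISTEN" {
            # The local address is column 4; the port follows the last colon.
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit !found }
    '
}

# Example:
#   netstat -ant | port_is_listening 9292 && echo "glance-api is listening"
```

The same helper works for the other default ports checked in this post.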
3. Checking the RabbitMQ service
- # rabbitmqctl status
- Status of node rabbit@localhost ...
- [{pid,114125},
- {running_applications,[{rabbit,"RabbitMQ","3.1.5"},
- {os_mon,"CPO CXC 138 46","2.2.14"},
- {mnesia,"MNESIA CXC 138 12","4.11"},
- {xmerl,"XML parser","1.3.6"},
- {sasl,"SASL CXC 138 11","2.3.4"},
- {stdlib,"ERTS CXC 138 10","1.19.4"},
- {kernel,"ERTS CXC 138 10","2.16.4"}]},
- {os,{unix,linux}},
- {erlang_version,"Erlang R16B03-1 (erts-5.10.4) [source] [64-bit] [smp:2:2] [async-threads:30] [hipe] [kernel-poll:true]\n"},
- {memory,[{total,57896920},
- {connection_procs,1958608},
- {queue_procs,527648},
- {plugins,0},
- {other_proc,13824928},
- {mnesia,220880},
- {mgmt_db,0},
- {msg_index,40288},
- {other_ets,851384},
- {binary,19283296},
- {code,16447484},
- {atom,594537},
- {other_system,4147867}]},
- {vm_memory_high_watermark,0.4},
- {vm_memory_limit,766273126},
- {disk_free_limit,1000000000},
- {disk_free,46849736704},
- {file_descriptors,[{total_limit,924},
- {total_used,32},
- {sockets_limit,829},
- {sockets_used,30}]},
- {processes,[{limit,1048576},{used,445}]},
- {run_queue,0},
- {uptime,20}]
- ...done.
If errors show up, restarting the service usually fixes them, for example:
- # systemctl restart rabbitmq-server (on CentOS 7)
4. Checking the NTP service
5. Checking the MySQL database service
- # mysqladmin -uroot status
- Uptime: 94613 Threads: 30 Questions: 680144 Slow queries: 0 Opens: 1433 Flush tables: 2 Open tables: 314 Queries per second avg: 7.188
6. Checking the Horizon service
- # ps -ef | grep httpd (on CentOS; on Ubuntu, grep for apache2 instead)
Check that httpd is listening on the default TCP port 80:
- # netstat -ano | grep :80
- tcp 0 0 192.168.1.10:8080 0.0.0.0:* LISTEN off (0.00/0/0)
- tcp6 0 0 :::80 :::* LISTEN off (0.00/0/0)
To verify that you can connect to the web server, run the command below. When all is well, the output looks like this:
- # telnet localhost 80
- Trying ::1...
- Connected to localhost.
- Escape character is '^]'.
7. Checking the Keystone service
- # ps -ef | grep keystone
- keystone 23326 1 1 Apr01 ? 00:28:11 /usr/bin/python /usr/bin/keystone-all
- keystone 23333 23326 0 Apr01 ? 00:00:16 /usr/bin/python /usr/bin/keystone-all
- keystone 23334 23326 0 Apr01 ? 00:00:08 /usr/bin/python /usr/bin/keystone-all
- keystone 23335 23326 0 Apr01 ? 00:00:03 /usr/bin/python /usr/bin/keystone-all
- keystone 23336 23326 0 Apr01 ? 00:00:02 /usr/bin/python /usr/bin/keystone-all
- root 118062 55355 0 14:47 pts/1 00:00:00 grep --color=auto keystone
Check that Keystone is listening on its default port 5000:
Public (service) port: 5000
Admin port: 35357
- # netstat -anlp | grep 5000
- tcp 0 0 0.0.0.0:5000 0.0.0.0:* LISTEN 23326/python
8. Checking the Neutron services
- # netstat -anlp | grep 9696
- tcp 0 0 0.0.0.0:9696 0.0.0.0:* LISTEN 36666/python
- tcp 0 0 192.168.1.10:60031 192.168.1.10:9696 TIME_WAIT -
- tcp 0 0 192.168.1.10:60029 192.168.1.10:9696 TIME_WAIT -
- tcp 0 0 192.168.1.10:60032 192.168.1.10:9696 TIME_WAIT -
- tcp 0 0 192.168.1.10:60033 192.168.1.10:9696 TIME_WAIT -
- tcp 0 0 192.168.1.10:60034 192.168.1.10:9696 TIME_WAIT -
On the compute nodes, use ps to check that the following services are running:
- ovsdb-server
- ovs-vswitchd
- neutron-openvswitch-agent
For example, run the following command:
- # ps -ef | grep ovsdb-server
- root 36163 1 0 Apr01 ? 00:00:00 ovsdb-server: monitoring pid 36164 (healthy)
- root 36164 36163 0 Apr01 ? 00:00:06 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach --monitor
- root 120067 55355 0 15:02 pts/1 00:00:00 grep --color=auto ovsdb-server
On the network node, check that the following services are running:
- ovsdb-server
- ovs-vswitchd
- neutron-openvswitch-agent
- neutron-dhcp-agent
- neutron-l3-agent
- neutron-metadata-agent
I won't give an example for these; it works the same way.
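Checking six processes one ps at a time gets tedious, so here is one way to do them in a single pass. A sketch only: missing_processes is a hypothetical helper that reads a process listing on stdin and prints each name it cannot find, using a simple substring match on the command line.

```shell
# missing_processes NAME... (hypothetical helper):
# read a process listing (for example `ps -e -o args`) on stdin and
# print every NAME that does not appear in any command line.
missing_processes() {
    ps_output=$(cat)
    for name in "$@"; do
        printf '%s\n' "$ps_output" | grep -qF -e "$name" || echo "$name"
    done
}

# Example on a network node:
#   ps -e -o args | missing_processes ovsdb-server ovs-vswitchd \
#       neutron-openvswitch-agent neutron-dhcp-agent \
#       neutron-l3-agent neutron-metadata-agent
```

Anything it prints is a service to look at; a running process can still be unhealthy, so treat an empty result as a first check only.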
Also, to verify that the Neutron agents are running properly, run the following on the controller node. When everything is fine, the output looks like this:
- # neutron agent-list
- +--------------------------------------+--------------------+-----------------------+-------+----------------+---------------------------+
- | id | agent_type | host | alive | admin_state_up | binary |
- +--------------------------------------+--------------------+-----------------------+-------+----------------+---------------------------+
- | 42531d2d-6dad-4b31-8520-522c126fe241 | Metadata agent | localhost.localdomain | :-) | True | neutron-metadata-agent |
- | 78a41e46-31fb-4cee-aeca-b6f88afc31a1 | L3 agent | localhost.localdomain | :-) | True | neutron-l3-agent |
- | bf69d98f-49eb-4bdc-9b7f-e664254cbb0f | DHCP agent | localhost.localdomain | :-) | True | neutron-dhcp-agent |
- | e06d82a2-50a3-4ba5-8863-6c1cd986077a | Open vSwitch agent | localhost.localdomain | :-) | True | neutron-openvswitch-agent |
- +--------------------------------------+--------------------+-----------------------+-------+----------------+---------------------------+
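To pick out only the agents that are not alive, the table can be filtered. A minimal sketch, assuming the column order shown above and that a dead agent shows something other than :-) in the alive column (typically xxx); list_dead_agents is a made-up name.

```shell
# list_dead_agents (hypothetical helper):
# read `neutron agent-list` table output on stdin and print
# "binary host" for every agent whose alive column is not ":-)".
list_dead_agents() {
    awk -F'|' 'NF >= 7 {
        alive = $5; host = $4; binary = $7
        gsub(/^ +| +$/, "", alive)
        gsub(/^ +| +$/, "", host)
        gsub(/^ +| +$/, "", binary)
        # Skip the header row; border rows have no "|" and never get here.
        if (alive != "alive" && alive != ":-)") print binary, host
    }'
}

# Example: neutron agent-list | list_dead_agents
```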
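9. Checking the Cinder services
The output format really is ugly, so bear with it; if you have a better idea, don't forget to tell me.
Check that the iSCSI target is listening on the network: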
- # netstat -anlp | grep 3260
Check that the Cinder API service is listening:
- # netstat -anlp | grep 8776
- tcp 0 0 0.0.0.0:8776 0.0.0.0:* LISTEN 25457/python2
10. Checking the Swift service
1) Check the account status:
- # swift stat
- Account: AUTH_1f7e9eb5143c48a59c1b164d54f45e58
- Containers: 0
- Objects: 0
- Bytes: 0
- X-Put-Timestamp: 1428002235.46022
- X-Timestamp: 1428002235.46022
- X-Trans-Id: tx4a3e4aa07ee14e14ba8e1-00551d95ba
- Content-Type: text/plain; charset=utf-8
3) Check that the Swift API (proxy) service is running:
- # ps -ef | grep swift-proxy
- swift 40407 1 0 Apr01 ? 00:00:00 /usr/bin/python /usr/bin/swift-proxy-server /etc/swift/proxy-server.conf
- swift 40475 40407 0 Apr01 ? 00:00:00 /usr/bin/python /usr/bin/swift-proxy-server /etc/swift/proxy-server.conf
- swift 40476 40407 0 Apr01 ? 00:00:00 /usr/bin/python /usr/bin/swift-proxy-server /etc/swift/proxy-server.conf
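That's all for now; I don't know much beyond this. I'm still a newbie who hasn't even properly gotten started with OpenStack, and with a new OpenStack release every six months there is no telling what novelties the next one will bring.
If you want to vent after reading this post, you can do it here, or over at my place.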