A Source-Level Look at the Two Ways to Reboot an Instance in OpenStack: Soft Reboot vs. Hard Reboot

This post was last edited by fc013 on 2016-12-25 17:29

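Guiding questions:

1. How do "soft reboot" and "hard reboot" differ in their arguments?

2. How do the instance states differ between a "soft reboot" and a "hard reboot"?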

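OpenStack offers two ways to reboot an instance, known as "soft reboot" and "hard reboot". A soft reboot attempts a graceful shutdown of the guest and then restarts the instance; a hard reboot power-cycles the instance, as if its power had been cut, and starts it again. The commands are as follows: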

By default, a reboot issued through nova is a soft reboot.

[mw_shl_code=shell,true]nova reboot SERVER[/mw_shl_code]

To perform a hard reboot, add the --hard argument:

[mw_shl_code=shell,true]nova reboot --hard SERVER[/mw_shl_code]
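Both commands end up as the same server action sent to the compute API. Only the type field in the request body differs, and that field is what the API handler reads via body['reboot']['type']. Here is a minimal sketch of that body; the helper names build_reboot_body and parse_reboot_type are made up for illustration and are not part of nova or novaclient.

```python
import json


def build_reboot_body(hard=False):
    """Build the JSON body of a server 'reboot' action (illustrative helper)."""
    return {"reboot": {"type": "HARD" if hard else "SOFT"}}


def parse_reboot_type(body):
    """Extract the reboot type the way the API handler does: upper-cased."""
    return body["reboot"]["type"].upper()


if __name__ == "__main__":
    for hard in (False, True):
        body = build_reboot_body(hard)
        print(json.dumps(body), "->", parse_reboot_type(body))
```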

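On the command line the two differ only by one argument, so let's trace the code to see what actually happens (the nova code referenced here is the L release, nova-12.0.0).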

First, find the API entry-point function: Nova->api->openstack->compute->servers.py

[mw_shl_code=python,true] @wsgi.response(202)
@extensions.expected_errors((404, 409))
@wsgi.action('reboot')
@validation.schema(schema_servers.reboot)
def _action_reboot(self, req, id, body):

      reboot_type = body['reboot']['type'].upper()
      context = req.environ['nova.context']
      authorize(context, action='reboot')
      instance = self._get_server(context, req, id)

      try:
         self.compute_api.reboot(context, instance, reboot_type)
      except exception.InstanceIsLocked as e:
         raise exc.HTTPConflict(explanation=e.format_message())
      except exception.InstanceInvalidState as state_error:
         common.raise_http_conflict_for_instance_invalid_state(state_error,
                  'reboot', id)[/mw_shl_code]

In the code above, the call self.compute_api.reboot(context, instance, reboot_type) hands off to the actual implementation in nova->compute->api.py:

[mw_shl_code=python,true] @wrap_check_policy
@check_instance_lock
@check_instance_state(vm_state=set(
                  vm_states.ALLOW_SOFT_REBOOT + vm_states.ALLOW_HARD_REBOOT),
                        task_state=[None, task_states.REBOOTING,
                                    task_states.REBOOT_PENDING,
                                    task_states.REBOOT_STARTED,
                                    task_states.REBOOTING_HARD,
                                    task_states.RESUMING,
                                    task_states.UNPAUSING,
                                    task_states.PAUSING,
                                    task_states.SUSPENDING])
def reboot(self, context, instance, reboot_type):
      """Reboot the given instance."""
      if (reboot_type == 'SOFT' and
         (instance.vm_state not in vm_states.ALLOW_SOFT_REBOOT)):
         raise exception.InstanceInvalidState(
            attr='vm_state',
            instance_uuid=instance.uuid,
            state=instance.vm_state,
            method='soft reboot')
      if reboot_type == 'SOFT' and instance.task_state is not None:
         raise exception.InstanceInvalidState(
            attr='task_state',
            instance_uuid=instance.uuid,
            state=instance.task_state,
            method='reboot')
      expected_task_state = [None]
      if reboot_type == 'HARD':
         expected_task_state.extend([task_states.REBOOTING,
                                    task_states.REBOOT_PENDING,
                                    task_states.REBOOT_STARTED,
                                    task_states.REBOOTING_HARD,
                                    task_states.RESUMING,
                                    task_states.UNPAUSING,
                                    task_states.SUSPENDING])
      state = {'SOFT': task_states.REBOOTING,
               'HARD': task_states.REBOOTING_HARD}[reboot_type]
      instance.task_state = state
      instance.save(expected_task_state=expected_task_state)

      self._record_action_start(context, instance, instance_actions.REBOOT)

      self.compute_rpcapi.reboot_instance(context, instance=instance,
                                          block_device_info=None,
                                          reboot_type=reboot_type)[/mw_shl_code]

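The reboot method in compute-api carries several decorators. One of them, check_instance_state, compares the instance's current vm_state and task_state against the allowed sets passed to it and raises InstanceInvalidState if either one falls outside them. In other words, a reboot request of either kind is only accepted when the instance's task_state is None or one of (REBOOTING, REBOOT_PENDING, REBOOT_STARTED, REBOOTING_HARD, RESUMING, UNPAUSING, PAUSING, SUSPENDING); with any other task_state, neither a soft nor a hard reboot is allowed. The decorator's implementation shows the details: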

[mw_shl_code=python,true]# Located in nova->compute->api.py
def check_instance_state(vm_state=None, task_state=(None,),
                         must_have_launched=True):
    """Decorator to check VM and/or task state before entry to API functions.

    If the instance is in the wrong state, or has not been successfully
    started at least once the wrapper will raise an exception.
    """

    if vm_state is not None and not isinstance(vm_state, set):
        vm_state = set(vm_state)
    if task_state is not None and not isinstance(task_state, set):
        task_state = set(task_state)

    def outer(f):
        @functools.wraps(f)
        def inner(self, context, instance, *args, **kw):
            if vm_state is not None and instance.vm_state not in vm_state:
                raise exception.InstanceInvalidState(
                    attr='vm_state',
                    instance_uuid=instance.uuid,
                    state=instance.vm_state,
                    method=f.__name__)
            # lst: this check decides whether a soft or hard reboot may proceed
            if (task_state is not None and
                    instance.task_state not in task_state):
                raise exception.InstanceInvalidState(
                    attr='task_state',
                    instance_uuid=instance.uuid,
                    state=instance.task_state,
                    method=f.__name__)
            if must_have_launched and not instance.launched_at:
                raise exception.InstanceInvalidState(
                    attr='launched_at',
                    instance_uuid=instance.uuid,
                    state=instance.launched_at,
                    method=f.__name__)

            return f(self, context, instance, *args, **kw)
        return inner
    return outer[/mw_shl_code]
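To see the decorator's effect in isolation, here is a stripped-down, self-contained sketch of the same pattern. InstanceInvalidState and FakeInstance are stand-ins defined here, not nova classes, and the state strings are placeholders chosen for the example.

```python
import functools


class InstanceInvalidState(Exception):
    """Stand-in for nova's exception.InstanceInvalidState."""


class FakeInstance:
    """Hypothetical minimal instance carrying only the two checked fields."""

    def __init__(self, vm_state, task_state):
        self.vm_state = vm_state
        self.task_state = task_state


def check_state(vm_state, task_state):
    """Reject the call unless both states are in the allowed sets."""
    allowed_vm = set(vm_state)
    allowed_task = set(task_state)

    def outer(f):
        @functools.wraps(f)
        def inner(instance, *args, **kw):
            if instance.vm_state not in allowed_vm:
                raise InstanceInvalidState("vm_state=%s" % instance.vm_state)
            if instance.task_state not in allowed_task:
                raise InstanceInvalidState(
                    "task_state=%s" % instance.task_state)
            return f(instance, *args, **kw)
        return inner
    return outer


@check_state(vm_state=["active", "stopped"],
             task_state=[None, "rebooting", "pausing"])
def reboot(instance, reboot_type):
    return "accepted %s reboot" % reboot_type


def try_reboot(instance):
    """Return the result, or a message naming the rejected attribute."""
    try:
        return reboot(instance, "HARD")
    except InstanceInvalidState as e:
        return "rejected: %s" % e
```

A task_state that is in the list passes the decorator; anything outside it is rejected before the decorated function body runs.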

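Now back to the reboot code in compute-api, where the checks for soft reboot and hard reboot start to diverge.

The code shows that:

For a soft reboot, the instance's vm_state must be in ALLOW_SOFT_REBOOT and the instance must have no other task in progress (task_state is None); otherwise an exception is raised.
For a hard reboot, expected_task_state is extended with the additional states (task_states.REBOOTING, task_states.REBOOT_PENDING, task_states.REBOOT_STARTED, task_states.REBOOTING_HARD, task_states.RESUMING, task_states.UNPAUSING, task_states.SUSPENDING), and the resulting list is passed to instance.save() as the set of task states the instance is expected to be in.

In either case the instance's task_state is reassigned (REBOOTING for a soft reboot, REBOOTING_HARD for a hard reboot) and reboot_instance is invoked over RPC (the self.compute_rpcapi.reboot_instance call at the end of the method).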
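The per-type rules in reboot() can be condensed into a small standalone function. This is only a sketch under assumptions: the string constants are placeholders for the values in nova's task_states and vm_states modules, ALLOW_SOFT_REBOOT is assumed to contain just the active state, and the final check merely models the effect of instance.save(expected_task_state=...).

```python
# Placeholder constants standing in for nova.compute.task_states values.
REBOOTING = "rebooting"
REBOOT_PENDING = "reboot_pending"
REBOOT_STARTED = "reboot_started"
REBOOTING_HARD = "rebooting_hard"
RESUMING = "resuming"
UNPAUSING = "unpausing"
SUSPENDING = "suspending"

ALLOW_SOFT_REBOOT = ["active"]  # assumption made for this sketch


class InvalidState(Exception):
    """Stand-in for the state errors nova would raise."""


def plan_reboot(reboot_type, vm_state, task_state):
    """Return (new_task_state, expected_task_state) or raise InvalidState."""
    if reboot_type == "SOFT" and vm_state not in ALLOW_SOFT_REBOOT:
        raise InvalidState("vm_state")
    if reboot_type == "SOFT" and task_state is not None:
        raise InvalidState("task_state")

    expected = [None]
    if reboot_type == "HARD":
        expected.extend([REBOOTING, REBOOT_PENDING, REBOOT_STARTED,
                         REBOOTING_HARD, RESUMING, UNPAUSING, SUSPENDING])

    new_state = {"SOFT": REBOOTING, "HARD": REBOOTING_HARD}[reboot_type]

    # Models instance.save(expected_task_state=...): the update only goes
    # through if the current task_state is one of the expected values.
    if task_state not in expected:
        raise InvalidState("unexpected task_state")
    return new_state, expected
```

For example, plan_reboot("HARD", "stopped", "rebooting") is accepted and yields the hard-reboot task state, while plan_reboot("SOFT", "active", "rebooting") raises, which is why a hard reboot can be used on an instance whose soft reboot is stuck.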

Execution then moves to Nova->compute->manager.py (note that this code is invoked remotely over RPC, so it actually runs on the compute node that hosts the instance).

[mw_shl_code=python,true] # Located in Nova->compute->manager.py, line 2874
@wrap_exception()
@reverts_task_state
@wrap_instance_event
@wrap_instance_fault
def reboot_instance(self, context, instance, block_device_info,
                     reboot_type):
      """Reboot an instance on this host."""
      # acknowledge the request made it to the manager
      if reboot_type == "SOFT":
         instance.task_state = task_states.REBOOT_PENDING
         expected_states = (task_states.REBOOTING,
                           task_states.REBOOT_PENDING,
                           task_states.REBOOT_STARTED)
      else:
         instance.task_state = task_states.REBOOT_PENDING_HARD
         expected_states = (task_states.REBOOTING_HARD,
                           task_states.REBOOT_PENDING_HARD,
                           task_states.REBOOT_STARTED_HARD)
      context = context.elevated()
      LOG.info(_LI("Rebooting instance"), context=context, instance=instance)

      block_device_info = self._get_instance_block_device_info(context,
                                                               instance)

      network_info = self.network_api.get_instance_nw_info(context, instance)

      self._notify_about_instance_usage(context, instance, "reboot.start")

      instance.power_state = self._get_power_state(context, instance)
      instance.save(expected_task_state=expected_states)

      if instance.power_state != power_state.RUNNING:
         state = instance.power_state
         running = power_state.RUNNING
         LOG.warning(_LW('trying to reboot a non-running instance:'
                        ' (state: %(state)s expected: %(running)s)'),
                     {'state': state, 'running': running},
                     context=context, instance=instance)

      def bad_volumes_callback(bad_devices):
         self._handle_bad_volumes_detached(
                  context, instance, bad_devices, block_device_info)

      try:
         # Don't change it out of rescue mode
         if instance.vm_state == vm_states.RESCUED:
            new_vm_state = vm_states.RESCUED
         else:
            new_vm_state = vm_states.ACTIVE
         new_power_state = None
         if reboot_type == "SOFT":
            instance.task_state = task_states.REBOOT_STARTED
            expected_state = task_states.REBOOT_PENDING
         else:
            instance.task_state = task_states.REBOOT_STARTED_HARD
            expected_state = task_states.REBOOT_PENDING_HARD
         instance.save(expected_task_state=expected_state)
         self.driver.reboot(context, instance,
                           network_info,
                           reboot_type,
                           block_device_info=block_device_info,
                           bad_volumes_callback=bad_volumes_callback)

      except Exception as error:
         with excutils.save_and_reraise_exception() as ctxt:
            exc_info = sys.exc_info()
            # if the reboot failed but the VM is running don't
            # put it into an error state
            new_power_state = self._get_power_state(context, instance)
            if new_power_state == power_state.RUNNING:
                  LOG.warning(_LW('Reboot failed but instance is running'),
                              context=context, instance=instance)
                  compute_utils.add_instance_fault_from_exc(context,
                        instance, error, exc_info)
                  self._notify_about_instance_usage(context, instance,
                        'reboot.error', fault=error)
                  ctxt.reraise = False
            else:
                  LOG.error(_LE('Cannot reboot instance: %s'), error,
                           context=context, instance=instance)
                  self._set_instance_obj_error_state(context, instance)

      if not new_power_state:
         new_power_state = self._get_power_state(context, instance)
      try:
         instance.power_state = new_power_state
         instance.vm_state = new_vm_state
         instance.task_state = None
         instance.save()
      except exception.InstanceNotFound:
         LOG.warning(_LW("Instance disappeared during reboot"),
                     context=context, instance=instance)

      self._notify_about_instance_usage(context, instance, "reboot.end")[/mw_shl_code]

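As the code above shows, the instance is put into a different task state depending on the reboot type. The manager then fetches, in turn, the block device information, the network information and the instance's power state, and checks whether the power state is RUNNING. If it is not, a warning is logged and the reboot proceeds anyway. The task state is then advanced once more (to REBOOT_STARTED or REBOOT_STARTED_HARD), and all of this information is finally handed to driver.reboot.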
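The task-state progression and the failure handling in reboot_instance can be summarized in a short sketch. The state strings are placeholders for the nova task_states constants, and the two helper functions are hypothetical, written only to restate the logic of the listing.

```python
def task_state_sequence(reboot_type):
    """Task states an instance passes through during a reboot, in order."""
    suffix = "" if reboot_type == "SOFT" else "_hard"
    return [
        "rebooting" + suffix,       # set by the compute API before the RPC
        "reboot_pending" + suffix,  # manager acknowledged the request
        "reboot_started" + suffix,  # set just before driver.reboot()
        None,                       # cleared once the reboot has finished
    ]


def outcome_after_failure(power_state_after):
    """What the manager does when driver.reboot() raises an exception."""
    if power_state_after == "running":
        # The fault is recorded and a warning is logged, but the instance
        # is not put into the error state and the exception is swallowed.
        return {"error_state": False, "reraise": False}
    # Otherwise the instance goes to the error state and the exception
    # propagates to the caller.
    return {"error_state": True, "reraise": True}


if __name__ == "__main__":
    print(task_state_sequence("SOFT"))
    print(task_state_sequence("HARD"))
```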

Next, follow the call into the driver layer (the Hyper-V driver in this walkthrough): Nova->virt->hyperv->driver.py

[mw_shl_code=python,true]def reboot(self, context, instance, network_info, reboot_type,
            block_device_info=None, bad_volumes_callback=None):
      self._vmops.reboot(instance, network_info, reboot_type)[/mw_shl_code]
Then further down, to Nova->virt->hyperv->vmops.py
[mw_shl_code=python,true] def reboot(self, instance, network_info, reboot_type):
      """Reboot the specified instance."""
      LOG.debug("Rebooting instance", instance=instance)

      if reboot_type == REBOOT_TYPE_SOFT:
         if self._soft_shutdown(instance):
            self.power_on(instance)
            return

      self._set_vm_state(instance,
                        constants.HYPERV_VM_STATE_REBOOT)[/mw_shl_code]

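As the code above shows, a soft reboot first calls _soft_shutdown(instance). If that succeeds, the VM is powered back on; if it fails, execution falls through to the same _set_vm_state call that a hard reboot uses. The function looks like this: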

[mw_shl_code=python,true] def _soft_shutdown(self, instance,
                     timeout=CONF.hyperv.wait_soft_reboot_seconds,
                     retry_interval=SHUTDOWN_TIME_INCREMENT):
      """Perform a soft shutdown on the VM. (lst: this step is specific to soft reboot; it is exactly where soft reboot and hard reboot differ)

         :return: True if the instance was shutdown within time limit,
                  False otherwise.
      """
      LOG.debug("Performing Soft shutdown on instance", instance=instance)

      while timeout > 0:
         # Perform a soft shutdown on the instance.
         # Wait maximum timeout for the instance to be shutdown.
         # If it was not shutdown, retry until it succeeds or a maximum of
         # time waited is equal to timeout.
         wait_time = min(retry_interval, timeout)
         try:
            LOG.debug("Soft shutdown instance, timeout remaining: %d",
                        timeout, instance=instance)
            self._vmutils.soft_shutdown_vm(instance.name)
            if self._wait_for_power_off(instance.name, wait_time):
                  LOG.info(_LI("Soft shutdown succeeded."),
                           instance=instance)
                  return True
         except vmutils.HyperVException as e:
            # Exception is raised when trying to shutdown the instance
            # while it is still booting.
            LOG.debug("Soft shutdown failed: %s", e, instance=instance)
            time.sleep(wait_time)

         timeout -= retry_interval

      LOG.warning(_LW("Timed out while waiting for soft shutdown."),
                  instance=instance)
      return False[/mw_shl_code]
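The timeout bookkeeping in this loop can be tried out without Hyper-V. The sketch below keeps the same loop structure but takes the shutdown request and the power-off wait as plain callables; RuntimeError stands in for vmutils.HyperVException, and make_guest is a hypothetical fake used only for the demonstration.

```python
def soft_shutdown(request_shutdown, wait_for_power_off,
                  timeout, retry_interval, sleep=lambda seconds: None):
    """Retry a graceful shutdown until it succeeds or the timeout is used up."""
    while timeout > 0:
        # Never wait longer than what is left of the overall timeout.
        wait_time = min(retry_interval, timeout)
        try:
            request_shutdown()
            if wait_for_power_off(wait_time):
                return True
        except RuntimeError:
            # Stand-in for HyperVException: the guest may still be booting.
            sleep(wait_time)
        timeout -= retry_interval
    return False


def make_guest(powers_off_on_attempt):
    """Fake guest that powers off on the N-th request (None means never)."""
    attempts = []

    def request_shutdown():
        attempts.append(1)

    def wait_for_power_off(wait_time):
        return (powers_off_on_attempt is not None and
                len(attempts) >= powers_off_on_attempt)

    return attempts, request_shutdown, wait_for_power_off


if __name__ == "__main__":
    attempts, request, wait = make_guest(3)
    print(soft_shutdown(request, wait, timeout=60, retry_interval=10),
          len(attempts))
```

With a 60-second timeout and a 10-second retry interval the loop makes at most six attempts; a guest that never powers off makes the function return False, which is the case where the caller falls back to a forced reboot.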

_soft_shutdown(instance) tries to shut down the given VM within the configured time limit. The core of the shutdown is the call self._vmutils.soft_shutdown_vm(instance.name):

[mw_shl_code=python,true] def soft_shutdown_vm(self, vm_name):
      vm = self._lookup_vm_check(vm_name)
      shutdown_component = vm.associators(
         wmi_result_class=self._SHUTDOWN_COMPONENT)

      if not shutdown_component:
         # If no shutdown_component is found, it means the VM is already
         # in a shutdown state.
         return

      reason = 'Soft shutdown requested by OpenStack Nova.'
      (ret_val, ) = shutdown_component[0].InitiateShutdown(Force=False,
                                                         Reason=reason)
      self.check_ret_val(ret_val, None)[/mw_shl_code]

Once the VM has shut down, a soft reboot simply calls power_on. Powering on the VM essentially amounts to setting the VM state:

[mw_shl_code=python,true] def power_on(self, instance, block_device_info=None):
      """Power on the specified instance."""
      LOG.debug("Power on instance", instance=instance)

      if block_device_info:
         self._volumeops.fix_instance_volume_disk_paths(instance.name,
                                                         block_device_info)

      self._set_vm_state(instance, constants.HYPERV_VM_STATE_ENABLED)[/mw_shl_code]

A hard reboot, on the other hand, goes straight to the _set_vm_state method:

[mw_shl_code=python,true] def _set_vm_state(self, instance, req_state):
      instance_name = instance.name
      instance_uuid = instance.uuid

      try:
         self._vmutils.set_vm_state(instance_name, req_state)

         if req_state in (constants.HYPERV_VM_STATE_DISABLED,
                           constants.HYPERV_VM_STATE_REBOOT):
            self._delete_vm_console_log(instance)
         if req_state in (constants.HYPERV_VM_STATE_ENABLED,
                           constants.HYPERV_VM_STATE_REBOOT):
            self.log_vm_serial_output(instance_name,
                                       instance_uuid)

         LOG.debug("Successfully changed state of VM %(instance_name)s"
                  " to: %(req_state)s", {'instance_name': instance_name,
                                          'req_state': req_state})
      except Exception:
         with excutils.save_and_reraise_exception():
            LOG.error(_LE("Failed to change vm state of %(instance_name)s"
                           " to %(req_state)s"),
                        {'instance_name': instance_name,
                        'req_state': req_state})[/mw_shl_code]

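As this shows, a hard reboot only changes the VM's state (to HYPERV_VM_STATE_REBOOT). It skips the guest shutdown step and never goes through power_on, which is where the block device handling (fixing the volume disk paths when block_device_info is supplied) would take place.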
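The driver-level control flow traced above can be condensed into a few lines. This is a simplified sketch, not the Hyper-V driver itself: driver_reboot is a hypothetical function, and the strings it returns merely label the steps (the guest shutdown request, then the ENABLED or REBOOT state change).

```python
def driver_reboot(reboot_type, soft_shutdown_ok):
    """Return the ordered list of steps a reboot request goes through."""
    steps = []
    if reboot_type == "SOFT":
        steps.append("guest shutdown requested")  # _soft_shutdown()
        if soft_shutdown_ok:
            steps.append("ENABLED")               # power_on()
            return steps
    # Hard reboot, or a soft reboot whose graceful shutdown timed out.
    steps.append("REBOOT")                        # _set_vm_state(..., REBOOT)
    return steps


if __name__ == "__main__":
    print(driver_reboot("SOFT", True))
    print(driver_reboot("SOFT", False))
    print(driver_reboot("HARD", False))
```

Note that a soft reboot whose guest never shuts down ends with the same REBOOT state change as a hard reboot.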

Source: csdn

Author: jackny9
