TensorFlow Tutorial 6: Introduction to GPU Programming and TensorFlow Serving

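Guiding questions:

1. How does the GPU programming model work?

2. How do you install and use TensorFlow Serving?

3. How do you load and export a TensorFlow model?

Previous: TensorFlow Tutorial 5: Deep Learning

In this chapter, we will cover the following topics:

GPU programming
TensorFlow Serving:
- How to install TensorFlow Serving
- How to use TensorFlow Serving
- How to load and export a TensorFlow model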

GPU programming

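In Chapter 5, Deep Learning, where we trained a recurrent neural network (RNN) for an NLP application, we could see that deep learning applications can be computationally intensive. However, you can reduce the training time by using parallel programming techniques on a graphics processing unit (GPU). In fact, the computational resources of modern graphics units make them able to execute parallel parts of the code, thus ensuring high performance.

The GPU programming model is a programming strategy that consists of offloading work from the CPU to a GPU to accelerate the execution of a variety of applications. The range of applications of this strategy is very large and keeps growing; GPUs can currently reduce application execution times across different platforms, from cars to mobile phones, and from tablets to drones and robots.

The following figure shows how the GPU programming model works. In the application, there are calls that tell the CPU to hand a specific part of the code to the GPU and let it run there for higher execution speed. The reason such parts rely on the GPU is the speed that the GPU architecture provides. A GPU has many Streaming Multiprocessors (SMPs), each having many computational cores. These cores are capable of performing ALU and other operations with the help of Single Instruction Multiple Thread (SIMT) calls, which reduce the execution time drastically.

In the GPU programming model, some pieces of code are executed sequentially on the CPU, and some parts are executed in parallel by the GPU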
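As a rough illustration of the SIMT idea, the following pure-Python sketch applies one "kernel" function to every element index, the way each GPU thread runs the same instructions on its own data element. The names saxpy_kernel and launch are hypothetical, and the loop runs sequentially on the CPU; a real GPU would execute the iterations in parallel.

```python
def saxpy_kernel(thread_id, alpha, x, y, out):
    """Body executed by one (simulated) thread: out[i] = alpha * x[i] + y[i]."""
    out[thread_id] = alpha * x[thread_id] + y[thread_id]


def launch(kernel, num_threads, *args):
    """Simulate a kernel launch: run the same kernel once per thread index.

    This loop is sequential; on a GPU the iterations would run in parallel.
    """
    for thread_id in range(num_threads):
        kernel(thread_id, *args)


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(x)
launch(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)
```

The sequential part of a program would stay outside launch, while the data-parallel part is expressed as the per-thread kernel.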

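TensorFlow can take advantage of this programming model if you have an NVIDIA GPU. The package version that supports GPUs requires CUDA Toolkit 7.0 and cuDNN 6.5 v2.

Note

For the installation of the CUDA environment, we suggest referring to the CUDA installation page: http://docs.nvidia.com/cuda/cuda-count-getting-started-guide-for-linux/#axzz49w1XvzNj

TensorFlow refers to these devices in the following way:

- /cpu:0: the CPU of the server
- /gpu:0: the GPU of the server, if there is only one
- /gpu:1: the second GPU of the server, and so on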
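The placement log printed later in this section uses fully qualified names such as /job:localhost/replica:0/task:0/gpu:0. The helper below is a minimal sketch, using only the Python standard library, that splits either the short or the full form into its parts. parse_device is a hypothetical name and is not part of TensorFlow.

```python
def parse_device(name):
    """Split a device string into a dict of its key/index parts.

    Works for '/cpu:0' as well as
    '/job:localhost/replica:0/task:0/gpu:0'.
    """
    parts = {}
    for field in name.strip("/").split("/"):
        key, _, value = field.partition(":")
        # Numeric fields (replica, task, device index) become ints.
        parts[key] = int(value) if value.isdigit() else value
    return parts


print(parse_device("/gpu:1"))
print(parse_device("/job:localhost/replica:0/task:0/gpu:0"))
```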

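To find out which devices our operations and tensors are assigned to, create the session with the log_device_placement option set to True.

Consider the following example.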

We create a computational graph; a and b will be two matrices:

[mw_shl_code=python,true]import tensorflow as tf

a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')[/mw_shl_code]

In c, we multiply the two input matrices:

[mw_shl_code=python,true]c = tf.matmul(a, b)[/mw_shl_code]

Then we create a session with log_device_placement set to True:

[mw_shl_code=python,true]sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))[/mw_shl_code]

Finally, we launch the session:

[mw_shl_code=python,true]print(sess.run(c))[/mw_shl_code]

You should see the following output:

[mw_shl_code=text,true]Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/gpu:0
a: /job:localhost/replica:0/task:0/gpu:0
MatMul: /job:localhost/replica:0/task:0/gpu:0
[[ 22. 28.]
 [ 49. 64.]][/mw_shl_code]
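The printed result can be checked by hand. The sketch below recomputes the same 2x3 by 3x2 product in plain Python, without TensorFlow; matmul here is a hypothetical helper written only for this illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Same values and shapes as the constants a (2x3) and b (3x2) above.
a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(matmul(a, b))
```

For instance, the top-left entry is 1*1 + 2*3 + 3*5 = 22, which matches the session output.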

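If you want a specific operation to run on a device of your choice instead of one selected automatically, you can use tf.device to create a device context, so that all operations created within that context have the same device assignment.

Let's create the same computational graph using the tf.device instruction:

[mw_shl_code=python,true]# a and b are pinned to the CPU; c is created outside the device context
with tf.device('/cpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)[/mw_shl_code]

Again, we build the session and launch it:

[mw_shl_code=python,true]sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(c))[/mw_shl_code]

You will see that a and b are now assigned to cpu:0: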

[mw_shl_code=text,true]Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/cpu:0
a: /job:localhost/replica:0/task:0/cpu:0
MatMul: /job:localhost/replica:0/task:0/gpu:0
[[ 22. 28.]
 [ 49. 64.]][/mw_shl_code]

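If you have more than one GPU, you can select a specific one with tf.device. Setting allow_soft_placement to True in the configuration options when creating the session lets TensorFlow fall back to an existing, supported device when the requested one is not available.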
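In TensorFlow, allow_soft_placement lets an operation run on another available device when the one requested with tf.device does not exist. The toy model below sketches that rule in plain Python. resolve_device and its arguments are hypothetical, and TensorFlow's real placement logic considers more than this (for example, whether an operation has an implementation for the device).

```python
def resolve_device(requested, available, allow_soft_placement=False):
    """Return the device an op would run on under a simplified placement rule."""
    if requested in available:
        return requested
    if allow_soft_placement:
        # Simplification: fall back to the first device that does exist.
        return available[0]
    raise ValueError("Cannot assign op to %s: no such device" % requested)


available = ["/cpu:0", "/gpu:0"]  # a machine with a single GPU
print(resolve_device("/gpu:0", available))
print(resolve_device("/gpu:1", available, allow_soft_placement=True))
```

Without soft placement, the second call would raise an error, which is the situation the option is meant to avoid.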

TensorFlow Serving

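Serving is a TensorFlow package that was developed to take machine learning models into production systems. This means that a developer can use the TensorFlow Serving APIs to build a server that serves the implemented model.

The served model makes inferences and predictions on the data presented by its clients, which in turn allows the model to be improved.

To communicate with the serving system, clients use gRPC, a high-performance open source remote procedure call (RPC) interface developed by Google.

The typical pipeline (see the following figure) is that training data is fed to the learner, which outputs a model. After being validated, the model is ready to be deployed to the TensorFlow Serving system. It is quite common to launch a model and iterate on it over time, as new data becomes available or as you improve the model.

TensorFlow Serving pipeline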

How to install TensorFlow Serving

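To compile and use TensorFlow Serving, you need to set up some prerequisites.

Bazel

TensorFlow Serving requires Bazel 0.2.0 (http://www.bazel.io/) or higher. Download bazel-0.2.0-installer-linux-x86_64.sh.

Note

Bazel is a tool that automates software builds and tests. Supported build tasks include running compilers and linkers to produce executable programs and libraries, and assembling deployable packages.

Run the following commands: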

[mw_shl_code=shell,true]chmod +x bazel-0.2.0-installer-linux-x86_64.sh
./bazel-0.2.0-installer-linux-x86_64.sh --user[/mw_shl_code]

Finally, set up your environment. Add the following export to your ~/.bashrc file:

[mw_shl_code=shell,true]export PATH="$PATH:$HOME/bin"[/mw_shl_code]

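gRPC

Our tutorial uses gRPC (0.13 or higher) as the RPC framework.

Note

You can find other references at https://github.com/grpc.

TensorFlow Serving dependencies

To install the TensorFlow Serving dependencies, execute the following: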

[mw_shl_code=shell,true]sudo apt-get update && sudo apt-get install -y \
      build-essential \
      curl \
      git \
      libfreetype6-dev \
      libpng12-dev \
      libzmq3-dev \
      pkg-config \
      python-dev \
      python-numpy \
      python-pip \
      software-properties-common \
      swig \
      zip \
      zlib1g-dev[/mw_shl_code]

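Then configure TensorFlow by running the following commands from the directory that contains the tensorflow source tree (in the Serving repository cloned in the next section, it is a submodule):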

[mw_shl_code=shell,true]cd tensorflow
./configure
cd ..[/mw_shl_code]

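Installing Serving

Clone the repository with Git:

[mw_shl_code=shell,true]git clone --recurse-submodules https://github.com/tensorflow/serving
cd serving[/mw_shl_code]

The --recurse-submodules option is required to fetch TensorFlow, gRPC, and the other libraries that TensorFlow Serving depends on. To build TensorFlow Serving, you must use Bazel:

[mw_shl_code=shell,true]bazel build tensorflow_serving/...[/mw_shl_code]

The binaries are placed in the bazel-bin directory and can be run with a command such as:

[mw_shl_code=shell,true]./bazel-bin/tensorflow_serving/example/mnist_inference[/mw_shl_code]

Finally, you can test the installation by executing the following command:

[mw_shl_code=shell,true]bazel test tensorflow_serving/...[/mw_shl_code]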

How to use TensorFlow Serving

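In this tutorial, we will show how to export a trained TensorFlow model and build a server to serve the exported model. The implemented model is a Softmax regression model for handwritten image classification (MNIST data).

The code consists of two parts:

- A Python file (mnist_export.py) that trains and exports the model
- A C++ file (mnist_inference.cc) that loads the exported model and runs a gRPC service to serve it

In the following sections, we report the basic steps for using TensorFlow Serving. For other references, see https://tensorflow.github.io/serving/serving_basic.

Training and exporting the TensorFlow model

As you can see in mnist_export.py, the training is done in the same way as in the MNIST for beginners tutorial; see the following link: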

https://www.tensorflow.org/versions/r0.9/tutorials/mnist/beginners/index.html

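The TensorFlow graph is launched in the TensorFlow session sess, with the input tensor (image) as x and the output tensor (Softmax score) as y. Then we use the TensorFlow Serving exporter to export the model; it builds a snapshot of the trained model so that it can be loaded later for inference. Let's now look at the main functions used to export a trained model.

Import the exporter to serialize the model:

[mw_shl_code=python,true]from tensorflow_serving.session_bundle import exporter[/mw_shl_code]

Then define a saver using the TensorFlow function tf.train.Saver, with its sharded parameter set to True:

[mw_shl_code=python,true]saver = tf.train.Saver(sharded=True)[/mw_shl_code]

saver is used to serialize the graph variable values to the model export so that they can be properly restored later.

The next step is to define model_exporter: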

[mw_shl_code=python,true]model_exporter = exporter.Exporter(saver)
# Declare x as the classification input and y as the class scores.
signature = exporter.classification_signature(input_tensor=x, scores_tensor=y)
model_exporter.init(sess.graph.as_graph_def(), default_graph_signature=signature)[/mw_shl_code]

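model_exporter.init takes the following two arguments:

- sess.graph.as_graph_def() is the protobuf of the graph. Exporting serializes the protobuf to the model export so that the TensorFlow graph can be properly restored later.
- default_graph_signature=signature specifies a model export signature. The signature specifies what type of model is being exported and the input/output tensors to bind to when running inference. In this case, you use exporter.classification_signature to specify that the model is a classification model.

Finally, we create our export: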

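[mw_shl_code=python,true]model_exporter.export(export_path, tf.constant(FLAGS.export_version), sess)[/mw_shl_code]

model_exporter.export takes the following arguments:

- export_path is the path of the export directory. The export creates the directory if it does not exist.
- tf.constant(FLAGS.export_version) is a tensor that specifies the version of the model. You should specify a larger integer value when exporting a newer version of the same model. Each version is exported to a different subdirectory under the given path.
- sess is the TensorFlow session that holds the trained model you are exporting.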

Running a session

To export the model, first clear the export directory:

[mw_shl_code=shell,true]$>rm -rf /tmp/mnist_model[/mw_shl_code]

Then, using bazel, build the mnist_export example:

[mw_shl_code=shell,true]$>bazel build //tensorflow_serving/example:mnist_export[/mw_shl_code]

Finally, you can run the example:

[mw_shl_code=shell,true]$>bazel-bin/tensorflow_serving/example/mnist_export /tmp/mnist_model
Training model...
Done training!
Exporting trained model to /tmp/mnist_model
Done exporting![/mw_shl_code]

In the export directory, we should have one subdirectory for each exported version of the model:

[mw_shl_code=shell,true]$>ls /tmp/mnist_model
00000001[/mw_shl_code]

The subdirectory is named 00000001 because we previously specified tf.constant(FLAGS.export_version) as the model version, and FLAGS.export_version has a default value of 1.
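The directory listing suggests that the version number is zero-padded to eight digits. Assuming that naming scheme (it is inferred from the 00000001 listing, not taken from the exporter source), a pair of hypothetical helpers can compute where a given version lands and pick the newest version present:

```python
import os


def version_dir(export_path, version):
    """Path of the subdirectory for one model version, e.g. .../00000001."""
    return os.path.join(export_path, "%08d" % version)


def latest_version(subdirs):
    """Largest version among directory names that are purely numeric."""
    versions = [int(name) for name in subdirs if name.isdigit()]
    return max(versions) if versions else None


print(version_dir("/tmp/mnist_model", 1))
print(latest_version(["00000001", "00000002", "notes"]))
```

Because each newer export uses a larger integer, choosing the maximum numeric name is one simple way to find the most recent model.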

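Each version subdirectory contains the following files:

- export.meta is the serialized tensorflow::MetaGraphDef of the model. It includes the graph definition of the model, as well as metadata of the model, such as signatures.
- export-?????-of-????? are the files that hold the serialized variables of the graph.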

[mw_shl_code=shell,true]$>ls /tmp/mnist_model/00000001
checkpoint export-00000-of-00001 export.meta[/mw_shl_code]

Loading and exporting a TensorFlow model

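The C++ code for loading the exported TensorFlow model is in the main() function in mnist_inference.cc. Here we report an excerpt; we do not consider the parameters for batching. If you want to adjust the maximum batch size, the timeout threshold, or the number of background threads used for batched inference, you can do so by setting more values in BatchingParameters: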

[mw_shl_code=python,true]int main(int argc, char** argv)
{
  SessionBundleConfig session_bundle_config;
  // ... batching parameters are set here ...
  std::unique_ptr<SessionBundleFactory> bundle_factory;
  TF_QCHECK_OK(
      SessionBundleFactory::Create(session_bundle_config,
                                   &bundle_factory));
  std::unique_ptr<SessionBundle> bundle(new SessionBundle);
  TF_QCHECK_OK(bundle_factory->CreateSessionBundle(bundle_path,
                                                   &bundle));
  // ...
  RunServer(FLAGS_port, std::move(bundle));
  return 0;
}
[/mw_shl_code]

SessionBundle is a component of TensorFlow Serving. Let's consider the include file SessionBundle.h:

[mw_shl_code=python,true]struct SessionBundle {
  std::unique_ptr<tensorflow::Session> session;
  tensorflow::MetaGraphDef meta_graph_def;
};[/mw_shl_code]

The session member is a TensorFlow session that holds the original graph with the necessary variables properly restored.

SessionBundleFactory::CreateSessionBundle() loads the exported TensorFlow model from bundle_path and creates a SessionBundle object for running inference with the model.

RunServer brings up a gRPC server that exports a single Classify() API.

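Each inference request is processed in the following steps:

1. Verify the input. The server expects exactly one MNIST-format image per inference request.
2. Transform the input into an inference input tensor and create an output tensor placeholder.
3. Run inference.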
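The three steps above can be sketched in plain Python. This is an illustration only: handle_request, the 784-pixel input size (a flattened 28x28 MNIST image), and the toy weights are assumptions, and the real server is the C++ code in mnist_inference.cc.

```python
import math

IMAGE_SIZE = 784   # assumed: one flattened 28x28 MNIST image
NUM_CLASSES = 10


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def handle_request(image, weights, biases):
    """Validate one image, build the input, and run Softmax regression."""
    # Step 1: verify the input (exactly one MNIST-sized image).
    if len(image) != IMAGE_SIZE:
        raise ValueError("expected %d pixels, got %d" % (IMAGE_SIZE, len(image)))
    # Step 2: convert to the inference input and allocate the output.
    x = [float(p) for p in image]
    logits = [0.0] * NUM_CLASSES
    # Step 3: run inference, y = softmax(x W + b).
    for c in range(NUM_CLASSES):
        logits[c] = sum(x[i] * weights[i][c] for i in range(IMAGE_SIZE)) + biases[c]
    return softmax(logits)


# Toy parameters: all-zero weights and a bias that favors class 3.
weights = [[0.0] * NUM_CLASSES for _ in range(IMAGE_SIZE)]
biases = [0.0] * NUM_CLASSES
biases[3] = 5.0
scores = handle_request([0] * IMAGE_SIZE, weights, biases)
print(scores.index(max(scores)))
```

The returned list plays the role of the Softmax scores tensor y; the predicted digit is the index of the largest score.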

To run inference, type the following commands:

[mw_shl_code=shell,true]$>bazel build //tensorflow_serving/example:mnist_inference
$>bazel-bin/tensorflow_serving/example/mnist_inference --port=9000 /tmp/mnist_model/00000001[/mw_shl_code]

Test the server

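To test the server, we use the mnist_client.py (https://github.com/tensorflow/serving/blob/master/tensorflow_serving/example/mnist_client.py) utility.

The client downloads the MNIST test data, sends it as requests to the server, and calculates the inference error rate.

To run it, type the following commands: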

[mw_shl_code=shell,true]$>bazel build //tensorflow_serving/example:mnist_client
$>bazel-bin/tensorflow_serving/example/mnist_client --num_tests=1000 --server=localhost:9000
Inference error rate: 10.5%[/mw_shl_code]

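The result confirms that the server loads and runs the trained model successfully. An inference error rate of 10.5% on 1,000 images corresponds to an accuracy of 89.5%, which is close to the 91% accuracy quoted for the trained Softmax model.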
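The relationship between error rate and accuracy is simple arithmetic. As a sketch, using made-up labels rather than real MNIST results, the error rate is the fraction of mismatched predictions and the accuracy is its complement:

```python
def error_rate(predictions, labels):
    """Fraction of predictions that differ from the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    wrong = sum(1 for p, t in zip(predictions, labels) if p != t)
    return wrong / float(len(labels))


# Hypothetical data: 1,000 test labels with 105 wrong predictions.
labels = [7] * 1000
predictions = [7] * 895 + [1] * 105
rate = error_rate(predictions, labels)
print("Inference error rate: %.1f%%" % (rate * 100))
print("Accuracy: %.1f%%" % ((1 - rate) * 100))
```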

Summary

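In this chapter we described two important features of TensorFlow. First was the possibility of using the programming model known as GPU computing, with which it becomes possible to speed up the code (for example, the training phase of a neural network). The second part of the chapter was devoted to describing the TensorFlow Serving framework. It is a high-performance, open source serving system for machine learning models, designed for production environments and optimized for TensorFlow. This powerful framework can run multiple large-scale models that change over time based on real-world data, enabling a more efficient use of GPU resources and allowing developers to improve their own machine learning models.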

Source: usyiyi

Author: 一译

Original link: Chapter 6. GPU Programming and TensorFlow Serving

http://usyiyi.cn/documents/getting-started-with-tf/ch6.html
