TensorFlow Tutorial 4: An Introduction to Neural Networks

Last edited by fc013 on 2018-2-3 19:41

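Guiding questions:

1. What is a neural network?

2. What is a multilayer perceptron?

3. How is a multilayer perceptron implemented?

Previous: TensorFlow Tutorial 3: Implementing Common Machine Learning Algorithms with TensorFlow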

What are artificial neural networks?

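Each unit, or node, simulates the role of a neuron in a biological neural network. Each node, called an artificial neuron, has a very simple operation: it becomes active if the total quantity of signal that it receives exceeds its activation threshold, defined by the so-called activation function. If a node becomes active, it sends a signal along its connections to the other nodes. Each connection point acts as a filter that converts the message into an inhibitory or excitatory signal, increasing or decreasing its intensity according to the connection's own characteristics. Connection points simulate biological synapses, and their basic function is to weigh the strength of the transmitted signals by multiplying them by weights whose values depend on the connection itself.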
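The threshold behaviour described above can be sketched in a few lines of NumPy. This is only an illustration: the function name threshold_neuron, the weights, and the threshold value are hypothetical choices, not values taken from this tutorial.

```python
import numpy as np

def threshold_neuron(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    total_signal = np.dot(inputs, weights)
    return 1 if total_signal > threshold else 0

# Hypothetical connection weights and activation threshold
weights = np.array([0.6, 0.6])
threshold = 1.0

# Evaluate the neuron on the four possible binary input pairs
input_pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
outputs = [threshold_neuron(np.array(p), weights, threshold) for p in input_pairs]
print(outputs)
```

With these particular weights the neuron only becomes active when both inputs are on, so it behaves like a logical AND.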

Single Layer Perceptron

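Other activation functions can be used, preferably nonlinear ones (such as the sigmoid function, which we will see in the next section). The learning process of the network is iterative: in each learning cycle (called an epoch) it slightly modifies the synaptic weights, using a selected set of examples called the training set. In each cycle, the weights must be modified so as to minimize a cost function that is specific to the problem under consideration. Finally, once the perceptron has been trained on the training set, it is tested on other inputs (the test set) to verify its ability to generalize.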

The logistic regression

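This algorithm has nothing to do with the canonical linear regression we saw in Chapter 3, Starting with Machine Learning; rather, it is an algorithm that lets us solve supervised classification problems. In fact, to estimate the dependent variable we now use the so-called logistic function, or sigmoid. It is because of this feature that the algorithm is called logistic regression. The sigmoid function has the following shape:

Sigmoid function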
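As a rough numerical illustration of that shape, the sigmoid can be written as 1 / (1 + exp(-z)). The NumPy sketch below evaluates it at a few sample points chosen here purely for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Sample points chosen for illustration
z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
values = sigmoid(z)
print(values)
```

The values rise monotonically from near 0 to near 1 and cross 0.5 at z = 0.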

As we can see, the dependent variable takes values strictly between 0 and 1, which is exactly what we need. In the case of logistic regression, we want our function to tell us the probability that an element belongs to a particular class. Recall that supervised learning in a neural network is configured as an iterative optimization of the weights; these are modified on the basis of the network's performance on the training set. In fact, the aim is to minimize the loss function, which indicates how far the behavior of the network deviates from the desired one. The performance of the network is then verified on a test set, which consists of images other than those used for training.

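The basic steps of the training we are going to implement are as follows:

The weights are initialized with random values at the beginning of training.
For each element of the training set, the error is calculated, that is, the difference between the desired output and the actual output. This error is used to adjust the weights.
The process is repeated, resubmitting all the examples of the training set to the network in random order, until the error over the entire training set falls below a certain threshold or until the maximum number of iterations is reached.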
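The three steps above can be sketched as a small NumPy loop. This is a minimal illustration on a toy problem (the logical OR function), not the MNIST code developed below; the learning rate, error threshold, and iteration limit are hypothetical values.

```python
import numpy as np

rng = np.random.RandomState(0)

# Toy training set (an assumption for illustration): the logical OR function
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([0., 1., 1., 1.])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Step 1: initialize the weights with random values
w = rng.uniform(-0.5, 0.5, size=2)
b = 0.0

learning_rate = 0.5      # hypothetical value
error_threshold = 0.05   # hypothetical value
max_epochs = 10000       # hypothetical value

for epoch in range(max_epochs):
    total_error = 0.0
    # Step 3: resubmit all the examples in random order
    for i in rng.permutation(len(X)):
        # Step 2: error = desired output - actual output, used to adjust the weights
        output = sigmoid(np.dot(X[i], w) + b)
        error = t[i] - output
        w += learning_rate * error * X[i]
        b += learning_rate * error
        total_error += error ** 2
    # Stop when the error over the whole training set is below the threshold
    if total_error < error_threshold:
        break

predictions = (sigmoid(X.dot(w) + b) > 0.5).astype(int)
print(predictions)
```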

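Now let's look in detail at how to implement logistic regression with TensorFlow. The problem we want to solve is the classification of images from the MNIST dataset, which, as explained in Chapter 3, Starting with Machine Learning, is a database of handwritten digits.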

TensorFlow implementation

To implement it in TensorFlow, we need to carry out the following steps:

First, we import all the necessary libraries:[mw_shl_code=python,true]import input_data
import tensorflow as tf
import matplotlib.pyplot as plt[/mw_shl_code]
We use the input_data.read_data_sets function introduced in Chapter 3, Starting with Machine Learning, in the MNIST dataset section, to load the images for our problem:[mw_shl_code=python,true]mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)[/mw_shl_code]
Then we set the total number of epochs for the training phase:[mw_shl_code=python,true]training_epochs = 25[/mw_shl_code]
We also have to define the other parameters needed to build the model:[mw_shl_code=python,true]learning_rate = 0.01
batch_size = 100
display_step = 1[/mw_shl_code]
Now we move on to building the model.

Building the model

Define x as the input tensor; it represents an MNIST image of 28×28 = 784 pixels:

[mw_shl_code=python,true]x = tf.placeholder("float", [None, 784]) [/mw_shl_code]

Recall that our problem consists of assigning a probability value to each possible class (the digits from 0 to 9). At the end of the computation, we will have a probability distribution that tells us how confident we are in our prediction.

So the output will be a tensor of 10 probabilities, each corresponding to a digit (the probabilities must, of course, sum to one):

[mw_shl_code=python,true]y = tf.placeholder("float", [None, 10]) [/mw_shl_code]

To assign probabilities to each image, we will use the so-called softmax activation function.

The softmax function is specified in two main steps:

Calculate the evidence that a certain image belongs to a particular class
Convert the evidence into probabilities of belonging to each of the 10 possible classes
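The two steps can be sketched in NumPy as follows. This is an illustrative stand-in rather than the TensorFlow code of this tutorial: the random input and weight values are made up, and the softmax helper subtracts the row maximum before exponentiating, which is a common numerical-stability choice assumed here.

```python
import numpy as np

def softmax(evidence):
    """Convert each row of evidence into probabilities that sum to one."""
    shifted = evidence - np.max(evidence, axis=1, keepdims=True)
    exponentials = np.exp(shifted)
    return exponentials / np.sum(exponentials, axis=1, keepdims=True)

rng = np.random.RandomState(0)
x = rng.rand(2, 784)              # two made-up "images" of 784 pixels
W = rng.randn(784, 10) * 0.01     # made-up weights, one column per class
b = np.zeros(10)                  # one bias per class

# Step 1: evidence that each image belongs to each of the 10 classes
evidence = x.dot(W) + b
# Step 2: convert the evidence into probabilities
probabilities = softmax(evidence)

print(probabilities.shape)
print(probabilities.sum(axis=1))
```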

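To evaluate the evidence, we first define the weights tensor W:

[mw_shl_code=python,true]W = tf.Variable(tf.zeros([784, 10]))[/mw_shl_code]

For a given image, we can evaluate the evidence for each class i by simply multiplying the input tensor x by the tensor W. In TensorFlow, this looks like the following:

[mw_shl_code=python,true]evidence = tf.matmul(x, W)[/mw_shl_code]

In general, a model includes an extra parameter representing the bias, which accounts for a certain degree of uncertainty. In our case, the final formula for the evidence is as follows:

[mw_shl_code=python,true]evidence = tf.matmul(x, W) + b[/mw_shl_code]

This means that for every class i (from 0 to 9) we have a vector Wi of 784 (28 × 28) elements; each element j of Wi is multiplied by the corresponding component j of the input image, the 784 products are summed, and the corresponding bias element bi is added.

So, to define the evidence, we must define the following bias tensor:

[mw_shl_code=python,true]b = tf.Variable(tf.zeros([10]))[/mw_shl_code]

The second step is to use the softmax function to obtain the output vector of probabilities, namely activation:

[mw_shl_code=python,true]activation = tf.nn.softmax(tf.matmul(x, W) + b)[/mw_shl_code]

TensorFlow's tf.nn.softmax function produces a probability-based output from the input evidence tensor. Once we have implemented the model, we can specify the code needed to find the weights W and biases b of the network through the iterative training algorithm. In each iteration, the training algorithm takes the training data, applies the neural network, and compares the result with the expected one.

Note

TensorFlow provides many other activation functions. See https://www.tensorflow.org/versions/r0.8/api_docs/index.html for a fuller reference.

To train our model, and to know when we have a good one, we must define how to measure the accuracy of the model. Our goal is to find values of the parameters W and b that minimize the value of a metric that indicates how bad the model is.

Different metrics compute the degree of error between the desired output and the output obtained on the training data. A common measure of error is the mean squared error, or squared Euclidean distance. However, some research findings suggest using other metrics for a neural network like this one.

In this example, we use the so-called cross-entropy error function. It is defined as:

[mw_shl_code=python,true]cross_entropy = y*tf.log(activation)[/mw_shl_code]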

In order to minimize cross_entropy, we can use the following combination of tf.reduce_mean and tf.reduce_sum to build the cost function:

[mw_shl_code=python,true]cost = tf.reduce_mean\
(-tf.reduce_sum\
(cross_entropy, reduction_indices=1)) [/mw_shl_code]
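To make the arithmetic of this cost concrete, here is a NumPy sketch of the same reduction on a made-up batch of two examples with three classes. The numbers are illustrative only; np.sum(..., axis=1) plays the role of tf.reduce_sum over the class dimension and np.mean the role of tf.reduce_mean over the batch.

```python
import numpy as np

# Made-up one-hot labels and predicted probabilities (2 examples, 3 classes)
y = np.array([[0., 0., 1.],
              [1., 0., 0.]])
activation = np.array([[0.1, 0.2, 0.7],
                       [0.5, 0.3, 0.2]])

# Elementwise product keeps only the log-probability of the true class
cross_entropy = y * np.log(activation)
# Sum over classes, negate, then average over the batch
per_example = -np.sum(cross_entropy, axis=1)
cost = np.mean(per_example)
print(cost)
```

Because the labels are one-hot, each example contributes the negative log of the probability assigned to its true class, so the cost shrinks as those probabilities approach one.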

Then we must minimize it using the gradient descent optimization algorithm:

[mw_shl_code=python,true]optimizer = tf.train.GradientDescentOptimizer\
(learning_rate).minimize(cost)[/mw_shl_code]

Just a few lines of code to build a neural network model!

Launch the session

Now it is time to build the session and launch our neural network model.

We set up the following lists to visualize the training session:

[mw_shl_code=python,true]avg_set = []
epoch_set=[][/mw_shl_code]

Then we initialize the TensorFlow variables:

[mw_shl_code=python,true]init = tf.initialize_all_variables()[/mw_shl_code]

Start the session:

[mw_shl_code=python,true]with tf.Session() as sess:
    sess.run(init)[/mw_shl_code]

As mentioned earlier, each epoch is a training cycle:

[mw_shl_code=python,true]    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)[/mw_shl_code]

Then we loop over all the batches:

[mw_shl_code=python,true]        for i in range(total_batch):
            batch_xs, batch_ys = \
                mnist.train.next_batch(batch_size)[/mw_shl_code]

Fit the model using the batch data:

[mw_shl_code=python,true]            sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys})[/mw_shl_code]

Compute the average loss by running the cost op with the given image values (x) and the actual outputs (y):

[mw_shl_code=python,true]            avg_cost += sess.run\
                (cost, feed_dict={x: batch_xs,\
                                  y: batch_ys})/total_batch[/mw_shl_code]

During the computation, we display a log line for each epoch and record the cost for plotting:

[mw_shl_code=python,true]        if epoch % display_step == 0:
            print "Epoch:",\
                  '%04d' % (epoch+1),\
                  "cost=","{:.9f}".format(avg_cost)
        avg_set.append(avg_cost)
        epoch_set.append(epoch+1)
    print "Training phase finished"[/mw_shl_code]

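Let's look at the accuracy of our model. A prediction is correct if the index of the highest value in the predicted vector matches the index of the highest value in the actual digit vector; the mean of correct_prediction then gives us the accuracy. We need to run the accuracy function on our test set (mnist.test).

We feed the test images and labels through the x and y keys:

[mw_shl_code=python,true]    correct_prediction = tf.equal\
        (tf.argmax(activation, 1),\
         tf.argmax(y, 1))
    accuracy = tf.reduce_mean\
        (tf.cast(correct_prediction, "float"))
    print "MODEL accuracy:", accuracy.eval({x: mnist.test.images,\
                                            y: mnist.test.labels})[/mw_shl_code]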

The training phase above printed the value of the cost function for each epoch:

[mw_shl_code=python,true]Python 2.7.10 (default, Oct 14 2015, 16:09:02)
[GCC 5.2.1 20151010] on linux2
Type "copyright", "credits" or "license()" for more information.
>>> ======================= RESTART ============================
>>>
Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
Epoch: 0001 cost= 1.174406662
Epoch: 0002 cost= 0.661956009
Epoch: 0003 cost= 0.550468774
Epoch: 0004 cost= 0.496588717
Epoch: 0005 cost= 0.463674555
Epoch: 0006 cost= 0.440907706
Epoch: 0007 cost= 0.423837747
Epoch: 0008 cost= 0.410590841
Epoch: 0009 cost= 0.399881751
Epoch: 0010 cost= 0.390916621
Epoch: 0011 cost= 0.383320325
Epoch: 0012 cost= 0.376767031
Epoch: 0013 cost= 0.371007620
Epoch: 0014 cost= 0.365922904
Epoch: 0015 cost= 0.361327561
Epoch: 0016 cost= 0.357258660
Epoch: 0017 cost= 0.353508228
Epoch: 0018 cost= 0.350164634
Epoch: 0019 cost= 0.347015593
Epoch: 0020 cost= 0.344140861
Epoch: 0021 cost= 0.341420144
Epoch: 0022 cost= 0.338980592
Epoch: 0023 cost= 0.336655581
Epoch: 0024 cost= 0.334488012
Epoch: 0025 cost= 0.332488823
Training phase finished[/mw_shl_code]

Finally, with the following lines of code, we can visualize the training phase of the network:

[mw_shl_code=python,true]plt.plot(epoch_set,avg_set, 'o',\
label='Logistic Regression Training phase')
plt.ylabel('cost')
plt.xlabel('epoch')
plt.legend()
plt.show()[/mw_shl_code]

Training phase of logistic regression

Source code

[mw_shl_code=python,true]# Import MNIST data
import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
import tensorflow as tf
import matplotlib.pyplot as plt
# Parameters
learning_rate = 0.01
training_epochs = 25
batch_size = 100
display_step = 1
# tf Graph Input
x = tf.placeholder("float", [None, 784])
y = tf.placeholder("float", [None, 10])
# Create model
# Set model weights
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
# Construct model
activation = tf.nn.softmax(tf.matmul(x, W) + b)
# Minimize error using cross entropy
cross_entropy = y*tf.log(activation)
cost = tf.reduce_mean\
   (-tf.reduce_sum\
      (cross_entropy,reduction_indices=1))
optimizer = tf.train.\
         GradientDescentOptimizer(learning_rate).minimize(cost)
#Plot settings
avg_set = []
epoch_set=[]
# Initializing the variables
init = tf.initialize_all_variables()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = \
                mnist.train.next_batch(batch_size)
            # Fit training using batch data
            sess.run(optimizer, \
                     feed_dict={x: batch_xs, y: batch_ys})
            # Compute average loss
            avg_cost += sess.run(cost, feed_dict=\
                                 {x: batch_xs,\
                                  y: batch_ys})/total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print "Epoch:", '%04d' % (epoch+1),\
                  "cost=", "{:.9f}".format(avg_cost)
        avg_set.append(avg_cost)
        epoch_set.append(epoch+1)
    print "Training phase finished"
    plt.plot(epoch_set,avg_set, 'o',\
             label='Logistic Regression Training phase')
    plt.ylabel('cost')
    plt.xlabel('epoch')
    plt.legend()
    plt.show()
    # Test model
    correct_prediction = tf.equal\
        (tf.argmax(activation, 1),\
         tf.argmax(y, 1))
    # Calculate accuracy (eval needs the session opened above)
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print "Model accuracy:", accuracy.eval({x: mnist.test.images,\
                                            y: mnist.test.labels})[/mw_shl_code]

Multi Layer Perceptron

A more complex and efficient architecture is that of the Multi Layer Perceptron (MLP). It is essentially formed of multiple layers of perceptrons, and is therefore characterized by the presence of at least one hidden layer, that is, a layer connected directly to neither the inputs nor the outputs of the network:

MLP architecture

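This type of network is typically trained using supervised learning, according to the principles outlined in the previous section. In particular, a typical learning algorithm for MLP networks is the so-called backpropagation algorithm.

Note

The backpropagation algorithm is a learning algorithm for neural networks. It compares the output value of the system with the desired value. On the basis of the difference thus calculated (namely, the error), the algorithm modifies the synaptic weights of the neural network, progressively bringing the set of output values closer to the desired ones.

It is important to note that in MLP networks, although the desired outputs of the neurons in the hidden layers are not known, it is always possible to apply a supervised learning method based on minimizing an error function by means of gradient descent.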
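To illustrate how the error at the output can still be used to update hidden weights, here is a minimal NumPy sketch of backpropagation on a toy problem (XOR) with one hidden layer. Everything in it (the data, the layer size of 4, the learning rate, the squared-error loss) is an assumption made for illustration and is not the MNIST network built below.

```python
import numpy as np

rng = np.random.RandomState(1)

# Toy data (an assumption for illustration): the XOR function
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [0.]])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 4 neurons (hypothetical size) and one output neuron
W1 = rng.uniform(-1, 1, (2, 4))
b1 = np.zeros((1, 4))
W2 = rng.uniform(-1, 1, (4, 1))
b2 = np.zeros((1, 1))
learning_rate = 0.5  # hypothetical value

losses = []
for step in range(5000):
    # Forward pass
    hidden = sigmoid(X.dot(W1) + b1)
    output = sigmoid(hidden.dot(W2) + b2)
    error = output - T
    losses.append(0.5 * np.sum(error ** 2))

    # Backward pass: the output error is propagated back to the hidden layer
    delta_out = error * output * (1 - output)
    delta_hidden = delta_out.dot(W2.T) * hidden * (1 - hidden)

    # Gradient descent step on every weight and bias
    W2 -= learning_rate * hidden.T.dot(delta_out)
    b2 -= learning_rate * delta_out.sum(axis=0, keepdims=True)
    W1 -= learning_rate * X.T.dot(delta_hidden)
    b1 -= learning_rate * delta_hidden.sum(axis=0, keepdims=True)

print(losses[0], losses[-1])
```

The hidden layer never receives a target of its own; its update comes entirely from delta_hidden, which is the output error passed back through W2.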

In the following example, we show the implementation of an MLP for an image classification problem (MNIST).

Multi Layer Perceptron classification

Import the necessary libraries:

[mw_shl_code=python,true]import input_data
import tensorflow as tf
import matplotlib.pyplot as plt[/mw_shl_code]

Load the images to classify:

[mw_shl_code=python,true]mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)[/mw_shl_code]

Set some parameters of the MLP model:

Learning rate of the network:

[mw_shl_code=python,true]learning_rate = 0.001[/mw_shl_code]

Number of epochs:

[mw_shl_code=python,true]training_epochs = 20[/mw_shl_code]

Number of images in each training batch, and the logging interval in epochs:

[mw_shl_code=python,true]batch_size = 100
display_step = 1[/mw_shl_code]

Number of neurons in the first layer:

[mw_shl_code=python,true]n_hidden_1 = 256 [/mw_shl_code]

Number of neurons in the second layer:

[mw_shl_code=python,true]n_hidden_2 = 256 [/mw_shl_code]

The size of the input (each image has 784 pixels):

[mw_shl_code=python,true]n_input = 784 # MNIST data input (img shape: 28*28)[/mw_shl_code]

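Number of output classes:

[mw_shl_code=python,true]n_classes = 10[/mw_shl_code]

It should be noted that while the input and output sizes are perfectly defined for a given application, there are no strict criteria for choosing the number of hidden layers and the number of neurons in each layer.

Every choice must be based on experience with similar applications, as in our case:

When increasing the number of hidden layers, we should also increase the size of the training set needed during the learning phase, and the number of connections to be updated grows as well. This results in an increase in training time.
Also, if there are too many neurons in the hidden layer, not only are there more weights to be updated, but the network also tends to learn too much from the training examples, resulting in poor generalization ability. If, however, there are too few hidden neurons, the network is not able to learn even the training set.

Build the model

The input layer is the x tensor [1×784], which represents the image to classify: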

[mw_shl_code=python,true]x = tf.placeholder("float", [None, n_input])[/mw_shl_code]

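The output tensor y has as many elements as there are classes:

[mw_shl_code=python,true]y = tf.placeholder("float", [None, n_classes])[/mw_shl_code]

In the middle, we have two hidden layers. The first layer is constituted by the h tensor of weights, whose size is [784 × 256], where 256 is the total number of nodes of the layer:

[mw_shl_code=python,true]h = tf.Variable(tf.random_normal([n_input, n_hidden_1]))[/mw_shl_code]

For layer 1, we also have to define the respective bias tensor:

[mw_shl_code=python,true]bias_layer_1 = tf.Variable(tf.random_normal([n_hidden_1]))[/mw_shl_code]

Each neuron receives the pixels of the input image to classify, combined with the hij weight connections and added to the respective values of the bias tensor:

[mw_shl_code=python,true]layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x,h),bias_layer_1))[/mw_shl_code]

It sends its output to the neurons of the next layer through the activation function. The functions can differ from one neuron to another, but in practice we adopt a common function for all the neurons, typically of the sigmoid type. Sometimes the output neurons are equipped with a linear activation function. It is worth noting that the activation functions of the neurons in the hidden layers cannot be linear, because in that case the MLP network would be equivalent to a network with two layers and would therefore no longer be of the MLP type. The second layer must perform the same steps as the first.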
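The claim about linear hidden activations can be checked numerically. In the NumPy sketch below (small made-up layer sizes and random values, chosen only for illustration), a hidden layer without a nonlinearity followed by an output layer gives exactly the same result as a single layer whose weights are the product of the two weight matrices.

```python
import numpy as np

rng = np.random.RandomState(0)

# Made-up sizes: 5 inputs of 4 features, 3 hidden units, 2 outputs
x = rng.randn(5, 4)
h = rng.randn(4, 3)
bias_1 = rng.randn(3)
w = rng.randn(3, 2)
bias_2 = rng.randn(2)

# Hidden layer with a linear (identity) activation, then the output layer
two_layers = (x.dot(h) + bias_1).dot(w) + bias_2

# The same mapping written as a single layer with merged weights and bias
merged_weights = h.dot(w)
merged_bias = bias_1.dot(w) + bias_2
one_layer = x.dot(merged_weights) + merged_bias

print(np.allclose(two_layers, one_layer))
```

A nonlinear function such as the sigmoid between the two matrix products is what prevents this collapse.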

The second intermediate layer is represented by the shape of the weights tensor [256 × 256]:

[mw_shl_code=python,true]w = tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2]))[/mw_shl_code]

With the bias tensor:

[mw_shl_code=python,true]bias_layer_2 = tf.Variable(tf.random_normal([n_hidden_2]))[/mw_shl_code]

Each neuron in this second layer receives inputs from the neurons of layer 1, combined with the weight Wij connections and added to the respective biases of layer 2:

[mw_shl_code=python,true]layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1,w),bias_layer_2))[/mw_shl_code]

It sends its output to the next layer, namely the output layer:

[mw_shl_code=python,true]output = tf.Variable(tf.random_normal([n_hidden_2, n_classes]))

bias_output = tf.Variable(tf.random_normal([n_classes]))

output_layer = tf.matmul(layer_2, output) + bias_output[/mw_shl_code]

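The output layer receives as input the n stimuli (256) coming from layer 2, which are converted into the respective class probabilities for each digit.

As for logistic regression, we then define the cost function: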

[mw_shl_code=python,true]cost = tf.reduce_mean\
(tf.nn.softmax_cross_entropy_with_logits\
(output_layer, y))[/mw_shl_code]

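The TensorFlow function tf.nn.softmax_cross_entropy_with_logits computes the cost for a softmax layer. It is only used during training. The logits are the unnormalized log probabilities output by the model (the values output before the softmax normalization is applied to them).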
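As a rough picture of what such a function computes, the NumPy sketch below applies a log-softmax to the logits and then takes the cross-entropy against one-hot labels. The helper name np_softmax_cross_entropy is hypothetical, the logits are made up, and this is an illustrative reimplementation under those assumptions, not TensorFlow's own code.

```python
import numpy as np

def np_softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy between one-hot labels and softmax(logits)."""
    # Subtracting the row maximum leaves the softmax unchanged and avoids overflow
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    log_softmax = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
    return -np.sum(labels * log_softmax, axis=1)

# Made-up logits and one-hot labels (2 examples, 3 classes)
logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
labels = np.array([[1., 0., 0.],
                   [0., 1., 0.]])

per_example = np_softmax_cross_entropy(logits, labels)
cost = np.mean(per_example)
print(per_example, cost)
```

For the second example all logits are equal, so the softmax is uniform and the loss is log(3).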

The corresponding optimizer that minimizes the cost function is:

[mw_shl_code=python,true]optimizer = tf.train.AdamOptimizer\
(learning_rate=learning_rate).minimize(cost) [/mw_shl_code]

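tf.train.AdamOptimizer uses Kingma and Ba's Adam algorithm to control the learning rate. Adam offers several advantages over the simple tf.train.GradientDescentOptimizer. In fact, it uses a larger effective step size, and the algorithm will converge to this step size without fine tuning.

A simple tf.train.GradientDescentOptimizer could equally be used in your MLP, but it would require more hyperparameter tuning before it converges as quickly.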
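For readers curious about what the Adam update looks like, here is a NumPy sketch of the update rule from Kingma and Ba's paper applied to a one-dimensional quadratic. The helper name adam_minimize, the test function, and the chosen learning rate and step count are assumptions for illustration; this is not how tf.train.AdamOptimizer is implemented internally.

```python
import numpy as np

def adam_minimize(grad_fn, theta, learning_rate=0.001, beta1=0.9,
                  beta2=0.999, epsilon=1e-8, steps=1000):
    """Minimize a function given its gradient, using the Adam update rule."""
    m = np.zeros_like(theta)  # running average of the gradient
    v = np.zeros_like(theta)  # running average of the squared gradient
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        # Bias correction for the zero initialization of m and v
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        theta = theta - learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)
    return theta

# Hypothetical test problem: f(theta) = (theta - 3)^2, gradient 2 * (theta - 3)
theta = adam_minimize(lambda th: 2.0 * (th - 3.0), np.array([0.0]),
                      learning_rate=0.1, steps=1000)
print(theta)
```

Each step is scaled by the ratio of the two running averages, so the size of the update depends much less on the raw gradient magnitude than in plain gradient descent.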

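Note

TensorFlow provides the Optimizer base class to compute gradients for a loss and apply gradients to variables. This class defines the API to add ops to train a model. You never use this class directly; instead, you instantiate one of its subclasses. See https://www.tensorflow.org/versions/r0.8/api_docs/python/train.html#Optimizer for the implemented optimizers.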

Launch the session

The following are the steps to launch the session:

Plot settings:[mw_shl_code=python,true]avg_set = []
epoch_set=[][/mw_shl_code]
Initialize the variables:[mw_shl_code=python,true]init = tf.initialize_all_variables()[/mw_shl_code]
Launch the graph:[mw_shl_code=python,true]with tf.Session() as sess:
    sess.run(init)[/mw_shl_code]
Define the training cycle:[mw_shl_code=python,true]    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)[/mw_shl_code]
Loop over all the batches (of 100 images each):[mw_shl_code=python,true]        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)[/mw_shl_code]
Fit the model using the batch data:[mw_shl_code=python,true]            sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys})[/mw_shl_code]
Compute the average loss:[mw_shl_code=python,true]            avg_cost += sess.run(cost,feed_dict={x: batch_xs,\
                                                 y: batch_ys})/total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print "Epoch:", '%04d' % (epoch+1),\
                  "cost=", "{:.9f}".format(avg_cost)
        avg_set.append(avg_cost)
        epoch_set.append(epoch+1)
    print "Training phase finished"[/mw_shl_code]
With these lines of code, we plot the training phase:[mw_shl_code=python,true]    plt.plot(epoch_set,avg_set, 'o', label='MLP Training phase')
    plt.ylabel('cost')
    plt.xlabel('epoch')
    plt.legend()
    plt.show()[/mw_shl_code]
Finally, we can test the MLP model:[mw_shl_code=python,true]    correct_prediction = tf.equal(tf.argmax(output_layer, 1),\
                                  tf.argmax(y, 1))
    # Evaluate its accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print "Model Accuracy:", accuracy.eval({x: mnist.test.images,\
                                            y: mnist.test.labels})[/mw_shl_code]
This is the output after 20 epochs:[mw_shl_code=python,true]Python 2.7.10 (default, Oct 14 2015, 16:09:02)
[GCC 5.2.1 20151010] on linux2
Type "copyright", "credits" or "license()" for more information.
>>> ========================== RESTART ==============================
>>>
Succesfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting /tmp/data/train-images-idx3-ubyte.gz
Succesfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Succesfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Succesfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
Epoch: 0001 cost= 1.723947845
Epoch: 0002 cost= 0.539266024
Epoch: 0003 cost= 0.362600502
Epoch: 0004 cost= 0.266637279
Epoch: 0005 cost= 0.205345784
Epoch: 0006 cost= 0.159139332
Epoch: 0007 cost= 0.125232637
Epoch: 0008 cost= 0.098572041
Epoch: 0009 cost= 0.077509963
Epoch: 0010 cost= 0.061127526
Epoch: 0011 cost= 0.048033808
Epoch: 0012 cost= 0.037297983
Epoch: 0013 cost= 0.028884999
Epoch: 0014 cost= 0.022818390
Epoch: 0015 cost= 0.017447586
Epoch: 0016 cost= 0.013652348
Epoch: 0017 cost= 0.010417282
Epoch: 0018 cost= 0.008079228
Epoch: 0019 cost= 0.006203546
Epoch: 0020 cost= 0.004961207
Training phase finished
Model Accuracy: 0.9775
>>>[/mw_shl_code]

We show the training phase in the following figure:

Training phase of the multilayer perceptron

Source code

[mw_shl_code=python,true]# Import MNIST data
import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
import tensorflow as tf
import matplotlib.pyplot as plt
# Parameters
learning_rate = 0.001
training_epochs = 20
batch_size = 100
display_step = 1
# Network Parameters
n_hidden_1 = 256 # 1st layer num features
n_hidden_2 = 256 # 2nd layer num features
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])
#weights layer 1
h = tf.Variable(tf.random_normal([n_input, n_hidden_1]))
#bias layer 1
bias_layer_1 = tf.Variable(tf.random_normal([n_hidden_1]))
#layer 1
layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x,h),bias_layer_1))
#weights layer 2
w = tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2]))
#bias layer 2
bias_layer_2 = tf.Variable(tf.random_normal([n_hidden_2]))
#layer 2
layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1,w),bias_layer_2))
#weights output layer
output = tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
#bias output layer
bias_output = tf.Variable(tf.random_normal([n_classes]))
#output layer
output_layer = tf.matmul(layer_2, output) + bias_output
# cost function
cost = tf.reduce_mean\
(tf.nn.softmax_cross_entropy_with_logits(output_layer, y))
# optimizer
optimizer = tf.train.AdamOptimizer\
   (learning_rate=learning_rate).minimize(cost)
#Plot settings
avg_set = []
epoch_set=[]
# Initializing the variables
init = tf.initialize_all_variables()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            # Fit training using batch data
            sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys})
            # Compute average loss
            avg_cost += sess.run(cost, \
                                 feed_dict={x: batch_xs,\
                                            y: batch_ys})/total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print "Epoch:", '%04d' % (epoch+1),\
                  "cost=", "{:.9f}".format(avg_cost)
        avg_set.append(avg_cost)
        epoch_set.append(epoch+1)
    print "Training phase finished"
    plt.plot(epoch_set,avg_set, 'o', label='MLP Training phase')
    plt.ylabel('cost')
    plt.xlabel('epoch')
    plt.legend()
    plt.show()
    # Test model
    correct_prediction = tf.equal(tf.argmax(output_layer, 1),\
                                  tf.argmax(y, 1))
    # Calculate accuracy (eval needs the session opened above)
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print "Model Accuracy:", accuracy.eval({x: mnist.test.images,\
                                            y: mnist.test.labels})[/mw_shl_code]

Multi Layer Perceptron function approximation

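In the following example, we implement an MLP network that is able to learn the trend of an arbitrary function f(x). In the training phase, the network has to learn from a known set of points, namely x and f(x), while in the test phase the network deduces the values of f(x) from the x values alone.

This very simple network will be built with a single hidden layer.

Import the necessary libraries: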

[mw_shl_code=python,true]import tensorflow as tf
import numpy as np
import math, random
import matplotlib.pyplot as plt[/mw_shl_code]

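We build the data model. The function to be learned follows the trend of the cosine function, evaluated at 1000 points, to which we add a very small random error (noise) to reproduce a real-world case: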

[mw_shl_code=python,true]NUM_points = 1000
np.random.seed(NUM_points)
function_to_learn = lambda x: np.cos(x) + \
0.1*np.random.randn(*x.shape)[/mw_shl_code]

Our MLP network will be formed by a hidden layer of 10 neurons:

[mw_shl_code=python,true]layer_1_neurons = 10[/mw_shl_code]

The network learns 100 points at a time, for a total of 1500 learning cycles (epochs):

[mw_shl_code=python,true]batch_size = 100
NUM_EPOCHS = 1500[/mw_shl_code]

Finally, we construct the training set and the test set:

[mw_shl_code=python,true]# all_x contains all the points
all_x = np.float32(np.random.uniform\
(-2*math.pi, 2*math.pi,\
(1, NUM_points))).T
np.random.shuffle(all_x)
train_size = int(900)[/mw_shl_code]

The first 900 points are in the training set:

[mw_shl_code=python,true]x_training = all_x[:train_size]
y_training = function_to_learn(x_training)[/mw_shl_code]

The last 100 will be in the validation set:

[mw_shl_code=python,true]x_validation = all_x[train_size:]
y_validation = function_to_learn(x_validation)[/mw_shl_code]

Using matplotlib, we display these sets:

[mw_shl_code=python,true]plt.figure(1)
plt.scatter(x_training, y_training, c='blue', label='train')
plt.scatter(x_validation, y_validation,c='red',label='validation')
plt.legend()
plt.show()[/mw_shl_code]

Training and validation sets

Build the model

First, we create placeholders for the input tensor (X) and the output tensor (Y):

[mw_shl_code=python,true]X = tf.placeholder(tf.float32, [None, 1], name="X")
Y = tf.placeholder(tf.float32, [None, 1], name="Y")[/mw_shl_code]

Then we build the hidden layer, of dimension [1 x 10]:

[mw_shl_code=python,true]w_h = tf.Variable(tf.random_uniform([1, layer_1_neurons],\
                                    minval=-1, maxval=1,\
                                    dtype=tf.float32))
b_h = tf.Variable(tf.zeros([1, layer_1_neurons],\
                           dtype=tf.float32))[/mw_shl_code]

It receives the input value from the X input tensor, combined with the weight w_hij connections and added with the respective biases of layer 1:

[mw_shl_code=python,true]h = tf.nn.sigmoid(tf.matmul(X, w_h) + b_h)[/mw_shl_code]

The output layer is a [10×1] tensor:

[mw_shl_code=python,true]w_o = tf.Variable(tf.random_uniform([layer_1_neurons, 1],\
minval=-1, maxval=1,\
dtype=tf.float32))
b_o = tf.Variable(tf.zeros([1, 1], dtype=tf.float32))[/mw_shl_code]

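Each neuron of the second layer receives inputs from the neurons of layer 1, combined with the w_oij weight connections and added to the respective biases of the output layer: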

[mw_shl_code=python,true]model = tf.matmul(h, w_o) + b_o[/mw_shl_code]

Then we define the optimizer for the newly defined model:

[mw_shl_code=python,true]train_op = tf.train.AdamOptimizer().minimize\
(tf.nn.l2_loss(model - Y))[/mw_shl_code]

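We also note that, in this case, the cost function adopted is the following:

[mw_shl_code=python,true]tf.nn.l2_loss(model - Y)[/mw_shl_code]

tf.nn.l2_loss is a TensorFlow function that computes half the squared L2 norm of a tensor (that is, the L2 norm without the sqrt, divided by two), so the output of the preceding function is the following: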

[mw_shl_code=python,true] output = sum((model - Y) ** 2) / 2[/mw_shl_code]
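The formula can be verified on a few made-up numbers. The NumPy sketch below computes half the sum of squared differences directly and compares it with half the squared Euclidean norm; the arrays are invented for illustration.

```python
import numpy as np

# Made-up model outputs and targets (3 points, 1 value each)
model = np.array([[1.0], [2.0], [3.0]])
Y = np.array([[1.5], [2.0], [1.0]])

# Half the sum of squared differences, as in the formula above
l2_loss = np.sum((model - Y) ** 2) / 2
# The same quantity expressed through the Euclidean norm
half_squared_norm = 0.5 * np.linalg.norm(model - Y) ** 2
print(l2_loss, half_squared_norm)
```

The differences are -0.5, 0 and 2, so the sum of squares is 4.25 and the loss is 2.125.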

The tf.nn.l2_loss function can be a viable cost function for our example.

Launch the session

Let's build the evaluation graph:

[mw_shl_code=python,true]sess = tf.Session()
sess.run(tf.initialize_all_variables())[/mw_shl_code]

Now we can launch the learning session:

[mw_shl_code=python,true]errors = []
for i in range(NUM_EPOCHS):
    # The end range runs to len(x_training) + 1 so that zip does not
    # drop the last batch (points 800 to 900)
    for start, end in zip(range(0, len(x_training), batch_size),\
                          range(batch_size,\
                                len(x_training) + 1, batch_size)):
        sess.run(train_op, feed_dict={X: x_training[start:end],\
                                      Y: y_training[start:end]})
    # Validation cost after each epoch
    cost = sess.run(tf.nn.l2_loss(model - y_validation),\
                    feed_dict={X:x_validation})
    errors.append(cost)
    if i%100 == 0: print "epoch %d, cost = %g" % (i, cost)[/mw_shl_code]
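One detail worth checking when start and end indices are paired with zip is where the range of end indices stops: zip ends at the shorter of the two ranges, so if the end range stops one step early, the final batch is silently skipped. The plain-Python sketch below uses the sizes from this example (900 training points, batches of 100); the variable names are hypothetical.

```python
batch_size = 100
n_points = 900  # size of the training set in this example

starts = range(0, n_points, batch_size)

# End range that stops at n_points (exclusive): the value 900 is never produced
short_pairs = list(zip(starts, range(batch_size, n_points, batch_size)))

# End range that stops at n_points + 1 (exclusive): the value 900 is included
full_pairs = list(zip(starts, range(batch_size, n_points + 1, batch_size)))

print(len(short_pairs), short_pairs[-1])
print(len(full_pairs), full_pairs[-1])
```

With the shorter end range only 8 of the 9 batches are visited, and the points from index 800 onwards are never used for training.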

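Running this network for 1500 epochs, and logging the cost every 100 epochs, we see the error progressively decrease and eventually converge:

[mw_shl_code=python,true]Python 2.7.10 (default, Oct 14 2015, 16:09:02)
[GCC 5.2.1 20151010] on linux2
Type "copyright", "credits" or "license()" for more information.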
>>> ======================= RESTART ============================
>>>
epoch 0, cost = 55.9286
epoch 100, cost = 22.0084
epoch 200, cost = 18.033
epoch 300, cost = 14.0481
epoch 400, cost = 9.74721
epoch 500, cost = 5.83419
epoch 600, cost = 3.05434
epoch 700, cost = 1.53706
epoch 800, cost = 0.91719
epoch 900, cost = 0.726675
epoch 1000, cost = 0.668316
epoch 1100, cost = 0.633737
epoch 1200, cost = 0.608306
epoch 1300, cost = 0.590429
epoch 1400, cost = 0.574602
>>>[/mw_shl_code]

The following lines of code let us display how the cost changes over the epochs:

[mw_shl_code=python,true]plt.plot(errors,label='MLP Function Approximation')
plt.xlabel('epochs')
plt.ylabel('cost')
plt.legend()
plt.show()[/mw_shl_code]

Training phase of the multilayer perceptron

Summary

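In this chapter, we introduced artificial neural networks. An artificial neuron is a mathematical model that, to some extent, mimics the properties of a living neuron. Each neuron of the network has a very simple operation: it becomes active if the total amount of signal that it receives exceeds an activation threshold. The learning process is typically supervised: the neural net uses a training set to infer the relationship between the input and the corresponding output, while the learning algorithm modifies the weights of the net in order to minimize a cost function that represents the forecast error on the training set. If the training is successful, the neural net will be able to make forecasts even where the output is not known a priori. In this chapter, we implemented some examples involving neural networks using TensorFlow. We saw neural nets used to solve classification and regression problems, such as the logistic regression algorithm in a classification problem using Rosenblatt's perceptron. At the end of the chapter, we introduced the Multi Layer Perceptron architecture, which we saw in action first in the implementation of an image classifier and then as a simulator of mathematical functions.

Source: usyiyi

Author: 一译

Original link: Chapter 4. Introducing Neural Networks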

http://usyiyi.cn/documents/getting-started-with-tf/ch4.html
