
TensorFlow ML Cookbook, Chapter 9, Section 2: Implementing an LSTM Model

Guiding questions:
1. How does the LSTM address the vanishing/exploding gradient problem of variable-length RNNs?
2. How do we create a function that returns two dictionaries (word to index, and index to word)?
3. How do we declare the LSTM model as well as the test model?
4. How do we use the numpy.roll() function?




Previous: TensorFlow ML Cookbook, Chapter 9, Section 1: Implementing an RNN for Spam Prediction

Implementing an LSTM Model

In this recipe, we will extend our RNN model to be able to use longer sequences by introducing the LSTM unit.

Getting ready
Long Short-Term Memory (LSTM) is a variant of the traditional RNN. LSTM is a way to address the vanishing/exploding gradient problem that variable-length RNNs have. To address this issue, the LSTM cell introduces an internal forget gate, which can modify the flow of information from one cell to the next. To conceptualize how this works, we will walk through an unbiased version of the LSTM one equation at a time. The first step is the same as for the regular RNN:
[Figures: the LSTM cell equations, presented one step at a time (images not reproduced here).]
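For reference, a standard bias-free formulation of the LSTM cell is given below; this is the textbook form and may differ in notation from the book's figures:

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1}) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1}) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1}) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1}) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

Here $\sigma$ is the sigmoid activation, $\odot$ is element-wise multiplication, and the forget gate $f_t$ controls how much of the previous cell state $c_{t-1}$ is carried forward.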


The idea with LSTM is to have a self-regulating flow of information through the cells, which can be forgotten or modified based on the information input to the cell.

For this recipe, we will use a sequence RNN with LSTM cells to try to predict the next word, trained on the works of Shakespeare. To test how we are doing, we will feed the model candidate phrases, such as thou art more, and see whether the model can figure out which words should follow the phrase.

How to do it…
1. To start, we load the necessary libraries for the script:
[mw_shl_code=python,true]import os
import re
import string
import requests
import numpy as np
import collections
import random
import pickle
import matplotlib.pyplot as plt
import tensorflow as tf [/mw_shl_code]

2. Next, we start a graph session and set the RNN parameters:
[mw_shl_code=python,true]sess = tf.Session()
# Set RNN Parameters
min_word_freq = 5
rnn_size = 128
epochs = 10
batch_size = 100
learning_rate = 0.001
training_seq_len = 50
embedding_size = rnn_size
save_every = 500
eval_every = 50
prime_texts = ['thou art more', 'to be or not to', 'wherefore art thou'] [/mw_shl_code]

3. We set up the data and model folders and filenames, and declare the punctuation to remove. We want to keep hyphens and apostrophes because Shakespeare uses them frequently to combine words and syllables:
[mw_shl_code=python,true]data_dir = 'temp'
data_file = 'shakespeare.txt'
model_path = 'shakespeare_model'
full_model_dir = os.path.join(data_dir, model_path)
# Declare punctuation to remove, everything except hyphens and apostrophes
punctuation = string.punctuation
punctuation = ''.join([x for x in punctuation if x not in ['-', "'"]])[/mw_shl_code]

4. Next we get the data. If the data file doesn't exist, we download and save the Shakespeare text; if it does exist, we load the data:
[mw_shl_code=python,true]if not os.path.exists(full_model_dir):
    os.makedirs(full_model_dir)
# Make data directory
if not os.path.exists(data_dir):
    os.makedirs(data_dir)

print('Loading Shakespeare Data')
# Check if file is downloaded.
if not os.path.isfile(os.path.join(data_dir, data_file)):
    print('Not found, downloading Shakespeare texts from www.gutenberg.org')
    shakespeare_url = 'http://www.gutenberg.org/cache/epub/100/pg100.txt'
    # Get Shakespeare text
    response = requests.get(shakespeare_url)
    shakespeare_file = response.content
    # Decode binary into string
    s_text = shakespeare_file.decode('utf-8')
    # Drop first few descriptive paragraphs.
    s_text = s_text[7675:]
    # Remove newlines
    s_text = s_text.replace('\r\n', '')
    s_text = s_text.replace('\n', '')
    # Write to file
    with open(os.path.join(data_dir, data_file), 'w') as out_conn:
        out_conn.write(s_text)
else:
    # If file has been saved, load from that file
    with open(os.path.join(data_dir, data_file), 'r') as file_conn:
        s_text = file_conn.read().replace('\n', '')
[/mw_shl_code]
5. We clean the Shakespeare text by removing punctuation and extra whitespace:
[mw_shl_code=python,true]s_text = re.sub(r'[{}]'.format(punctuation), ' ', s_text)
s_text = re.sub(r'\s+', ' ', s_text).strip().lower()[/mw_shl_code]

6. We now create the Shakespeare vocabulary to use. We create a function that returns two dictionaries (word to index, and index to word) for the words that appear more often than a specified frequency:
[mw_shl_code=python,true]def build_vocab(text, min_word_freq):
    word_counts = collections.Counter(text.split(' '))
    # limit word counts to those more frequent than cutoff
    word_counts = {key: val for key, val in word_counts.items() if val > min_word_freq}
    # Create vocab --> index mapping
    words = word_counts.keys()
    vocab_to_ix_dict = {key: (ix+1) for ix, key in enumerate(words)}
    # Add unknown key --> 0 index
    vocab_to_ix_dict['unknown'] = 0
    # Create index --> vocab mapping
    ix_to_vocab_dict = {val: key for key, val in vocab_to_ix_dict.items()}
    return(ix_to_vocab_dict, vocab_to_ix_dict)

ix2vocab, vocab2ix = build_vocab(s_text, min_word_freq)
vocab_size = len(ix2vocab) + 1[/mw_shl_code]
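As a quick sanity check (not part of the book's script), build_vocab can be called on a toy string; the exact index values depend on dictionary ordering, so the numbers in the comments are only illustrative:

[mw_shl_code=python,true]# Hypothetical toy check: only words seen more than once survive the cutoff
toy_ix2vocab, toy_vocab2ix = build_vocab('to be or not to be to be', min_word_freq=1)
print(toy_vocab2ix)  # e.g. {'to': 1, 'be': 2, 'unknown': 0}
print(toy_ix2vocab)  # e.g. {1: 'to', 2: 'be', 0: 'unknown'}[/mw_shl_code]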


7. Now that we have our vocabulary, we turn the Shakespeare text into an array of indices:
[mw_shl_code=python,true]s_text_words = s_text.split(' ')
s_text_ix = []
for ix, x in enumerate(s_text_words):
    try:
        s_text_ix.append(vocab2ix[x])
    except KeyError:
        # Words below the frequency cutoff map to the 'unknown' index 0
        s_text_ix.append(0)
s_text_ix = np.array(s_text_ix)[/mw_shl_code]

8. In this recipe, we will show how to create the model in a class object. This will be helpful because we would like to use the same model (with the same weights) both to train on batches and to generate text from sample text, which would be hard to do without a class that has an internal sampling method. Ideally, this class code should sit in a separate Python file that we can import at the beginning of this script:
[mw_shl_code=python,true]class LSTM_Model():
    def __init__(self, rnn_size, batch_size, learning_rate,
                 training_seq_len, vocab_size, infer=False):
        self.rnn_size = rnn_size
        self.vocab_size = vocab_size
        self.infer = infer
        self.learning_rate = learning_rate

        if infer:
            self.batch_size = 1
            self.training_seq_len = 1
        else:
            self.batch_size = batch_size
            self.training_seq_len = training_seq_len

        self.lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(rnn_size)
        self.initial_state = self.lstm_cell.zero_state(self.batch_size, tf.float32)

        self.x_data = tf.placeholder(tf.int32, [self.batch_size, self.training_seq_len])
        self.y_output = tf.placeholder(tf.int32, [self.batch_size, self.training_seq_len])

        with tf.variable_scope('lstm_vars'):
            # Softmax Output Weights
            W = tf.get_variable('W', [self.rnn_size, self.vocab_size], tf.float32, tf.random_normal_initializer())
            b = tf.get_variable('b', [self.vocab_size], tf.float32, tf.constant_initializer(0.0))
            # Define Embedding
            embedding_mat = tf.get_variable('embedding_mat', [self.vocab_size, self.rnn_size], tf.float32, tf.random_normal_initializer())
            embedding_output = tf.nn.embedding_lookup(embedding_mat, self.x_data)
            rnn_inputs = tf.split(1, self.training_seq_len, embedding_output)
            rnn_inputs_trimmed = [tf.squeeze(x, [1]) for x in rnn_inputs]

        # If we are inferring (generating text), we add a 'loop' function
        # Define how to get the i+1 th input from the i th output
        def inferred_loop(prev, count):
            prev_transformed = tf.matmul(prev, W) + b
            prev_symbol = tf.stop_gradient(tf.argmax(prev_transformed, 1))
            output = tf.nn.embedding_lookup(embedding_mat, prev_symbol)
            return(output)

        decoder = tf.nn.seq2seq.rnn_decoder
        outputs, last_state = decoder(rnn_inputs_trimmed,
                                      self.initial_state,
                                      self.lstm_cell,
                                      loop_function=inferred_loop if infer else None)
        # Non inferred outputs
        output = tf.reshape(tf.concat(1, outputs), [-1, self.rnn_size])
        # Logits and output
        self.logit_output = tf.matmul(output, W) + b
        self.model_output = tf.nn.softmax(self.logit_output)

        loss_fun = tf.nn.seq2seq.sequence_loss_by_example
        loss = loss_fun([self.logit_output], [tf.reshape(self.y_output, [-1])],
                        [tf.ones([self.batch_size * self.training_seq_len])],
                        self.vocab_size)
        self.cost = tf.reduce_sum(loss) / (self.batch_size * self.training_seq_len)
        self.final_state = last_state
        gradients, _ = tf.clip_by_global_norm(tf.gradients(self.cost, tf.trainable_variables()), 4.5)
        optimizer = tf.train.AdamOptimizer(self.learning_rate)
        self.train_op = optimizer.apply_gradients(zip(gradients, tf.trainable_variables()))

    def sample(self, sess, words=ix2vocab, vocab=vocab2ix, num=10, prime_text='thou art'):
        state = sess.run(self.lstm_cell.zero_state(1, tf.float32))
        word_list = prime_text.split()
        for word in word_list[:-1]:
            x = np.zeros((1, 1))
            x[0, 0] = vocab[word]
            feed_dict = {self.x_data: x, self.initial_state: state}
            [state] = sess.run([self.final_state], feed_dict=feed_dict)

        out_sentence = prime_text
        word = word_list[-1]
        for n in range(num):
            x = np.zeros((1, 1))
            x[0, 0] = vocab[word]
            feed_dict = {self.x_data: x, self.initial_state: state}
            [model_output, state] = sess.run([self.model_output, self.final_state], feed_dict=feed_dict)
            sample = np.argmax(model_output[0])
            if sample == 0:
                break
            word = words[sample]
            out_sentence = out_sentence + ' ' + word
        return(out_sentence)[/mw_shl_code]

9. Now we will declare the LSTM model as well as the test model. We will do this within a variable scope and tell the scope that we will reuse the variables for the test LSTM model:
[mw_shl_code=python,true]with tf.variable_scope('lstm_model') as scope:
    # Define LSTM Model
    lstm_model = LSTM_Model(rnn_size, batch_size, learning_rate,
                            training_seq_len, vocab_size)
    scope.reuse_variables()
    test_lstm_model = LSTM_Model(rnn_size, batch_size, learning_rate,
                                 training_seq_len, vocab_size, infer=True)[/mw_shl_code]

10. We create a saving operation and split the input text into chunks of equal batch size. Then we initialize the variables of the model:
[mw_shl_code=python,true]saver = tf.train.Saver()
# Create batches for each epoch
num_batches = int(len(s_text_ix)/(batch_size * training_seq_len)) + 1
# Split up text indices into subarrays, of equal size
batches = np.array_split(s_text_ix, num_batches)
# Reshape each split into [batch_size, training_seq_len]
batches = [np.resize(x, [batch_size, training_seq_len]) for x in batches]
# Initialize all variables
init = tf.initialize_all_variables()
sess.run(init) [/mw_shl_code]

11. We can now iterate through our epochs, shuffling the data before each epoch starts. The target for our data is just the same data shifted by one value, using the numpy.roll() function (a toy np.roll example follows the code block below):
[mw_shl_code=python,true]train_loss = []
iteration_count = 1

for epoch in range(epochs):
    # Shuffle word indices
    random.shuffle(batches)
    # Create targets from shuffled batches
    targets = [np.roll(x, -1, axis=1) for x in batches]
    # Run through one epoch
    print('Starting Epoch #{} of {}.'.format(epoch+1, epochs))
    # Reset initial LSTM state every epoch
    state = sess.run(lstm_model.initial_state)
    for ix, batch in enumerate(batches):
        training_dict = {lstm_model.x_data: batch, lstm_model.y_output: targets[ix]}
        c, h = lstm_model.initial_state
        training_dict[c] = state.c
        training_dict[h] = state.h
        temp_loss, state, _ = sess.run([lstm_model.cost, lstm_model.final_state, lstm_model.train_op], feed_dict=training_dict)
        train_loss.append(temp_loss)
        # Print status every 10 gens
        if iteration_count % 10 == 0:
            summary_nums = (iteration_count, epoch+1, ix+1, num_batches+1, temp_loss)
            print('Iteration: {}, Epoch: {}, Batch: {} out of {}, Loss: {:.2f}'.format(*summary_nums))
        # Save the model and the vocab
        if iteration_count % save_every == 0:
            # Save model
            model_file_name = os.path.join(full_model_dir, 'model')
            saver.save(sess, model_file_name, global_step=iteration_count)
            print('Model Saved To: {}'.format(model_file_name))
            # Save vocabulary
            dictionary_file = os.path.join(full_model_dir, 'vocab.pkl')
            with open(dictionary_file, 'wb') as dict_file_conn:
                pickle.dump([vocab2ix, ix2vocab], dict_file_conn)
        if iteration_count % eval_every == 0:
            for sample in prime_texts:
                print(test_lstm_model.sample(sess, ix2vocab, vocab2ix, num=10, prime_text=sample))
        iteration_count += 1[/mw_shl_code]
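To make the target construction concrete, here is a small illustrative example (not from the book) of what np.roll does to one batch row; each target position holds the word index that follows the corresponding input position:

[mw_shl_code=python,true]import numpy as np

batch = np.array([[11, 12, 13, 14, 15]])  # one row of word indices
target = np.roll(batch, -1, axis=1)       # shift each row left by one position
print(target)                             # [[12 13 14 15 11]]
# Note that the last column wraps around to the first word of the row,
# so the final target of each row is not a true next word.[/mw_shl_code]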

12. This results in the following output:
[mw_shl_code=python,true]Loading Shakespeare Data
Cleaning Text
Building Shakespeare Vocab
Vocabulary Length = 8009
Starting Epoch #1 of 10.
Iteration: 10, Epoch: 1, Batch: 10 out of 182, Loss: 10.37
Iteration: 20, Epoch: 1, Batch: 20 out of 182, Loss: 9.54
...
Iteration: 1790, Epoch: 10, Batch: 161 out of 182, Loss: 5.68
Iteration: 1800, Epoch: 10, Batch: 171 out of 182, Loss: 6.05
thou art more than i am a
to be or not to the man i have
wherefore art thou art of the long
Iteration: 1810, Epoch: 10, Batch: 181 out of 182, Loss: 5.99 [/mw_shl_code]

13. Finally, here is how we plot the training loss over the epochs:
[mw_shl_code=python,true]plt.plot(train_loss, 'k-')
plt.title('Sequence to Sequence Loss')
plt.xlabel('Generation')
plt.ylabel('Loss')
plt.show()[/mw_shl_code]
[Figure 4: The sequence-to-sequence loss over all generations of the model.]

How it works…
In this example, we built an RNN model with LSTM units to predict the next word, based on the Shakespearean vocabulary. A few things could be done to improve the model, such as increasing the sequence size, using a decaying learning rate, or training the model for more epochs; a sketch of a decaying learning rate follows.
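As one possibility (not the book's code), the fixed learning rate in LSTM_Model.__init__ could be replaced with tf.train.exponential_decay; the decay_steps and decay_rate values below are assumed for illustration:

[mw_shl_code=python,true]# Hypothetical variation on the last lines of LSTM_Model.__init__:
# replace the fixed learning rate with an exponentially decaying one.
global_step = tf.Variable(0, trainable=False)
decayed_lr = tf.train.exponential_decay(self.learning_rate,  # initial rate, e.g. 0.001
                                        global_step,
                                        decay_steps=500,      # assumed decay interval
                                        decay_rate=0.9,       # assumed decay factor
                                        staircase=True)
optimizer = tf.train.AdamOptimizer(decayed_lr)
# Passing global_step makes apply_gradients increment it on every training step.
self.train_op = optimizer.apply_gradients(zip(gradients, tf.trainable_variables()),
                                          global_step=global_step)[/mw_shl_code]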

There's more…
For sampling, we implemented a greedy sampler. Greedy samplers can get stuck repeating the same phrases over and over; for example, the model may get stuck saying for the for the for the…. To prevent this, we could implement a more random way of sampling words, perhaps by doing a weighted sampling based on the logits or probability distribution of the output, as sketched below.
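As an illustration (not the book's code), the greedy argmax line in LSTM_Model.sample() could be swapped for a weighted draw from the softmax distribution using np.random.choice:

[mw_shl_code=python,true]# Hypothetical replacement for `sample = np.argmax(model_output[0])` in sample():
probs = model_output[0]
probs = probs / np.sum(probs)                   # renormalize to guard against rounding error
sample = np.random.choice(len(probs), p=probs)  # weighted draw instead of greedy argmax[/mw_shl_code]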


