Deep Neural Networks

1. Introduction to Deep Neural Networks

1. Complexity of Linear Models

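In a linear model, the number of trainable parameters is the number of entries in W plus the number of entries in b. With N inputs and K outputs, W is an N×K matrix and b has K entries, so the model has (N+1)K trainable parameters.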

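  • Linear models are stable: because the output is a linear combination of the inputs, a small change in the input cannot cause a drastic change in the output.
  • Because the model is linear, it can only represent linear relationships, which is very limiting (inputs are added together, never multiplied).
  • The derivative of a linear function is a constant.
  • We want the model to be nonlinear, but its parameters live inside linear equations, so we have to add nonlinear components.
  • We want many trainable parameters, not a number fixed at (N+1)K.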
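As a quick check of the (N+1)K formula, the plain-Python sketch below counts the entries of W and b directly. The helper name is illustrative, and the 784-input, 10-class sizes are simply the MNIST sizes used later in these notes.

```python
def linear_param_count(n_inputs, n_outputs):
    """Trainable parameters of a linear model y = xW + b."""
    weight_entries = n_inputs * n_outputs  # W has shape (N, K)
    bias_entries = n_outputs               # b has shape (K,)
    return weight_entries + bias_entries

# MNIST-sized example: 784 pixel inputs, 10 classes
n_inputs, n_outputs = 784, 10
count = linear_param_count(n_inputs, n_outputs)
print(count)  # 7850, i.e. (784 + 1) * 10
```

However large the dataset is, this count stays fixed once N and K are chosen, which is the limitation the last bullet points out.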

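2. Rectified Linear Units (ReLUs)

To address the problems raised in the previous section, we introduce the ReLU function, ReLU(x) = max(0, x), and insert it between the matrix multiplications. Previously, when we built a multilayer network, the W matrices of the different layers were simply multiplied together, so they collapsed into a single W that took part in the computation as a whole. Now we insert ReLUs between the multiplications of the W_i, which makes the model nonlinear. We can also adjust the number of ReLUs in the hidden layer to increase the number of parameters.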
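The collapse described above can be checked numerically. This plain-Python sketch uses hypothetical 2×2 weights, omits biases, and does not use TensorFlow. It shows that two stacked linear layers give exactly the same result as a single layer with W = W2·W1, while inserting ReLU between them breaks that equivalence.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by column vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrices A (m x n) and B (n x p), both given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def relu(v):
    """Elementwise ReLU: max(0, v_i)."""
    return [max(0.0, vi) for vi in v]

# Hypothetical weights for two layers (biases omitted for brevity)
W1 = [[1.0, -2.0], [3.0, 0.5]]
W2 = [[2.0, 1.0], [-1.0, 4.0]]
x = [1.0, 1.0]

# Without an activation, W2 (W1 x) equals (W2 W1) x: one linear layer in disguise
two_linear = matvec(W2, matvec(W1, x))
collapsed = matvec(matmul(W2, W1), x)
print(two_linear, collapsed)  # [1.5, 15.0] [1.5, 15.0]

# With ReLU in between, the negative hidden unit is zeroed out
with_relu = matvec(W2, relu(matvec(W1, x)))
print(with_relu)  # [3.5, 14.0]
```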

Note

ReLU is also an activation function. The activation functions we have encountered so far are sigmoid, softmax, and ReLU.

In TensorFlow, the ReLU function is called with tf.nn.relu().
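For reference, here is a plain-Python sketch of the three activation functions mentioned above. tf.nn.relu() applies the same max(0, x) rule elementwise to a whole tensor. The helper names here are illustrative and are not TensorFlow APIs.

```python
import math

def sigmoid(z):
    """Squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)

def softmax(zs):
    """Turns a list of scores into probabilities that sum to 1."""
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(0.0))                         # 0.5
print([relu(z) for z in [-2.0, 0.0, 3.0]])  # [0.0, 0.0, 3.0]
print(sum(softmax([1.0, 2.0, 3.0])))        # 1.0 up to rounding
```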

3. Multilayer Neural Networks

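Adding hidden layers to the network makes the model more complex, and adding a nonlinear activation function in the hidden layers makes the model nonlinear.

Suppose we build a two-layer neural network:

  • The first layer contains a set of weights and biases. We feed X into this layer and pass the result through the ReLU activation function; this output (the hidden layer) is then fed to the next layer.
  • The next layer combines the hidden-layer output with its own weights and biases to produce the output values y (the logits), which the softmax function then converts into probabilities.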
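import tensorflow as tf

# Assumes features, hidden_weights, hidden_biases, output_weights and
# output_biases are already defined

# Hidden Layer with ReLU activation function
hidden_layer = tf.add(tf.matmul(features, hidden_weights), hidden_biases)
hidden_layer = tf.nn.relu(hidden_layer)
# Output layer (linear); tf.nn.softmax(output) would give the probabilities
output = tf.add(tf.matmul(hidden_layer, output_weights), output_biases)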
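The same two-layer forward pass can be written in plain Python to make the shapes explicit. This sketch uses small hypothetical sizes (3 inputs, 2 hidden units, 3 classes) and made-up weights. Here dense() plays the role of tf.add(tf.matmul(...), ...), and softmax turns the logits into probabilities.

```python
import math

def dense(x, W, b):
    """Affine layer for one example: x (length n) times W (n x k) plus b (length k)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]

def relu(v):
    """Elementwise max(0, v_i)."""
    return [max(0.0, vi) for vi in v]

def softmax(z):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(z)
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical input and parameters
features = [0.5, -1.0, 2.0]
hidden_weights = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]  # 3 x 2
hidden_biases = [0.0, 0.1]
output_weights = [[1.0, -1.0, 0.5], [0.5, 0.2, -0.3]]    # 2 x 3
output_biases = [0.0, 0.0, 0.1]

hidden_layer = relu(dense(features, hidden_weights, hidden_biases))
logits = dense(hidden_layer, output_weights, output_biases)
probabilities = softmax(logits)
print(hidden_layer, probabilities)
```

In this example the first hidden unit's pre-activation is negative, so ReLU sets it to 0.0.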

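4. The Chain Rule and Backpropagation

Here we again use the chain rule of differentiation. We compute the derivative of each part of the network separately and multiply them together to obtain the derivative of the whole. For details, see the chain rule.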
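Here is a tiny worked example of the chain rule as a plain-Python sketch. It uses a one-unit network y = w2 · ReLU(w1 · x) with a squared-error loss; the loss choice and all values are assumptions chosen for illustration. The gradient obtained by multiplying local derivatives is checked against a finite-difference estimate.

```python
def forward(w1, w2, x, t):
    """y = w2 * relu(w1 * x); loss = (y - t)**2. Returns intermediates too."""
    h_pre = w1 * x
    h = max(0.0, h_pre)
    y = w2 * h
    loss = (y - t) ** 2
    return h_pre, h, y, loss

def backward(w1, w2, x, t):
    """Gradients of the loss w.r.t. w1 and w2 via the chain rule."""
    h_pre, h, y, _ = forward(w1, w2, x, t)
    dloss_dy = 2 * (y - t)
    dy_dw2 = h
    dy_dh = w2
    dh_dhpre = 1.0 if h_pre > 0 else 0.0  # derivative of ReLU
    dhpre_dw1 = x
    grad_w2 = dloss_dy * dy_dw2
    grad_w1 = dloss_dy * dy_dh * dh_dhpre * dhpre_dw1  # product of local derivatives
    return grad_w1, grad_w2

def numeric_grad(f, w, eps=1e-6):
    """Central finite-difference estimate of df/dw."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)

w1, w2, x, t = 0.8, -1.5, 2.0, 1.0  # hypothetical values
g1, g2 = backward(w1, w2, x, t)
n1 = numeric_grad(lambda w: forward(w, w2, x, t)[3], w1)
n2 = numeric_grad(lambda w: forward(w1, w, x, t)[3], w2)
print(g1, n1)  # both approximately 20.4
print(g2, n2)  # both approximately -10.88
```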

2. Deep Neural Networks in TensorFlow

1. Example

Learning parameters

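import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Load MNIST with one-hot labels; reshape=False keeps images as 28x28x1,
# matching the placeholder x defined below
mnist = input_data.read_data_sets('.', one_hot=True, reshape=False)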

# Parameters
learning_rate = 0.001
training_epochs = 20
batch_size = 128  # Decrease batch size if you don't have enough memory
display_step = 1

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

Hidden layer parameters

n_hidden_layer = 256  # number of units (features) in the hidden layer

Weights and biases

Here we create separate weights and biases for each layer.

# Store layers weight & bias
weights = {
    'hidden_layer': tf.Variable(tf.random_normal([n_input, n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_hidden_layer, n_classes]))
}
biases = {
    'hidden_layer': tf.Variable(tf.random_normal([n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}
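With these shapes, the number of trainable parameters can be counted by hand. A linear model from 784 inputs to 10 classes has (784 + 1) × 10 = 7850 parameters. The plain-Python sketch below (the helper name is illustrative) shows how much the 256-unit hidden layer adds.

```python
def layer_param_count(n_in, n_out):
    """Entries in a weight matrix of shape [n_in, n_out] plus a bias of shape [n_out]."""
    return n_in * n_out + n_out

n_input, n_hidden_layer, n_classes = 784, 256, 10

hidden_params = layer_param_count(n_input, n_hidden_layer)  # 784*256 + 256
out_params = layer_param_count(n_hidden_layer, n_classes)   # 256*10 + 10
total_params = hidden_params + out_params
print(hidden_params, out_params, total_params)  # 200960 2570 203530
```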

Input

# tf Graph input
x = tf.placeholder("float", [None, 28, 28, 1])
y = tf.placeholder("float", [None, n_classes])

# Flatten each 28x28x1 image into a vector of length n_input (784)
x_flat = tf.reshape(x, [-1, n_input])

Multilayer perceptron

# Hidden layer with RELU activation
layer_1 = tf.add(tf.matmul(x_flat, weights['hidden_layer']),\
    biases['hidden_layer'])
layer_1 = tf.nn.relu(layer_1)
# Output layer with linear activation
logits = tf.add(tf.matmul(layer_1, weights['out']), biases['out'])

Optimizer

# Define loss and optimizer
cost = tf.reduce_mean(\
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\
    .minimize(cost)
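tf.nn.softmax_cross_entropy_with_logits computes the cross-entropy between the one-hot labels and softmax(logits) for each example, and tf.reduce_mean then averages it over the batch. The plain-Python sketch below reproduces that computation for a hypothetical batch of two examples. It uses the log-sum-exp trick for numerical stability and is an illustration, not TensorFlow's implementation.

```python
import math

def log_softmax(logits):
    """log(softmax(z)) computed stably via log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def softmax_cross_entropy(logits, labels):
    """Cross-entropy -sum(label_i * log softmax(logits)_i) for one example."""
    return -sum(y_i * lp for y_i, lp in zip(labels, log_softmax(logits)))

# Hypothetical batch: 2 examples, 3 classes, one-hot labels
batch_logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
batch_labels = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

losses = [softmax_cross_entropy(z, y) for z, y in zip(batch_logits, batch_labels)]
cost = sum(losses) / len(losses)  # same role as tf.reduce_mean
print(losses, cost)
```

The second example puts most of its probability on the wrong class, so its loss is much larger than the first.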

Session

# Initializing the variables
init = tf.global_variables_initializer()


# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
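            # Run the optimization op (backprop) on this batch
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})

        # Display the loss on the last batch every display_step epochs
        if epoch % display_step == 0:
            c = sess.run(cost, feed_dict={x: batch_x, y: batch_y})
            print('Epoch: {:04d} cost= {:.9f}'.format(epoch + 1, c))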
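Here total_batch is computed with int(), which rounds down. The model-saving example later in these notes uses math.ceil() instead. The difference is whether a final, smaller batch is counted. The plain-Python sketch below uses a hypothetical dataset of 1000 examples; iterate_minibatches is an illustrative helper, not part of TensorFlow.

```python
import math

def batch_counts(num_examples, batch_size):
    """Number of batches when rounding down (int) vs. up (math.ceil)."""
    return int(num_examples / batch_size), math.ceil(num_examples / batch_size)

def iterate_minibatches(examples, batch_size):
    """Yield consecutive slices; the final slice may be shorter than batch_size."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]

num_examples, batch_size = 1000, 128  # hypothetical sizes
print(batch_counts(num_examples, batch_size))  # (7, 8)

sizes = [len(b) for b in iterate_minibatches(list(range(num_examples)), batch_size)]
print(sizes)  # [128, 128, 128, 128, 128, 128, 128, 104]
```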

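2. Training Neural Networks

There are two ways to scale up a neural network:

  • Wider: increase the number of units H in the hidden layer. With too many parameters, however, the network becomes hard to train.
  • Deeper: add more layers.

Going deeper is usually the better approach. It needs fewer parameters, and it gives the model a clear hierarchical structure in which each layer can learn different information. Learning is also faster.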
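To make the parameter comparison concrete, the sketch below counts trainable parameters for two hypothetical MNIST-sized networks. One has a single wide hidden layer of 1024 units; the other has two hidden layers of 256 units each. This only compares these two particular configurations. It does not prove that deeper networks always need fewer parameters.

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases of a fully connected network, e.g. [784, 256, 10]."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

wide = [784, 1024, 10]      # one wide hidden layer
deep = [784, 256, 256, 10]  # two narrower hidden layers

print(mlp_param_count(wide))  # 814090
print(mlp_param_count(deep))  # 269322
```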

Once training is finished, we naturally want to save the results. For this we can use tf.train.Saver.

3. Saving Variables and Models

Saving variables

Use the tf.train.Saver.save() function to save data to a checkpoint file (.ckpt format).

import tensorflow as tf

# The file path to save the data
save_file = './model.ckpt'

# Two Tensor Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()

with tf.Session() as sess:
    # Initialize all the Variables
    sess.run(tf.global_variables_initializer())

    # Show the values of weights and bias
    print('Weights:')
    print(sess.run(weights))
    print('Bias:')
    print(sess.run(bias))

    # Save the model
    saver.save(sess, save_file)

Loading variables

Because tf.train.Saver.restore() sets the variable values when it loads them, there is no need to call tf.global_variables_initializer().

# Remove the previous weights and bias
tf.reset_default_graph()

# Two Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()

with tf.Session() as sess:
    # Load the weights and bias
    saver.restore(sess, save_file)

    # Show the values of weights and bias
    print('Weights:')
    print(sess.run(weights))
    print('Bias:')
    print(sess.run(bias))

Saving a complete model

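import math
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Assumed model definition (not part of the original snippet): a single linear
# layer, so that features, labels, optimizer and accuracy used below exist
tf.reset_default_graph()
learning_rate = 0.001
n_input = 784
n_classes = 10

mnist = input_data.read_data_sets('.', one_hot=True)

features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))
logits = tf.add(tf.matmul(features, weights), bias)

cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\
    .minimize(cost)

# Fraction of examples whose predicted class matches the label
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))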

save_file = './train_model.ckpt'
batch_size = 128
n_epochs = 100

saver = tf.train.Saver()

# Launch the graph
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Training cycle
    for epoch in range(n_epochs):
        total_batch = math.ceil(mnist.train.num_examples / batch_size)

        # Loop over all batches
        for i in range(total_batch):
            batch_features, batch_labels = mnist.train.next_batch(batch_size)
            sess.run(
                optimizer,
                feed_dict={features: batch_features, labels: batch_labels})

        # Print status for every 10 epochs
        if epoch % 10 == 0:
            valid_accuracy = sess.run(
                accuracy,
                feed_dict={
                    features: mnist.validation.images,
                    labels: mnist.validation.labels})
            print('Epoch {:<3} - Validation Accuracy: {}'.format(
                epoch,
                valid_accuracy))

    # Save the model
    saver.save(sess, save_file)
    print('Trained Model Saved.')

Loading the model

saver = tf.train.Saver()

# Launch the graph
with tf.Session() as sess:
    saver.restore(sess, save_file)

    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: mnist.test.images, labels: mnist.test.labels})

print('Test Accuracy: {}'.format(test_accuracy))

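4. Loading Parameters into a New Model

TensorFlow uses the name parameter to label tensors and operations. If no name is set, TensorFlow generates one automatically in the form <Type>_<number> (for example Variable, Variable_1, ...), based on the type and the order in which the variables are declared. As a result, when the parameters of one model are loaded into another, a different declaration order can cause values to be assigned to the wrong variables. To avoid this, we set the names manually.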

import tensorflow as tf

tf.reset_default_graph()

save_file = 'model.ckpt'

# Two Tensor Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]), name='weights_0')
bias = tf.Variable(tf.truncated_normal([3]), name='bias_0')

saver = tf.train.Saver()

# Print the name of Weights and Bias
print('Save Weights: {}'.format(weights.name))
print('Save Bias: {}'.format(bias.name))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, save_file)

# Remove the previous weights and bias
tf.reset_default_graph()

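# Two Variables: weights and bias, declared in the opposite order on purpose;
# the explicit names still match the ones used when saving
bias = tf.Variable(tf.truncated_normal([3]), name='bias_0')
weights = tf.Variable(tf.truncated_normal([2, 3]), name='weights_0')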

saver = tf.train.Saver()

# Print the name of Weights and Bias
print('Load Weights: {}'.format(weights.name))
print('Load Bias: {}'.format(bias.name))

with tf.Session() as sess:
    # Load the weights and bias - No Error
    saver.restore(sess, save_file)

print('Loaded Weights and Bias successfully.')
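The ordering problem can be imitated without TensorFlow. In this plain-Python sketch, a checkpoint is just a dict from names to values. Unnamed variables receive auto-generated names (Variable, Variable_1, ...) in declaration order, which is a simplified imitation of TensorFlow's behavior. All helpers are hypothetical. Unlike this sketch, TensorFlow checks shapes on restore, so swapping variables of different shapes typically raises a shape-mismatch error rather than loading silently.

```python
def checkpoint_keys(declared):
    """Map each variable to its checkpoint key.

    declared: list of (variable, explicit_name or None) in declaration order.
    Unnamed variables get 'Variable', 'Variable_1', ... by position (simplified).
    """
    keys = {}
    unnamed = 0
    for var, name in declared:
        if name is None:
            name = 'Variable' if unnamed == 0 else 'Variable_{}'.format(unnamed)
            unnamed += 1
        keys[var] = name
    return keys

def save(values, declared):
    keys = checkpoint_keys(declared)
    return {keys[var]: values[var] for var, _ in declared}

def restore(checkpoint, declared):
    keys = checkpoint_keys(declared)
    return {var: checkpoint[keys[var]] for var, _ in declared}

values = {'weights': [[1, 2, 3], [4, 5, 6]], 'bias': [7, 8, 9]}

# Auto-generated names: saved as (weights, bias) but declared as (bias, weights)
ckpt = save(values, [('weights', None), ('bias', None)])
swapped = restore(ckpt, [('bias', None), ('weights', None)])
print(swapped['bias'])  # [[1, 2, 3], [4, 5, 6]] (the weights' values)

# Explicit names: declaration order no longer matters
ckpt = save(values, [('weights', 'weights_0'), ('bias', 'bias_0')])
restored = restore(ckpt, [('bias', 'bias_0'), ('weights', 'weights_0')])
print(restored == values)  # True
```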

3. Regularization

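The "skinny jeans" problem: skinny jeans fit perfectly but are hard to get into, so we usually wear a pair that is slightly too big. Likewise, a model that fits the data exactly is very hard to optimize. In practice we therefore choose a model that is somewhat more general than needed and take measures to prevent it from overfitting.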

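Ways to prevent overfitting:

  • Early stopping: stop training as soon as performance on the validation set stops improving.
  • Regularization: impose artificial constraints on the network that implicitly reduce the number of free parameters.
    1. L2
    2. Dropout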
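Early stopping can be sketched as a rule on the validation loss: remember the best epoch, and stop once the loss has failed to improve for a given number of epochs. The patience parameter and the loss values below are hypothetical. This is one common variant of the rule, not the only one.

```python
def early_stopping(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses.

    Training stops at the first epoch that is `patience` epochs past the best one
    without any improvement; otherwise it runs to the last epoch.
    """
    best_loss = float('inf')
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses: improving, then rising as the model overfits
val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70, 0.80]
print(early_stopping(val_losses, patience=3))  # (5, 2)
```

In practice you would keep the parameters from best_epoch, for example by saving a checkpoint with tf.train.Saver whenever the validation loss improves.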

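1. L2 Regularization

L2 regularization is very simple. We add an extra term to the loss: a constant β times half the sum of the squares of all the weights, i.e. L' = L + β · ½‖w‖², where β controls the strength of the penalty. The structure of the model does not change, and the derivative of the extra term is also simple: it is just βw.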
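Below is a plain-Python sketch of the penalty and its derivative, with the derivative checked by finite differences. The weights, β and the data loss are hypothetical values. TensorFlow provides tf.nn.l2_loss(t), which computes sum(t ** 2) / 2, so the penalty corresponds to beta * tf.nn.l2_loss(w).

```python
def l2_penalty(weights, beta):
    """beta * sum(w**2) / 2, the same quantity as beta * tf.nn.l2_loss(w)."""
    return beta * sum(w * w for w in weights) / 2

def l2_penalty_grad(weights, beta):
    """Derivative of the penalty with respect to each weight: beta * w."""
    return [beta * w for w in weights]

weights = [0.5, -1.0, 2.0]  # hypothetical flattened weights
beta = 0.01                 # hypothetical regularization strength
data_loss = 0.3             # hypothetical loss from the data term

total_loss = data_loss + l2_penalty(weights, beta)
print(total_loss)  # about 0.32625

# Check the analytic gradient of the penalty against a central finite difference
eps = 1e-6
analytic_grads = l2_penalty_grad(weights, beta)
numeric_grads = []
for i in range(len(weights)):
    up = weights[:i] + [weights[i] + eps] + weights[i + 1:]
    down = weights[:i] + [weights[i] - eps] + weights[i + 1:]
    numeric_grads.append((l2_penalty(up, beta) - l2_penalty(down, beta)) / (2 * eps))
print(analytic_grads, numeric_grads)
```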

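2. Dropout Regularization

During training, randomly set about half of the activations to 0. The network then cannot rely on any given activation being present, because any of them might be destroyed.

The network is therefore forced to learn redundant representations of everything. This may look wasteful, but it actually makes the network more robust and helps prevent overfitting.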

The tf.nn.dropout() function takes in two parameters:

  1. hidden_layer: the tensor to which you would like to apply dropout
  2. keep_prob: the probability of keeping (i.e. not dropping) any given unit

keep_prob allows you to adjust the number of units to drop. In order to compensate for dropped units, tf.nn.dropout() multiplies all units that are kept (i.e. not dropped) by 1/keep_prob.

During training, a good starting value for keep_prob is 0.5.

During testing, use a keep_prob value of 1.0 to keep all units and maximize the power of the model.
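The scaling rule described above keeps each unit with probability keep_prob and multiplies the survivors by 1/keep_prob; this scheme is often called inverted dropout. Below is a plain-Python sketch of it that uses a seeded random.Random so runs are repeatable. It imitates the behavior the text describes for tf.nn.dropout(), but it is not TensorFlow's implementation. In TensorFlow 1.x, keep_prob is commonly fed through a placeholder, so the same graph can use 0.5 during training and 1.0 during testing.

```python
import random

def dropout(activations, keep_prob, rng):
    """Zero each activation with probability 1 - keep_prob; scale survivors by 1/keep_prob."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError('keep_prob must be in (0, 1]')
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

rng = random.Random(0)
hidden_layer = [1.0, 2.0, 3.0, 4.0]

train_output = dropout(hidden_layer, 0.5, rng)  # each unit is either 0.0 or doubled
test_output = dropout(hidden_layer, 1.0, rng)   # keep_prob=1.0: unchanged
print(train_output, test_output)

# The scaling preserves the expected value: the average of many dropped copies of 1.0 is close to 1.0
trials = [dropout([1.0], 0.5, rng)[0] for _ in range(100000)]
print(sum(trials) / len(trials))
```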

Further Reading