TensorFlow 2.0 Pitfall Notes

TensorFlow 2.0 Design Philosophy

ref: https://blog.tensorflow.org/2019/01/what-are-symbolic-and-imperative-apis.html

Symbolic (or Declarative) APIs

Sequential APIs

import tensorflow as tf
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test, verbose=2)

Functional APIs

ref: https://www.tensorflow.org/guide/keras/functional#all_models_are_callable_just_like_layers

from tensorflow import keras
from tensorflow.keras import layers

encoder_input = keras.Input(shape=(28, 28, 1), name='original_img')
x = layers.Conv2D(16, 3, activation='relu')(encoder_input)
x = layers.Conv2D(32, 3, activation='relu')(x)
x = layers.MaxPooling2D(3)(x)
x = layers.Conv2D(32, 3, activation='relu')(x)
x = layers.Conv2D(16, 3, activation='relu')(x)
encoder_output = layers.GlobalMaxPooling2D()(x)

encoder = keras.Model(encoder_input, encoder_output, name='encoder')
encoder.summary()

decoder_input = keras.Input(shape=(16,), name='encoded_img')
x = layers.Reshape((4, 4, 1))(decoder_input)
x = layers.Conv2DTranspose(16, 3, activation='relu')(x)
x = layers.Conv2DTranspose(32, 3, activation='relu')(x)
x = layers.UpSampling2D(3)(x)
x = layers.Conv2DTranspose(16, 3, activation='relu')(x)
decoder_output = layers.Conv2DTranspose(1, 3, activation='relu')(x)

decoder = keras.Model(decoder_input, decoder_output, name='decoder')
decoder.summary()

autoencoder_input = keras.Input(shape=(28, 28, 1), name='img')
encoded_img = encoder(autoencoder_input)
decoded_img = decoder(encoded_img)
autoencoder = keras.Model(autoencoder_input, decoded_img, name='autoencoder')
autoencoder.summary()

Imperative (or Model Subclassing) APIs

For details, see the custom Layer and custom Model (subclass) sections below.
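For a quick taste before the detailed sections, here is a minimal subclassed version of the MNIST classifier from the Sequential example above (my sketch, not from the original post):

import tensorflow as tf

class MNISTModel(tf.keras.Model):

    def __init__(self):
        super(MNISTModel, self).__init__()
        self.flatten = tf.keras.layers.Flatten()
        self.dense1 = tf.keras.layers.Dense(128, activation='relu')
        self.dropout = tf.keras.layers.Dropout(0.2)
        self.dense2 = tf.keras.layers.Dense(10, activation='softmax')

    def call(self, inputs, training=None):
        x = self.flatten(inputs)
        x = self.dense1(x)
        x = self.dropout(x, training=training)
        return self.dense2(x)

model = MNISTModel()
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# fit / evaluate exactly as in the Sequential example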

Custom Layers

General steps

  • If the layer needs other Layer or Sequential structures, assign them in __init__()
  • Create the weight variables in build(), and give every weight a name
    • If a weight is created without a name, training fails at the start of the 2nd epoch with: AttributeError: 'NoneType' object has no attribute 'replace'
  • Write the computation logic in call()

eg:

import tensorflow as tf
from tensorflow.keras import layers

class Linear(layers.Layer):

    def __init__(self, units=32):
        super(Linear, self).__init__()
        self.units = units

    def build(self, input_shape):
        # Give each weight a name, otherwise the 2nd-epoch error above appears
        self.w = self.add_weight(name='w',
                                 shape=(input_shape[-1], self.units),
                                 initializer='random_normal',
                                 trainable=True)
        self.b = self.add_weight(name='b',
                                 shape=(self.units,),
                                 initializer='random_normal',
                                 trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b
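Usage: build() runs automatically on the first call, using the shape of that first input:

x = tf.ones((2, 2))
linear_layer = Linear(4)
y = linear_layer(x)  # build() runs here with input_shape=(2, 2)
print(y.shape)       # (2, 4)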

Adding a loss inside a Layer

Layers recursively collect losses created during the forward pass.

If a layer produces a loss, register it by calling self.add_loss():

import tensorflow as tf
from tensorflow.keras import layers

# A layer that creates an activity regularization loss
class ActivityRegularizationLayer(layers.Layer):

    def __init__(self, rate=1e-2):
        super(ActivityRegularizationLayer, self).__init__()
        self.rate = rate

    def call(self, inputs):
        self.add_loss(self.rate * tf.reduce_sum(inputs))
        return inputs

class OuterLayer(layers.Layer):

    def __init__(self):
        super(OuterLayer, self).__init__()
        self.activity_reg = ActivityRegularizationLayer(1e-2)

    def call(self, inputs):
        return self.activity_reg(inputs)


layer = OuterLayer()
assert len(layer.losses) == 0  # No losses yet since the layer has never been called
_ = layer(tf.zeros((1, 1)))
assert len(layer.losses) == 1  # We created one loss value

# `layer.losses` gets reset at the start of each __call__
_ = layer(tf.zeros((1, 1)))
assert len(layer.losses) == 1  # This is the loss created during the call above

If a layer uses a regularizer, the regularization loss is collected automatically:

import tensorflow as tf
from tensorflow.keras import layers

class OuterLayer(layers.Layer):

    def __init__(self):
        super(OuterLayer, self).__init__()
        self.dense = layers.Dense(32, kernel_regularizer=tf.keras.regularizers.l2(1e-3))

    def call(self, inputs):
        return self.dense(inputs)


layer = OuterLayer()
_ = layer(tf.zeros((1, 1)))

# This is `1e-3 * sum(layer.dense.kernel ** 2)`,
# created by the `kernel_regularizer` above.
print(layer.losses)

>> [<tf.Tensor: shape=(), dtype=float32, numpy=0.0018715981>]

These losses should be taken into account when writing a custom training loop, as shown below:

import tensorflow as tf

# Assumes `model` and `train_dataset` have been defined beforehand.
# Instantiate an optimizer.
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Iterate over the batches of a dataset.
for x_batch_train, y_batch_train in train_dataset:
    with tf.GradientTape() as tape:
        logits = model(x_batch_train)  # Logits for this minibatch
        # Loss value for this minibatch
        loss_value = loss_fn(y_batch_train, logits)
        # Add extra losses created during this forward pass:
        loss_value += sum(model.losses)

    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))

Handling the training flag in a Layer

import tensorflow as tf
from tensorflow.keras import layers

class CustomDropout(layers.Layer):

    def __init__(self, rate, **kwargs):
        super(CustomDropout, self).__init__(**kwargs)
        self.rate = rate

    def call(self, inputs, training=None):
        if training:
            return tf.nn.dropout(inputs, rate=self.rate)
        return inputs
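A quick check of the flag (a sketch; inside fit() Keras passes training=True automatically):

layer = CustomDropout(0.5)
x = tf.ones((2, 4))
print(layer(x))                 # training is None -> falsy -> dropout skipped
print(layer(x, training=True))  # dropout applied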

Custom Models

General steps

  • If the model needs other Layer or Sequential structures, assign them in __init__()
  • Before the model has been fit, if you want summary() to show each layer's shape, you need a helper that builds the model first, like the build_graph function in the code below
  • summary() lists shapes in the order the layers are assigned in __init__(), not the order they are called in call()
    eg:
# -*- coding: utf-8 -*-
# @Time : 2020/4/21 13:50
# @Author : zdqzyx
# @File : text_birnn_att.py
# @Software: PyCharm


import tensorflow as tf
from tensorflow.keras.layers import Embedding, Dense, GRU, Bidirectional
from tensorflow.keras import Model
from attention import Attention

def point_wise_feed_forward_network(dense_size):
    ffn = tf.keras.Sequential()
    for size in dense_size:
        ffn.add(Dense(size, activation='relu'))
    return ffn

class TextBiRNNAtt(Model):

    def __init__(self,
                 maxlen,
                 max_features,
                 embedding_dims,
                 class_num,
                 last_activation='softmax',
                 dense_size=None
                 ):
        '''
        :param maxlen: maximum text length
        :param max_features: vocabulary size
        :param embedding_dims: embedding dimension
        :param class_num: number of output classes
        :param last_activation: activation of the final classifier layer
        '''
        super(TextBiRNNAtt, self).__init__()
        self.maxlen = maxlen
        self.max_features = max_features
        self.embedding_dims = embedding_dims
        self.class_num = class_num
        self.last_activation = last_activation
        self.dense_size = dense_size

        self.embedding = Embedding(input_dim=self.max_features, output_dim=self.embedding_dims, input_length=self.maxlen)
        self.bi_rnn = Bidirectional(layer=GRU(units=128, activation='tanh', return_sequences=True), merge_mode='concat')  # LSTM or GRU
        self.attention = Attention()
        if self.dense_size is not None:
            self.ffn = point_wise_feed_forward_network(dense_size)
        self.classifier = Dense(self.class_num, activation=self.last_activation)

    def call(self, inputs, training=None, mask=None):
        if len(inputs.get_shape()) != 2:
            raise ValueError('The rank of inputs of TextBiRNNAtt must be 2, but now is {}'.format(inputs.get_shape()))
        if inputs.get_shape()[1] != self.maxlen:
            raise ValueError('The maxlen of inputs of TextBiRNNAtt must be %d, but now is %d' % (self.maxlen, inputs.get_shape()[1]))

        emb = self.embedding(inputs)
        x = self.bi_rnn(emb)
        x = self.attention(x)
        if self.dense_size is not None:
            x = self.ffn(x)
        output = self.classifier(x)
        return output

    def build_graph(self, input_shape):
        '''Build the model so that summary() can report per-layer output shapes.'''
        input_shape_nobatch = input_shape[1:]
        self.build(input_shape)
        inputs = tf.keras.Input(shape=input_shape_nobatch)
        if not hasattr(self, 'call'):
            raise AttributeError("User should define 'call' method in sub-class model!")
        _ = self.call(inputs)

if __name__ == '__main__':
    model = TextBiRNNAtt(maxlen=400,
                         max_features=5000,
                         embedding_dims=100,
                         class_num=2,
                         last_activation='softmax',
                         # dense_size=[128, 64],
                         dense_size=None
                         )
    model.build_graph(input_shape=(None, 400))
    model.summary()
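As an alternative to build_graph (my sketch, not part of the original file): one forward pass on dummy input also builds every sublayer, after which summary() can report shapes:

model = TextBiRNNAtt(maxlen=400, max_features=5000, embedding_dims=100, class_num=2)
_ = model(tf.zeros((1, 400), dtype=tf.int32))  # dummy forward pass builds all sublayers
model.summary()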

About training

Both custom layers and custom models have this method:

call(self, inputs, training=None, mask=None)

If training is not explicitly set, Keras passes True during fit() and False during predict(). But once a custom layer or model sets it explicitly, every downstream layer it calls receives that explicit value instead.
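A minimal sketch of what an explicit override looks like (the Backbone name is hypothetical):

import tensorflow as tf

class Backbone(tf.keras.Model):

    def __init__(self):
        super(Backbone, self).__init__()
        self.dropout = tf.keras.layers.Dropout(0.5)
        self.dense = tf.keras.layers.Dense(10)

    def call(self, inputs, training=None):
        # training=False is set explicitly here, so dropout always runs in
        # inference mode, even when this model is called inside fit().
        x = self.dropout(inputs, training=False)
        return self.dense(x)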

About masks

Supporting masks in a custom layer

self.supports_masking = True still has many murky corners; use it with caution. Passing the mask manually is recommended.

import tensorflow as tf
from tensorflow.keras.layers import Layer

class Attention(Layer):
    def __init__(self):
        super(Attention, self).__init__()
        # If the previous layer produces a mask (typically an Embedding layer
        # with mask_zero=True), this layer must set self.supports_masking = True
        # to receive it.
        self.supports_masking = True

    def build(self, input_shape):
        # Define the attention parameters here; the original left this as a
        # placeholder, so a minimal scoring vector is used as an example.
        self.w = self.add_weight(name='att_w', shape=(input_shape[-1], 1),
                                 initializer='random_normal', trainable=True)
        self.built = True

    def compute_mask(self, inputs, mask=None):
        '''Important: when self.supports_masking = True and the following layers
        do not need the mask, override compute_mask and return None.
        '''
        return None

    def call(self, inputs, mask=None):
        # The mask can be passed in manually; if it isn't, and
        # self.supports_masking = True, it comes from the previous layer.
        scores = tf.squeeze(tf.matmul(inputs, self.w), axis=-1)     # (batch, time)
        if mask is not None:
            scores += (1.0 - tf.cast(mask, scores.dtype)) * -1e9    # mask out padding
        weights = tf.nn.softmax(scores, axis=-1)
        c = tf.reduce_sum(inputs * tf.expand_dims(weights, -1), axis=1)  # (batch, dim)
        return c
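A sketch of the recommended manual route, assuming the filled-in Attention above and that 0 is the padding id:

att = Attention()
ids = tf.constant([[5, 3, 0, 0]])                # a padded batch of token ids
emb = tf.keras.layers.Embedding(5000, 100)(ids)  # note: no mask_zero here
mask = tf.not_equal(ids, 0)                      # build the mask by hand
out = att(emb, mask=mask)                        # pass it explicitly into call()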

About checkpoints

Error: WARNING: Logging before flag parsing goes to stderr.
W1008 09:57:52.766877 4594230720 util.py:244] Unresolved object in checkpoint: (root).optimizer.iter
Workaround: https://code5.cn/so/python-3.x/2479816
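A common trigger for this warning is restoring a checkpoint that stored optimizer state into a program that never recreates the optimizer (e.g. inference only). In that case, calling expect_partial() on the restore status tells TF the unmatched objects are expected and silences the warning (model and checkpoint_path below are placeholders):

ckpt = tf.train.Checkpoint(model=model)
status = ckpt.restore(checkpoint_path)
status.expect_partial()  # unmatched objects (e.g. optimizer slots) are expected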