TensorFlow 2.0 Pitfall Notes

TensorFlow 2.0 Design Philosophy

ref: https://blog.tensorflow.org/2019/01/what-are-symbolic-and-imperative-apis.html

Symbolic (or Declarative) APIs

Sequential APIs

import tensorflow as tf
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test, verbose=2)

Functional APIs

ref: https://www.tensorflow.org/guide/keras/functional#all_models_are_callable_just_like_layers

from tensorflow import keras
from tensorflow.keras import layers

encoder_input = keras.Input(shape=(28, 28, 1), name='original_img')
x = layers.Conv2D(16, 3, activation='relu')(encoder_input)
x = layers.Conv2D(32, 3, activation='relu')(x)
x = layers.MaxPooling2D(3)(x)
x = layers.Conv2D(32, 3, activation='relu')(x)
x = layers.Conv2D(16, 3, activation='relu')(x)
encoder_output = layers.GlobalMaxPooling2D()(x)

encoder = keras.Model(encoder_input, encoder_output, name='encoder')
encoder.summary()

decoder_input = keras.Input(shape=(16,), name='encoded_img')
x = layers.Reshape((4, 4, 1))(decoder_input)
x = layers.Conv2DTranspose(16, 3, activation='relu')(x)
x = layers.Conv2DTranspose(32, 3, activation='relu')(x)
x = layers.UpSampling2D(3)(x)
x = layers.Conv2DTranspose(16, 3, activation='relu')(x)
decoder_output = layers.Conv2DTranspose(1, 3, activation='relu')(x)

decoder = keras.Model(decoder_input, decoder_output, name='decoder')
decoder.summary()

autoencoder_input = keras.Input(shape=(28, 28, 1), name='img')
encoded_img = encoder(autoencoder_input)
decoded_img = decoder(encoded_img)
autoencoder = keras.Model(autoencoder_input, decoded_img, name='autoencoder')
autoencoder.summary()

Imperative (or Model Subclassing) APIs

For details, see the custom Layer and custom Model (subclass) sections below.
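For a quick taste before the detailed sections, here is a minimal subclassed version of the MNIST classifier from the Sequential example above (my sketch, not from the original post):

import tensorflow as tf

class MNISTModel(tf.keras.Model):

    def __init__(self):
        super(MNISTModel, self).__init__()
        self.flatten = tf.keras.layers.Flatten()
        self.dense1 = tf.keras.layers.Dense(128, activation='relu')
        self.dropout = tf.keras.layers.Dropout(0.2)
        self.dense2 = tf.keras.layers.Dense(10, activation='softmax')

    def call(self, inputs, training=None):
        x = self.flatten(inputs)
        x = self.dense1(x)
        x = self.dropout(x, training=training)
        return self.dense2(x)

model = MNISTModel()
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# fit / evaluate exactly as in the Sequential example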

Custom Layers

General steps

  • If the layer needs other Layer or Sequential structures, assign them in __init__()
  • Create the weight variables in build(), and give every weight a name
    • If a weight is created without a name, training fails at the start of the 2nd epoch with: AttributeError: 'NoneType' object has no attribute 'replace'
  • Write the computation logic in call()

eg:

import tensorflow as tf
from tensorflow.keras import layers

class Linear(layers.Layer):

    def __init__(self, units=32):
        super(Linear, self).__init__()
        self.units = units

    def build(self, input_shape):
        # Give each weight a name, otherwise the 2nd-epoch error above appears
        self.w = self.add_weight(name='w',
                                 shape=(input_shape[-1], self.units),
                                 initializer='random_normal',
                                 trainable=True)
        self.b = self.add_weight(name='b',
                                 shape=(self.units,),
                                 initializer='random_normal',
                                 trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b
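Usage: build() runs automatically on the first call, using the shape of that first input:

x = tf.ones((2, 2))
linear_layer = Linear(4)
y = linear_layer(x)  # build() runs here with input_shape=(2, 2)
print(y.shape)       # (2, 4)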

Adding a loss inside a Layer

Layers recursively collect losses created during the forward pass.

If a layer produces a loss, register it by calling self.add_loss():

import tensorflow as tf
from tensorflow.keras import layers

# A layer that creates an activity regularization loss
class ActivityRegularizationLayer(layers.Layer):

    def __init__(self, rate=1e-2):
        super(ActivityRegularizationLayer, self).__init__()
        self.rate = rate

    def call(self, inputs):
        self.add_loss(self.rate * tf.reduce_sum(inputs))
        return inputs

class OuterLayer(layers.Layer):

    def __init__(self):
        super(OuterLayer, self).__init__()
        self.activity_reg = ActivityRegularizationLayer(1e-2)

    def call(self, inputs):
        return self.activity_reg(inputs)


layer = OuterLayer()
assert len(layer.losses) == 0  # No losses yet since the layer has never been called
_ = layer(tf.zeros((1, 1)))
assert len(layer.losses) == 1  # We created one loss value

# `layer.losses` gets reset at the start of each __call__
_ = layer(tf.zeros((1, 1)))
assert len(layer.losses) == 1  # This is the loss created during the call above

If a layer uses a regularizer, the regularization loss is collected automatically:

import tensorflow as tf
from tensorflow.keras import layers

class OuterLayer(layers.Layer):

    def __init__(self):
        super(OuterLayer, self).__init__()
        self.dense = layers.Dense(32, kernel_regularizer=tf.keras.regularizers.l2(1e-3))

    def call(self, inputs):
        return self.dense(inputs)


layer = OuterLayer()
_ = layer(tf.zeros((1, 1)))

# This is `1e-3 * sum(layer.dense.kernel ** 2)`,
# created by the `kernel_regularizer` above.
print(layer.losses)

>> [<tf.Tensor: shape=(), dtype=float32, numpy=0.0018715981>]

These losses should be taken into account when writing a custom training loop, as shown below:

import tensorflow as tf

# Assumes `model` and `train_dataset` have been defined beforehand.
# Instantiate an optimizer.
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Iterate over the batches of a dataset.
for x_batch_train, y_batch_train in train_dataset:
    with tf.GradientTape() as tape:
        logits = model(x_batch_train)  # Logits for this minibatch
        # Loss value for this minibatch
        loss_value = loss_fn(y_batch_train, logits)
        # Add extra losses created during this forward pass:
        loss_value += sum(model.losses)

    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))

Handling the training flag in a Layer

import tensorflow as tf
from tensorflow.keras import layers

class CustomDropout(layers.Layer):

    def __init__(self, rate, **kwargs):
        super(CustomDropout, self).__init__(**kwargs)
        self.rate = rate

    def call(self, inputs, training=None):
        if training:
            return tf.nn.dropout(inputs, rate=self.rate)
        return inputs
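A quick check of the flag (a sketch; inside fit() Keras passes training=True automatically):

layer = CustomDropout(0.5)
x = tf.ones((2, 4))
print(layer(x))                 # training is None -> falsy -> dropout skipped
print(layer(x, training=True))  # dropout applied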

Custom Models

General steps

  • If the model needs other Layer or Sequential structures, assign them in __init__()
  • Before the model has been fit, if you want summary() to show each layer's shape, you need a helper that builds the model first, like the build_graph function in the code below
  • summary() lists shapes in the order the layers are assigned in __init__(), not the order they are called in call()
    eg:
# -*- coding: utf-8 -*-
# @Time : 2020/4/21 13:50
# @Author : zdqzyx
# @File : text_birnn_att.py
# @Software: PyCharm


import tensorflow as tf
from tensorflow.keras.layers import Embedding, Dense, GRU, Bidirectional
from tensorflow.keras import Model
from attention import Attention

def point_wise_feed_forward_network(dense_size):
    ffn = tf.keras.Sequential()
    for size in dense_size:
        ffn.add(Dense(size, activation='relu'))
    return ffn

class TextBiRNNAtt(Model):

    def __init__(self,
                 maxlen,
                 max_features,
                 embedding_dims,
                 class_num,
                 last_activation='softmax',
                 dense_size=None
                 ):
        '''
        :param maxlen: maximum text length
        :param max_features: vocabulary size
        :param embedding_dims: embedding dimension
        :param class_num: number of output classes
        :param last_activation: activation of the final classifier layer
        '''
        super(TextBiRNNAtt, self).__init__()
        self.maxlen = maxlen
        self.max_features = max_features
        self.embedding_dims = embedding_dims
        self.class_num = class_num
        self.last_activation = last_activation
        self.dense_size = dense_size

        self.embedding = Embedding(input_dim=self.max_features, output_dim=self.embedding_dims, input_length=self.maxlen)
        self.bi_rnn = Bidirectional(layer=GRU(units=128, activation='tanh', return_sequences=True), merge_mode='concat')  # LSTM or GRU
        self.attention = Attention()
        if self.dense_size is not None:
            self.ffn = point_wise_feed_forward_network(dense_size)
        self.classifier = Dense(self.class_num, activation=self.last_activation)

    def call(self, inputs, training=None, mask=None):
        if len(inputs.get_shape()) != 2:
            raise ValueError('The rank of inputs of TextBiRNNAtt must be 2, but now is {}'.format(inputs.get_shape()))
        if inputs.get_shape()[1] != self.maxlen:
            raise ValueError('The maxlen of inputs of TextBiRNNAtt must be %d, but now is %d' % (self.maxlen, inputs.get_shape()[1]))

        emb = self.embedding(inputs)
        x = self.bi_rnn(emb)
        x = self.attention(x)
        if self.dense_size is not None:
            x = self.ffn(x)
        output = self.classifier(x)
        return output

    def build_graph(self, input_shape):
        '''Build the model so that summary() can report per-layer output shapes.'''
        input_shape_nobatch = input_shape[1:]
        self.build(input_shape)
        inputs = tf.keras.Input(shape=input_shape_nobatch)
        if not hasattr(self, 'call'):
            raise AttributeError("User should define 'call' method in sub-class model!")
        _ = self.call(inputs)

if __name__ == '__main__':
    model = TextBiRNNAtt(maxlen=400,
                         max_features=5000,
                         embedding_dims=100,
                         class_num=2,
                         last_activation='softmax',
                         # dense_size=[128, 64],
                         dense_size=None
                         )
    model.build_graph(input_shape=(None, 400))
    model.summary()
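As an alternative to build_graph (my sketch, not part of the original file): one forward pass on dummy input also builds every sublayer, after which summary() can report shapes:

model = TextBiRNNAtt(maxlen=400, max_features=5000, embedding_dims=100, class_num=2)
_ = model(tf.zeros((1, 400), dtype=tf.int32))  # dummy forward pass builds all sublayers
model.summary()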

About training

Both custom layers and custom models have this method:

call(self, inputs, training=None, mask=None)

If training is not explicitly set, Keras passes True during fit() and False during predict(). But once a custom layer or model sets it explicitly, every downstream layer it calls receives that explicit value instead.
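A minimal sketch of what an explicit override looks like (the Backbone name is hypothetical):

import tensorflow as tf

class Backbone(tf.keras.Model):

    def __init__(self):
        super(Backbone, self).__init__()
        self.dropout = tf.keras.layers.Dropout(0.5)
        self.dense = tf.keras.layers.Dense(10)

    def call(self, inputs, training=None):
        # training=False is set explicitly here, so dropout always runs in
        # inference mode, even when this model is called inside fit().
        x = self.dropout(inputs, training=False)
        return self.dense(x)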

About masks

Supporting masks in a custom layer

self.supports_masking = True still has many murky corners; use it with caution. Passing the mask manually is recommended.

import tensorflow as tf
from tensorflow.keras.layers import Layer

class Attention(Layer):
    def __init__(self):
        super(Attention, self).__init__()
        # If the previous layer produces a mask (typically an Embedding layer
        # with mask_zero=True), this layer must set self.supports_masking = True
        # to receive it.
        self.supports_masking = True

    def build(self, input_shape):
        # Define the attention parameters here; the original left this as a
        # placeholder, so a minimal scoring vector is used as an example.
        self.w = self.add_weight(name='att_w', shape=(input_shape[-1], 1),
                                 initializer='random_normal', trainable=True)
        self.built = True

    def compute_mask(self, inputs, mask=None):
        '''Important: when self.supports_masking = True and the following layers
        do not need the mask, override compute_mask and return None.
        '''
        return None

    def call(self, inputs, mask=None):
        # The mask can be passed in manually; if it isn't, and
        # self.supports_masking = True, it comes from the previous layer.
        scores = tf.squeeze(tf.matmul(inputs, self.w), axis=-1)     # (batch, time)
        if mask is not None:
            scores += (1.0 - tf.cast(mask, scores.dtype)) * -1e9    # mask out padding
        weights = tf.nn.softmax(scores, axis=-1)
        c = tf.reduce_sum(inputs * tf.expand_dims(weights, -1), axis=1)  # (batch, dim)
        return c
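A sketch of the recommended manual route, assuming the filled-in Attention above and that 0 is the padding id:

att = Attention()
ids = tf.constant([[5, 3, 0, 0]])                # a padded batch of token ids
emb = tf.keras.layers.Embedding(5000, 100)(ids)  # note: no mask_zero here
mask = tf.not_equal(ids, 0)                      # build the mask by hand
out = att(emb, mask=mask)                        # pass it explicitly into call()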

About checkpoints

Error: WARNING: Logging before flag parsing goes to stderr.
W1008 09:57:52.766877 4594230720 util.py:244] Unresolved object in checkpoint: (root).optimizer.iter
Workaround: https://code5.cn/so/python-3.x/2479816
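A common trigger for this warning is restoring a checkpoint that stored optimizer state into a program that never recreates the optimizer (e.g. inference only). In that case, calling expect_partial() on the restore status tells TF the unmatched objects are expected and silences the warning (model and checkpoint_path below are placeholders):

ckpt = tf.train.Checkpoint(model=model)
status = ckpt.restore(checkpoint_path)
status.expect_partial()  # unmatched objects (e.g. optimizer slots) are expected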