DeepFM Explained and Implemented

FM (Factorization Machines)

The traditional LR (logistic regression) model is linear. To capture nonlinear relationships, one must either use GBDT to extract nonlinear features or construct them by hand.

FM explicitly constructs cross features and models second-order interactions directly.

The formula is:

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} w_{ij} x_i x_j$$

where $w_{ij}$ is the parameter of a second-order interaction; there are $\frac{n(n-1)}{2}$ of them, giving $O(n^2)$ complexity.
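To get a feel for how quickly the pairwise parameter count grows, a trivial check in plain Python:

```python
# Number of independent pairwise weights w_ij for n features: n * (n - 1) / 2
def pairwise_count(n):
    return n * (n - 1) // 2

print(pairwise_count(10))    # 45
print(pairwise_count(1000))  # 499500
```

In sparse CTR data most feature pairs rarely co-occur, so these $O(n^2)$ independent weights cannot all be estimated reliably, which is what motivates the factorized form that follows.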

To reduce the time complexity, matrix factorization offers a solution: replace the free parameter with $w_{ij} = \langle \mathbf{v}_i, \mathbf{v}_j \rangle$. Then:

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle x_i x_j$$

Here $\mathbf{v}_i$ is the latent vector of the $i$-th feature, and $\langle \cdot, \cdot \rangle$ denotes the dot product. The latent vectors have length $k$ ($k \ll n$), i.e. $k$ factors describing each feature.
At first glance FM's complexity is $O(kn^2)$, but with some algebraic simplification it can be brought down to $O(kn)$. The derivation is as follows:
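The simplification rests on the identity $\sum_{i<j} a_i a_j = \frac{1}{2}\big[(\sum_i a_i)^2 - \sum_i a_i^2\big]$, applied per latent dimension $f$ (this is the standard derivation from Rendle's FM paper):

$$\sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle x_i x_j = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle x_i x_j - \frac{1}{2}\sum_{i=1}^{n} \langle \mathbf{v}_i, \mathbf{v}_i \rangle x_i^2 = \frac{1}{2}\sum_{f=1}^{k}\left[\left(\sum_{i=1}^{n} v_{i,f}\, x_i\right)^2 - \sum_{i=1}^{n} v_{i,f}^2\, x_i^2\right]$$

Each inner sum over $i$ costs $O(n)$ and there are $k$ of them, hence $O(kn)$ overall. This "square of sum minus sum of squares" form is exactly what `second_order_part` computes in the code.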

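A quick numeric sanity check that the naive pairwise sum and the $O(kn)$ sum-square form agree, in plain Python with made-up toy numbers ($n = 3$ features, $k = 2$ factors):

```python
# Toy latent vectors v[i] (length k) and feature values x[i]; values are arbitrary.
v = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4]]
x = [1.0, 2.0, 3.0]
n, k = 3, 2

# Naive O(k*n^2): sum over pairs i < j of <v_i, v_j> * x_i * x_j
naive = sum(
    sum(v[i][f] * v[j][f] for f in range(k)) * x[i] * x[j]
    for i in range(n) for j in range(i + 1, n)
)

# O(k*n): 0.5 * sum_f [(sum_i v_if * x_i)^2 - sum_i (v_if * x_i)^2]
fast = 0.5 * sum(
    sum(v[i][f] * x[i] for i in range(n)) ** 2
    - sum((v[i][f] * x[i]) ** 2 for i in range(n))
    for f in range(k)
)

print(abs(naive - fast) < 1e-9)  # True
```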
DeepFM

Paper [IJCAI 2017]:
DeepFM: A Factorization-Machine based Neural Network for CTR Prediction

Network architecture

Inputs: input_idxs (n, f), input_values (n, f)
FM part: look up embeddings by feature id (input_idxs) ==> (n, f, k), where k is the embedding dimension; multiply by input_values, then perform the second-order feature crossing. input_values is also connected to a weight vector to produce the first-order terms.
Deep part: look up embeddings by feature id ==> (n, f, k), multiply by input_values, reshape to (n, f*k), then feed into Dense layers.
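The shape flow above can be traced with a small NumPy sketch (the embedding table, ids, and values below are made up for illustration):

```python
import numpy as np

n, f, k, vocab = 2, 3, 4, 10          # batch size, fields, embedding dim, id vocabulary
emb_table = np.random.rand(vocab, k)  # stands in for the learned embedding table

input_idxs = np.array([[0, 3, 7], [1, 3, 9]])                # (n, f) feature ids
input_values = np.array([[0.5, 1.0, 1.0], [2.0, 1.0, 1.0]])  # (n, f) feature values

e = emb_table[input_idxs]         # embedding lookup -> (n, f, k)
e = e * input_values[..., None]   # scale each field's vector by its value, still (n, f, k)

deep_in = e.reshape(n, f * k)     # deep part input -> (n, f*k)

# FM second-order crossing via the sum-square trick -> (n, 1)
fm2 = 0.5 * ((e.sum(axis=1) ** 2) - (e ** 2).sum(axis=1)).sum(axis=1, keepdims=True)

print(e.shape, deep_in.shape, fm2.shape)  # (2, 3, 4) (2, 12) (2, 1)
```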

Overall network structure

FM part:

Deep part:

Code implementation (version 1)

Data preprocessing

The model takes two inputs: input_idxs and input_values.

  • input_idxs is the sparse encoding: each unique value of every categorical field gets its own id, while each continuous field is encoded as a single fixed id.
  • input_values holds the feature values: categorical fields become 1, continuous fields keep their raw values.

For example:
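Suppose there are two continuous fields, `age` and `price`, and one categorical field, `city`, with vocabulary {beijing, shanghai, shenzhen}. A minimal sketch of the encoding (the field names and vocabulary are made up for illustration):

```python
# Continuous fields take the fixed ids 0 and 1; categorical ids start after them.
city_vocab = {'beijing': 0, 'shanghai': 1, 'shenzhen': 2}
offset = 2  # number of continuous fields

row = {'age': 0.3, 'price': 12.5, 'city': 'shanghai'}

input_idxs = [0, 1, offset + city_vocab[row['city']]]  # [0, 1, 3]
input_values = [row['age'], row['price'], 1.0]         # [0.3, 12.5, 1.0]

print(input_idxs, input_values)
```

So a categorical field contributes a row-dependent id with value 1, while a continuous field contributes a fixed id with its raw value; this matches how x_idxs and x_values are built in the `__main__` block of the code.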

Points to note

  • In second_order_part, categorical and continuous fields are crossed with each other.
  • In deep_part, the embeddings are multiplied by the feature values before being fed into the Dense layers.

Code

import tensorflow as tf


def dnn(params):
    dnn_model = tf.keras.Sequential()
    for size in params['dnn_hidden_units']:
        dnn_model.add(tf.keras.layers.Dense(size, activation='relu', use_bias=False))
    dnn_model.add(tf.keras.layers.Dense(1, activation=None, use_bias=False))
    return dnn_model


class DeepFM(tf.keras.Model):

    def __init__(self, params):
        '''
        :param params:
            feature_size: size of the id vocabulary
            factor_size: embedding dimension, k in the formula
            field_size: number of input fields, f in the formula
        '''
        super(DeepFM, self).__init__()
        self.params = params
        self.embeddings_1 = tf.keras.layers.Embedding(params['feature_size'], 1)
        self.embeddings_2 = tf.keras.layers.Embedding(
            params['feature_size'], params['factor_size'],
            embeddings_regularizer=tf.keras.regularizers.l2(0.00001),
            embeddings_initializer=tf.initializers.RandomNormal(
                mean=0.0, stddev=0.0001, seed=1024))
        self.deep_dnn = dnn(params)
        self.dense_output = tf.keras.layers.Dense(params['class_num'], activation=params['last_activation'])

    def first_order_part(self, idxs, values):
        '''
        :return: (n, 1)
        '''
        x = self.embeddings_1(idxs)                          # (n, f, 1)
        x = tf.multiply(x, tf.expand_dims(values, axis=-1))  # (n, f, 1)
        x = tf.reduce_sum(x, axis=1)                         # (n, 1)
        return x

    def second_order_part(self, idxs, values):
        '''2ab = (a+b)^2 - (a^2 + b^2)
        :return: (n, 1)
        '''
        x = self.embeddings_2(idxs)                          # (n, f, k)
        x = tf.multiply(x, tf.expand_dims(values, axis=-1))  # (n, f, k)
        sum_square = tf.square(tf.reduce_sum(x, axis=1))     # (n, k)
        square_sum = tf.reduce_sum(tf.square(x), axis=1)     # (n, k)
        output = 0.5 * tf.subtract(sum_square, square_sum)   # (n, k)
        return tf.reduce_sum(output, axis=1, keepdims=True)  # (n, 1)

    def deep_part(self, idxs, values):
        '''
        :return: (n, 1)
        '''
        x = self.embeddings_2(idxs)                          # (n, f, k)
        x = tf.multiply(x, tf.expand_dims(values, axis=-1))  # (n, f, k)
        x = tf.reshape(x, (-1, self.params['field_size'] * self.params['factor_size']))
        x = self.deep_dnn(x)
        return x

    def call(self, idxs, values):
        '''
        :param idxs: (n, f)
        :param values: (n, f)
        :return: (n, class_num)
        '''
        first_order_output = self.first_order_part(idxs, values)
        second_order_output = self.second_order_part(idxs, values)
        deep_output = self.deep_part(idxs, values)
        combined_output = tf.concat([first_order_output, second_order_output, deep_output], axis=1)
        output = self.dense_output(combined_output)
        return output


if __name__ == '__main__':
    import numpy as np
    params = {
        'field_size': 12,
        'feature_size': 5 + 3,
        'factor_size': 4,
        'class_num': 1,
        'last_activation': 'sigmoid',
        'dnn_hidden_units': (128, 128),
    }
    print('Generate fake data...')
    x_dense = np.random.random((1000, 5))
    x_sparse = np.random.randint(0, 3, (1000, 7))

    # x_idxs is kept deliberately simple here: dense fields get the fixed ids 0..4,
    # sparse values are shifted by 5 so their ids do not collide with the dense ones
    dense_idxs = np.zeros(x_dense.shape)
    for i in range(dense_idxs.shape[1]):
        dense_idxs[:, i] = i
    x_idxs = np.concatenate([dense_idxs, x_sparse + 5], axis=1)
    x_values = np.concatenate([x_dense, np.ones(x_sparse.shape)], axis=1)

    x_idxs = tf.convert_to_tensor(x_idxs, dtype=tf.int64)
    x_values = tf.convert_to_tensor(x_values, dtype=tf.float32)

    y = np.random.randint(0, 2, (1000, 1))

    model = DeepFM(params)
    pred = model(x_idxs, x_values)
    print(pred.shape)