Summary of Models for Recommendation and Advertising

Models List

Model	Paper
Convolutional Click Prediction Model	[CIKM 2015]A Convolutional Click Prediction Model
Factorization-supported Neural Network	[ECIR 2016]Deep Learning over Multi-field Categorical Data: A Case Study on User Response Prediction
Product-based Neural Network	[ICDM 2016]Product-based neural networks for user response prediction
Wide & Deep	[DLRS 2016]Wide & Deep Learning for Recommender Systems
DeepFM	[IJCAI 2017]DeepFM: A Factorization-Machine based Neural Network for CTR Prediction
Piece-wise Linear Model	[arxiv 2017]Learning Piece-wise Linear Models from Large Scale Data for Ad Click Prediction
Deep & Cross Network	[ADKDD 2017]Deep & Cross Network for Ad Click Predictions
Attentional Factorization Machine	[IJCAI 2017]Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks
Neural Factorization Machine	[SIGIR 2017]Neural Factorization Machines for Sparse Predictive Analytics
xDeepFM	[KDD 2018]xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems
AutoInt	[arxiv 2018]AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks
Deep Interest Network	[KDD 2018]Deep Interest Network for Click-Through Rate Prediction
Deep Interest Evolution Network	[AAAI 2019]Deep Interest Evolution Network for Click-Through Rate Prediction
ONN	[arxiv 2019]Operation-aware Neural Networks for User Response Prediction
FGCNN	[WWW 2019]Feature Generation by Convolutional Neural Network for Click-Through Rate Prediction
Deep Session Interest Network	[IJCAI 2019]Deep Session Interest Network for Click-Through Rate Prediction
FiBiNET	[RecSys 2019]FiBiNET: Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction
FLEN	[arxiv 2019]FLEN: Leveraging Field for Scalable CTR Prediction

done

[X] DeepFM

DeepFM

blog
github

Overall network structure

Deep & Cross Network

(Of limited practical use)

Overall network structure

Visualization of a cross layer

xDeepFM

Overall network structure

Structure of the CIN component

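CIN input and output:

Input: x0 with shape (n, m, d), where n is the batch_size, m is the number of categorical feature fields, and d is the embedding dimension.
Let the output of layer k be xk with shape (n, h(k), d), with h(0) = m. Layer k+1 is then computed as follows:
- Take the pairwise Hadamard product of x0 and xk along axis 1, giving (n, d, m*h(k)).
- Take a weighted sum along axis 2 to get (n, d). Doing this h(k+1) times, each time with a different weight vector, gives (n, d, h(k+1)); swapping axes 1 and 2 gives (n, h(k+1), d).
Output: the K tensors of shape (n, h(i), d), i in [1, K], are concatenated along axis 1 and then sum pooled along axis 2, giving (n, h(out)), where h(out) = h(1)+h(2)+…+h(K).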
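The shape bookkeeping above can be checked with a small NumPy sketch. It is a minimal illustration, not the xDeepFM reference implementation: the names `cin_layer` and `cin`, the random weights, and the layer sizes are all assumptions made for the example, and the sketch leaves out any activation function or other refinements from the paper.

```python
import numpy as np

def cin_layer(x0, xk, w):
    """One CIN layer (illustrative sketch).

    x0: (n, m, d) input embeddings
    xk: (n, h_k, d) output of the previous layer
    w:  (h_next, m * h_k) one weight vector per output feature map
    returns: (n, h_next, d)
    """
    n, m, d = x0.shape
    h_k = xk.shape[1]
    # Pairwise Hadamard products along axis 1: (n, m, h_k, d)
    z = x0[:, :, None, :] * xk[:, None, :, :]
    # Flatten the pair axes and move d forward: (n, d, m * h_k)
    z = z.reshape(n, m * h_k, d).transpose(0, 2, 1)
    # h_next weighted sums over axis 2: (n, d, h_next)
    out = z @ w.T
    # Swap axes 1 and 2: (n, h_next, d)
    return out.transpose(0, 2, 1)

def cin(x0, weights):
    """Stack CIN layers, concatenate on axis 1, sum pool on axis 2."""
    xk, outputs = x0, []
    for w in weights:
        xk = cin_layer(x0, xk, w)
        outputs.append(xk)
    return np.concatenate(outputs, axis=1).sum(axis=2)  # (n, h_out)

# Hypothetical sizes chosen only for the demonstration.
rng = np.random.default_rng(0)
n, m, d = 4, 5, 8
layer_sizes = [6, 3]  # h(1), h(2)
x0 = rng.normal(size=(n, m, d))
weights, h_prev = [], m
for h in layer_sizes:
    weights.append(rng.normal(size=(h, m * h_prev)))
    h_prev = h

out = cin(x0, weights)
print(out.shape)  # (4, 9), i.e. h(out) = 6 + 3
```

Here the "weighted sum done h(k+1) times" becomes a single matrix product against a weight matrix with h(k+1) rows, which is equivalent and avoids a Python loop.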

DIN (Deep Interest Network)

Overall network structure

Input data statistics table

Here, a multi-hot feature in a single sample is a sequence of clicked ids.

The input has four parts:

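User Profile Features:
Shape (n, f1); after embedding this becomes (n, f1, d), which is then concatenated along the embedding dimension to give V(UPF) with shape (n, f1*d).
User Behavior Features:
The user's historical sequence of clicked goods, together with the sequences of the corresponding categories or other attributes. Taking the figure above as an example, there are three historical click sequences: visited_goods_ids, visited_shop_ids, visited_cate_ids. Each click sequence is first padded to the maximum length max_len, then embedded, and the embeddings are finally concatenated along the embedding dimension. With d1, d2, d3 denoting the embedding sizes of the three id types, the result is V(U) with shape (n, max_len, d1+d2+d3). In the network diagram, Goods 1 is the first click, Goods 2 the second, … and Goods N the N-th, where N corresponds to max_len. A traditional approach would simply sum pool over the max_len dimension, but that captures no interaction with the ad. Instead, the ad's embedding vector is later used as the query to attend over the user behaviors (n, max_len, d1+d2+d3), producing a score for each of the max_len clicks, followed by weighted sum pooling.
Ad Features:
Same as User Behavior Features above, except that each ad is one-hot (a single id per field rather than a sequence). Its goods_id, shop_id and cate_id live in the same embedding space as the ids in User Behavior Features. The resulting embedding is V(A) with shape (n, 1, d1+d2+d3).
Context Features:
Same as User Profile Features: embed, then concat, giving V(CF) with shape (n, f2*d).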
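As a rough sketch of how the behavior and ad inputs described above could be turned into V(U) and V(A), the NumPy snippet below pads the click sequences, looks up embeddings, and concatenates them. The vocabulary sizes, the example ids, the helper names `pad` and `embed`, and the convention that id 0 means padding are all assumptions made for illustration; the source does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)
n, max_len = 2, 4
d1, d2, d3 = 8, 4, 4  # embedding sizes for goods, shop and cate ids

# Hypothetical (vocabulary size, embedding size) per id type.
specs = {"goods": (100, d1), "shop": (20, d2), "cate": (10, d3)}

# One table per id type, shared by behaviors and ads (same embedding space).
# Assumption: id 0 is reserved for padding and its row is kept at zero.
tables = {}
for name, (size, dim) in specs.items():
    table = rng.normal(size=(size, dim))
    table[0] = 0.0
    tables[name] = table

def pad(seqs, max_len):
    """Right-pad (or truncate) variable-length id lists to max_len with 0."""
    out = np.zeros((len(seqs), max_len), dtype=np.int64)
    for i, s in enumerate(seqs):
        s = s[:max_len]
        out[i, :len(s)] = s
    return out

def embed(ids):
    """Look up each id type and concatenate along the embedding dimension."""
    return np.concatenate([tables[k][ids[k]] for k in specs], axis=-1)

# Made-up click histories for two users.
behaviors = {
    "goods": pad([[5, 7, 9], [3]], max_len),
    "shop": pad([[2, 2, 4], [1]], max_len),
    "cate": pad([[1, 1, 3], [2]], max_len),
}
# One candidate ad per sample: a single id per field, shape (n, 1).
ad = {
    "goods": np.array([[7], [8]]),
    "shop": np.array([[2], [5]]),
    "cate": np.array([[1], [2]]),
}

V_U = embed(behaviors)             # (n, max_len, d1+d2+d3)
V_A = embed(ad)                    # (n, 1, d1+d2+d3)
mask = behaviors["goods"] != 0     # (n, max_len), True for real clicks
print(V_U.shape, V_A.shape)
```

The `mask` is not part of the description above; it is included because padded positions usually need to be excluded from the later pooling step.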

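Activation Unit:

First repeat V(A) along axis 1 to shape (n, max_len, d1+d2+d3). Then concatenate [V(U), V(A), V(U)*V(A)] along the embedding dimension to get (n, max_len, 3*(d1+d2+d3)), and pass the result through several fully connected layers to output (n, max_len, 1), that is, one attention score per position along max_len. Finally, take the weighted sum of V(U) with these scores to get V(U) with shape (n, d1+d2+d3).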
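The activation unit can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the DIN authors' code: the function name `activation_unit`, the single hidden layer with ReLU, the random weights, and the padding mask are all choices made for the example. The scores are used directly as weights, without softmax normalization.

```python
import numpy as np

def activation_unit(v_u, v_a, mask, w1, b1, w2, b2):
    """Attention-weighted sum pooling of user behaviors (illustrative sketch).

    v_u:  (n, max_len, dim) behavior embeddings
    v_a:  (n, 1, dim) candidate ad embedding, used as the query
    mask: (n, max_len) 1.0 for real clicks, 0.0 for padding (an assumption)
    returns: pooled (n, dim), score (n, max_len, 1)
    """
    max_len = v_u.shape[1]
    # Repeat the ad embedding along axis 1: (n, max_len, dim)
    v_a_rep = np.repeat(v_a, max_len, axis=1)
    # Concatenate [V(U), V(A), V(U)*V(A)]: (n, max_len, 3*dim)
    feats = np.concatenate([v_u, v_a_rep, v_u * v_a_rep], axis=-1)
    # Small fully connected network applied position-wise
    hidden = np.maximum(feats @ w1 + b1, 0.0)  # (n, max_len, hidden)
    score = hidden @ w2 + b2                   # (n, max_len, 1)
    # Padded positions contribute nothing to the pooled vector
    score = score * mask[:, :, None]
    # Weighted sum over max_len: (n, dim)
    pooled = (score * v_u).sum(axis=1)
    return pooled, score

# Hypothetical sizes and random parameters, for shape checking only.
rng = np.random.default_rng(0)
n, max_len, dim, hidden = 2, 4, 16, 8
v_u = rng.normal(size=(n, max_len, dim))
v_a = rng.normal(size=(n, 1, dim))
mask = np.array([[1, 1, 1, 0], [1, 0, 0, 0]], dtype=float)
w1, b1 = rng.normal(size=(3 * dim, hidden)), np.zeros(hidden)
w2, b2 = rng.normal(size=(hidden, 1)), np.zeros(1)

pooled, score = activation_unit(v_u, v_a, mask, w1, b1, w2, b2)
print(pooled.shape, score.shape)  # (2, 16) (2, 4, 1)
```

Because the score depends on the candidate ad, the same click history yields a different pooled vector for each ad, which is the interaction that plain sum pooling misses.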

Output:

concat[V(UPF), V(U), V(A), V(CF)], with V(A) flattened to (n, d1+d2+d3), followed by several fully connected layers, gives the final output.
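The final step can be sketched as a concatenation followed by a small MLP. Everything specific here is an assumption for illustration: the helper names `mlp` and `sigmoid`, the hidden sizes 32 and 16, the ReLU activations, the random inputs, and the sigmoid on the last layer (a common choice for click probability that the notes above do not state).

```python
import numpy as np

def mlp(x, layers):
    """Apply fully connected layers, with ReLU on all but the last."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes; random tensors stand in for the real embeddings.
rng = np.random.default_rng(0)
n, f1, f2, d, dim = 2, 3, 2, 4, 16   # dim plays the role of d1+d2+d3
v_upf = rng.normal(size=(n, f1 * d))  # V(UPF)
v_u = rng.normal(size=(n, dim))       # V(U) after attention pooling
v_a = rng.normal(size=(n, 1, dim))    # V(A)
v_cf = rng.normal(size=(n, f2 * d))   # V(CF)

# Flatten V(A) to (n, dim) so that all four parts are 2-D, then concat.
x = np.concatenate([v_upf, v_u, v_a.reshape(n, -1), v_cf], axis=1)

sizes = [x.shape[1], 32, 16, 1]
layers = [
    (rng.normal(scale=0.1, size=(a, b)), np.zeros(b))
    for a, b in zip(sizes[:-1], sizes[1:])
]
p_click = sigmoid(mlp(x, layers))     # (n, 1)
print(x.shape, p_click.shape)
```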