Summary of Models for Recommendation and Advertising

Models List

| Model | Paper |
| --- | --- |
| Convolutional Click Prediction Model | [CIKM 2015] A Convolutional Click Prediction Model |
| Factorization-supported Neural Network | [ECIR 2016] Deep Learning over Multi-field Categorical Data: A Case Study on User Response Prediction |
| Product-based Neural Network | [ICDM 2016] Product-based neural networks for user response prediction |
| Wide & Deep | [DLRS 2016] Wide & Deep Learning for Recommender Systems |
| DeepFM | [IJCAI 2017] DeepFM: A Factorization-Machine based Neural Network for CTR Prediction |
| Piece-wise Linear Model | [arxiv 2017] Learning Piece-wise Linear Models from Large Scale Data for Ad Click Prediction |
| Deep & Cross Network | [ADKDD 2017] Deep & Cross Network for Ad Click Predictions |
| Attentional Factorization Machine | [IJCAI 2017] Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks |
| Neural Factorization Machine | [SIGIR 2017] Neural Factorization Machines for Sparse Predictive Analytics |
| xDeepFM | [KDD 2018] xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems |
| AutoInt | [arxiv 2018] AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks |
| Deep Interest Network | [KDD 2018] Deep Interest Network for Click-Through Rate Prediction |
| Deep Interest Evolution Network | [AAAI 2019] Deep Interest Evolution Network for Click-Through Rate Prediction |
| ONN | [arxiv 2019] Operation-aware Neural Networks for User Response Prediction |
| FGCNN | [WWW 2019] Feature Generation by Convolutional Neural Network for Click-Through Rate Prediction |
| Deep Session Interest Network | [IJCAI 2019] Deep Session Interest Network for Click-Through Rate Prediction |
| FiBiNET | [RecSys 2019] FiBiNET: Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction |
| FLEN | [arxiv 2019] FLEN: Leveraging Field for Scalable CTR Prediction |

done

  • [X] DeepFM

DeepFM

blog
github

Overall network architecture

Deep & Cross Network

(Not very practical)

Overall network architecture

Visualization of a cross layer

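xDeepFM

Overall network architecture

Structure of the CIN component

CIN inputs and outputs:

  1. Input: x0 of shape (n, m, d), where n is the batch size, m is the number of categorical feature fields, and d is the embedding dimension.
  2. Let the k-th layer be xk of shape (n, h(k), d), with h(0) = m. The (k+1)-th layer is computed as follows:
    • Take the pairwise Hadamard products of x0 and xk along axis 1 (each of the m vectors in x0 with each of the h(k) vectors in xk), giving (n, d, m*h(k)).
    • Take a weighted sum along axis 2 to get (n, d). Doing this h(k+1) times, each time with its own weights, gives (n, d, h(k+1)); swapping axes 1 and 2 gives (n, h(k+1), d).
  3. Output: K tensors of shape (n, h(i), d), i in [1, K]. Concatenate them along axis 1, then apply sum pooling along axis 2 to get (n, h(out)), where h(out) = h(1) + h(2) + … + h(K).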
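
The shape bookkeeping above can be traced with a small NumPy sketch. This is only an illustration under stated assumptions, not the reference xDeepFM implementation: the function names `cin_layer` and `cin` are hypothetical, each layer's weights are assumed to be stored as an array of shape (h(k+1), m, h(k)), and no activation or bias is applied.

```python
import numpy as np

def cin_layer(x0, xk, weights):
    """One CIN layer (sketch).

    x0:      (n, m, d)        field embeddings
    xk:      (n, h_k, d)      output of the previous layer
    weights: (h_next, m, h_k) assumed layout, one weight matrix per output map
    returns: (n, h_next, d)
    """
    n, m, d = x0.shape
    h_k = xk.shape[1]
    # Pairwise Hadamard products of every x0 vector with every xk vector.
    z = x0[:, :, None, :] * xk[:, None, :, :]        # (n, m, h_k, d)
    z = z.reshape(n, m * h_k, d).transpose(0, 2, 1)  # (n, d, m*h_k)
    # Weighted sum along axis 2, done h_next times at once.
    w = weights.reshape(weights.shape[0], m * h_k)   # (h_next, m*h_k)
    out = z @ w.T                                    # (n, d, h_next)
    return out.transpose(0, 2, 1)                    # (n, h_next, d)

def cin(x0, weight_list):
    """Stack K layers, concatenate along axis 1, sum-pool along axis 2."""
    xk, outputs = x0, []
    for w in weight_list:
        xk = cin_layer(x0, xk, w)
        outputs.append(xk)
    stacked = np.concatenate(outputs, axis=1)        # (n, h(1)+...+h(K), d)
    return stacked.sum(axis=2)                       # (n, h(out))
```

For example, with n=2, m=3, d=4 and two layers of sizes h(1)=5 and h(2)=6, `cin` returns an array of shape (2, 11).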

DIN(Deep Interest Network)

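Overall network architecture

Input feature statistics table

Here, a multi-hot feature within a single sample is a sequence of clicked ids.

The input has four parts:

  1. User Profile Features:
    Shape (n, f1). After embedding this becomes (n, f1, d), which is then concatenated along the embedding dimension into V(UPF) of shape (n, f1*d).
  2. User Behavior Features:
    The user's historical sequence of clicked goods, together with the sequences of the corresponding categories or other attributes. In the figure above there are three click-history sequences: visited_goods_ids, visited_shop_ids, and visited_cate_ids. Each sequence is first padded to the maximum length max_len, then embedded, and the results are concatenated along the embedding dimension. If d1, d2, and d3 are the embedding sizes of the three id types, the result is V(U) of shape (n, max_len, d1+d2+d3). In the network diagram, Goods 1 is the first click, Goods 2 is the second … and Goods N is the N-th, where N corresponds to max_len. A traditional model would simply apply sum pooling over the max_len dimension, but that captures no interaction with the ad. DIN therefore uses the ad's embedding as the query and applies attention over the (n, max_len, d1+d2+d3) user behaviors to get a score for each of the max_len clicks, followed by weighted sum pooling.
  3. Ad Features:
    Same as the User Behavior Features, except that each ad is one-hot. The goods_id, shop_id, and cate_id share the same embedding space as the ids in the User Behavior Features. The resulting embedding is V(A) of shape (n, 1, d1+d2+d3).
  4. Context Features:
    Same as the User Profile Features: embed, then concat, giving V(CF) of shape (n, f2*d).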
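
As a rough illustration of how V(U) and V(A) could be built from shared embedding tables, here is a minimal NumPy sketch. Everything concrete in it is an assumption made for the example: the vocabulary sizes, the embedding sizes, the use of id 0 as the padding id, and the helper names `pad_sequences`, `embed_behaviors`, and `embed_ad`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabulary sizes and embedding sizes (d1, d2, d3).
vocab = {"goods": 100, "shop": 50, "cate": 20}
dims = {"goods": 8, "shop": 4, "cate": 4}
# One table per id type, shared by the behavior sequences and the ad.
tables = {k: rng.normal(size=(vocab[k], dims[k])) for k in vocab}

def pad_sequences(seqs, max_len, pad_id=0):
    """Right-pad (or truncate) each id list to max_len; pad_id is assumed reserved."""
    out = np.full((len(seqs), max_len), pad_id, dtype=np.int64)
    for i, s in enumerate(seqs):
        s = s[:max_len]
        out[i, :len(s)] = s
    return out

def embed_behaviors(goods, shop, cate, max_len):
    """Return V(U) of shape (n, max_len, d1+d2+d3)."""
    parts = [tables[name][pad_sequences(seqs, max_len)]
             for name, seqs in (("goods", goods), ("shop", shop), ("cate", cate))]
    return np.concatenate(parts, axis=-1)

def embed_ad(goods_id, shop_id, cate_id):
    """Return V(A) of shape (n, 1, d1+d2+d3) from one id of each type per sample."""
    parts = [tables["goods"][goods_id], tables["shop"][shop_id], tables["cate"][cate_id]]
    return np.concatenate(parts, axis=-1)[:, None, :]
```

Because both functions index the same tables, a goods id that appears in a user's history and as the candidate ad maps to the same vector, which is what makes the later attention step meaningful.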

Activation Unit:

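First repeat V(A) along axis 1 to shape (n, max_len, d1+d2+d3). Then concatenate [V(U), V(A), V(U)*V(A)] along the embedding dimension to get (n, max_len, 3*(d1+d2+d3)). Pass this through several fully connected layers to output (n, max_len, 1), which is the attention score for each of the max_len positions. Finally, take the weighted sum of V(U) with these scores to get V(U) of shape (n, d1+d2+d3).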
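
The steps above can be sketched in NumPy as follows. This is a simplified illustration, not the paper's exact unit: it assumes a single hidden layer with ReLU, and the `mask` argument (zeroing the scores of padded positions) is an addition of this sketch rather than something described in these notes. The name `activation_unit` and the weight arguments are hypothetical.

```python
import numpy as np

def activation_unit(v_u, v_a, w1, b1, w2, b2, mask=None):
    """Attention pooling of user behaviors with the ad as the query (sketch).

    v_u:  (n, max_len, D) behavior embeddings, D = d1+d2+d3
    v_a:  (n, 1, D)       ad embedding
    w1:   (3*D, hidden), b1: (hidden,)  assumed single hidden layer
    w2:   (hidden, 1),   b2: (1,)
    mask: optional (n, max_len) array, 1 for real clicks and 0 for padding
    returns: pooled (n, D), scores (n, max_len, 1)
    """
    max_len = v_u.shape[1]
    q = np.repeat(v_a, max_len, axis=1)                 # (n, max_len, D)
    feats = np.concatenate([v_u, q, v_u * q], axis=-1)  # (n, max_len, 3*D)
    hidden = np.maximum(feats @ w1 + b1, 0.0)           # ReLU, (n, max_len, hidden)
    scores = hidden @ w2 + b2                           # (n, max_len, 1)
    if mask is not None:
        scores = scores * mask[:, :, None]              # drop padded positions
    pooled = (scores * v_u).sum(axis=1)                 # weighted sum, (n, D)
    return pooled, scores
```

Note that the scores are used directly as weights here, with no softmax normalization over max_len; whether to normalize is a design choice this sketch leaves open.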

Output:

concat[V(UPF), V(U), V(A), V(CF)], with V(A) flattened to (n, d1+d2+d3), followed by several fully connected layers to produce the final output.
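
A minimal sketch of this output stage, assuming ReLU between the fully connected layers and a sigmoid on the last one so the result reads as a click probability (both are assumptions of this example; the notes do not specify the activations). The name `din_output` and the `layers` argument are hypothetical.

```python
import numpy as np

def din_output(v_upf, v_u, v_a, v_cf, layers):
    """Concatenate the four feature groups and run a small MLP (sketch).

    v_upf: (n, f1*d), v_u: (n, D), v_a: (n, 1, D), v_cf: (n, f2*d)
    layers: list of (weight, bias) pairs; the last pair must map to 1 unit
    returns: (n, 1) predicted click probability
    """
    v_a_flat = v_a.reshape(v_a.shape[0], -1)                  # (n, D)
    x = np.concatenate([v_upf, v_u, v_a_flat, v_cf], axis=-1)
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)                            # ReLU on hidden layers
    return 1.0 / (1.0 + np.exp(-x))                           # sigmoid
```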