Inductive Representation Learning on Large Graphs

论文链接：Inductive Representation Learning on Large Graphs

Abstract 摘要

Low-dimensional embeddings of nodes in large graphs have proved extremely useful in a variety of prediction tasks, from content recommendation to identifying protein functions. However, most existing approaches require that all nodes in thegraph are present during training of the embeddings; these previous approaches are inherently transductive and do not naturally generalize to unseen nodes. Here we present GraphSAGE, a general inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data. Instead of training individual embeddings for each node,we learn a function that generates embeddings by sampling and aggregating features from a node’s local neighborhood. Our algorithm outperforms strong baselines on three inductive node-classification benchmarks: we classify the category of unseen nodes in evolving information graphs based on citation and Reddit post data, and we show that our algorithm generalizes to completely unseen graphs using a multi-graph dataset of protein-protein interactions.

大型图中节点的低维嵌入在各种预测任务中都被证明是非常有用的，从内容推荐到识别蛋白质功能。然而，大多数现有的方法都要求在嵌入训练期间图中的所有节点都存在；这些以前的方法是固有的直推的，不能自然地推广到看不见的节点。在这里，我们展示了GraphSAGE，这是一种利用节点特征信息（例如，文本属性）高效地为以前看不到的数据生成节点嵌入的通用归纳式的框架。我们学习了一个函数，该函数通过从节点的局部邻域采样和聚合特征来生成嵌入，而不是为每个节点训练单独的嵌入。在三个归纳节点分类基准上，我们的算法优于强基线：我们基于citation和Reddit post data对进化信息图中的未知节点类别进行分类，使用蛋白质-蛋白质相互作用的多图数据集，表明我们的算法推广到完全未知的图。

　Ｎotes 笔记

要解决的问题：

大多数工作是直推式（transductive）的学习，在固定的图上根据图的拓扑结构、节点的属性信息等来生成节点的embedding，当遇到在训练中没有见到的节点时，很难快速，为其生成对应的embedding，也就是说不能将学习到的东西推广到不可见节点上。而且它的通用性也不高，我们当然希望在一个图上学习到的东西，可以使用在与它相似的其他图上。

解决的思路：

提出一种归纳式（inductive）的学习框架GraphSAGE，不是去学习每个节点的embedding，而是去学习如何在节点的邻域上聚合特征信息，也就是去训练一个聚合器函数，授人以鱼不如授人以渔。

前向传播算法

在循环的第k步，每个节点（以v为例），都将聚合它的邻居节点N(v)在k-1层的表示，与v自身在k-1层的表示拼接后经过带有非线性激活函数的全连接层，生成v在k层的表示。

更直观地来讲，每经过一层，聚合特征信息的邻域范围就扩大一圈。

GraphSAGE的全称是Graph SAmple and aggreGatE，图采样和聚合。这里的采样是指将N(v)只随机选取v邻居节点的固定大小的子集，这样可以保证每个训练批次的内存和耗时是可以预测的。

学习GraphSAGE的参数

无监督学习情况下，希望损失函数能做到：邻近节点的具有相似的表示，而分离节点的表示高度不同。本文应用如下损失函数：

$J_{\mathcal{G}}\left(\mathbf{z}_{u}\right)=-\log \left(\sigma\left(\mathbf{z}_{u}^{\top} \mathbf{z}_{v}\right)\right)-Q \cdot \mathbb{E}_{v_{n} \sim P_{n}(v)} \log \left(\sigma\left(-\mathbf{z}_{u}^{\top} \mathbf{z}_{v_{n}}\right)\right)$

v是通过固定长度的随机游走共同出现在u附近的顶点，σ是sigmoid函数，Pn是负采样分布，Q是负采样数。

十分重要地是，与以前的嵌入方法不同，我们提供给这个损失函数的表示是从节点的局部邻域中包含的特征生成的，而不是为每个节点训练一个独立的嵌入。

而对于监督学习，简单地根据目标任务来选取损失函数即可（如交叉熵损失函数）。

聚合函数

图上的节点的邻居天然是无序的，因此聚合函数最好是对称的，使得训练与输入顺序无关。同时还需要有较好的表达能力。本文作者给出了三种聚合函数：

Mean aggregator
$\mathbf{h}_{v}^{k} \leftarrow \sigma\left(\mathbf{W} \cdot \operatorname{MEAN}\left(\left\{\mathbf{h}_{v}^{k-1}\right\} \cup\left\{\mathbf{h}_{u}^{k-1}, \forall u \in \mathcal{N}(v)\right\}\right)\right.$
这一步就代替了上图算法中第4行和第5行的操作，直接将v的k-1层表示和其邻居的k-1层表示求均值，然后经过带有非线性激活函数的全连接层，生成v的k层表示。
LSTM aggregator

表达能力更强，但不是天然对称的，需要人为将邻居节点随机排列。
Pooling aggregator
$\text { AGGREGATE }_{k}^{\text {pool }}=\max \left(\left\{\sigma\left(\mathbf{W}_{\text {pool }} \mathbf{h}_{u_{i}}^{k}+\mathbf{b}\right), \forall u_{i} \in \mathcal{N}(v)\right\}\right)$
这一步代替上图算法中的第4行。这里以最大池化为例，也可选用其他。

Conclusion 结论

We introduced a novel approach that allows embeddings to be efficiently generated for unseen nodes. GraphSAGE consistently outperforms state-of-the-art baselines, effectively trades off performance and runtime by sampling node neighborhoods, and our theoretical analysis provides insight into how our approach can learn about local graph structures. A number of extensions and potential improvements are possible, such as extending GraphSAGE to incorporate directed or multi-modal graphs. A particularly interesting direction for future work is exploring non-uniform neighborhood sampling functions, and perhaps even learning these functions as part of the GraphSAGE optimization.

我们引入了一种新的方法，可以有效地为看不见的节点生成嵌入。GraphSAGE始终优于最先进的基线，通过采样节点邻域有效地权衡性能和运行时，我们的理论分析为我们的方法如何了解局部图结构提供了见解。有许多扩展和潜在的改进是可能的，例如扩展GraphSAGE以合并有向图或多模态图。未来工作的一个特别有趣的方向是探索非均匀邻域采样函数，甚至可能学习这些函数作为图形图像优化的一部分。

如果笔记中有错误，敬请指出；如果您有不同的理解，欢迎讨论。