Scalable Global Alignment Graph Kernel Using
Random Features: From Node Embedding to
Graph Embedding
KDD2019
Lingfei Wu, Ian En-Hsu Yen, Zhen Zhang †, Kun Xu, Liang Zhao, Xi
Peng, Yinglong Xia, Charu Aggarwal
Presenter: Hagawa, Nishi, Eugene
2019.11.11
1 / 35
Problem Setup
Goal:
▶ Create a good kernel for measuring graph similarity
▶ Low computational complexity
▶ Takes both global and local graph properties into account
▶ Positive definite
▶ Leads to a good classifier
Application:
▶ Kernel SVM (input: graph, output: binary label)
▶ Kernel PCA
▶ Kernel Ridge Regression
▶ . . .
How similar? k(G1, G2) = 0.5 (illustration: kernel value between two example graphs)
2 / 35
Difficulty : Graph isomorphism
It is difficult to define similarity between graphs
▶ 2 graphs: G1(V1, E1, ℓ1, L1), G2(V2, E2, ℓ2, L2)
▶ A bijection1 f exists if and only if G1 is isomorphic to G2
▶ Bijection f : V1 → V2 s.t. {va, vb} ∈ E1 if and only if {f(va), f(vb)} ∈ E2
▶ Subgraph isomorphism is NP-complete
1 Bijection: a one-to-one and onto mapping
3 / 35
Related Work
Two groups of recent graph kernel methods
Comparing sub-structure:
▶ The major difference is how to define and explore sub-structures
- random walks, shortest paths, cycles, subtree patterns, graphlets...
Geometric node embeddings:
▶ Capture global property
▶ Achieved state-of-the-art performance in the graph classification task
Drawbacks of related work
Comparing sub-structure:
▶ Does not take global graph properties into account
Geometric node embeddings:
▶ Do not necessarily yield a positive definite kernel
▶ Poor scalability
4 / 35
Contribution
▶ Propose a positive definite kernel
▶ Reduce computational complexity
▶ From quadratic to (quasi-)linear 2
▶ Propose an approximation of the kernel with convergence analysis
▶ Take into account global property
▶ Outperforms 12 state-of-the-art graph classification algorithms
- Include graph kernels, deep graph neural networks
2 Quasi-linear: O(n log n) in both time and space.
5 / 35
Common kernel
Compare 2 graphs directly using the kernel
Similarity: k(·, ·)
Figure: calculation of kernel value between 2 graphs
6 / 35
Proposed kernel
Compare 2 graphs indirectly, via a shared set of random graphs
Similarity with k(·, ·): each graph is compared to the random graphs
Figure: calculation of kernel value between 2 graphs via random graphs
7 / 35
Notation : Graph definition
Graph: G = (V, E, ℓ)
Node set: V = {v_i}_{i=1}^n
Edge set: E ⊆ V × V
Label assignment function: ℓ : V → Σ
Number of nodes: n
Number of edges: m
Node label: l
Number of graphs: N
Figure: example graph G with nodes V = {v1, v2, v3} and a label alphabet Σ containing two labels
8 / 35
Notation
Set of graphs: G = {G_i}_{i=1}^N
Set of graph labels: Y = {Y_i}_{i=1}^N
Set of geometric embeddings (one per graph): U = {u_i}_{i=1}^n ∈ R^{n×d}
Latent node embedding (one per node): u ∈ R^d
𝐺" ・・・
𝑁
𝑛Latent node ↑
embedding
Node size→
# of graphs→
𝑌"Graph label→ 𝑌&
𝐺&
u1 2 Rd
<latexit sha1_base64="FiX+xGGr4lrH54q+qBxUWlkIUrA=">AAACDnicbVDLSsNAFJ3UV62vqEs3g6XgqiRafOyKblxWsQ9oYphMJu3QySTMTIQS8gVu/BU3LhRx69qdf2OSBlHrgQuHc+7l3nvciFGpDONTqywsLi2vVFdra+sbm1v69k5PhrHApItDFoqBiyRhlJOuooqRQSQIClxG+u7kIvf7d0RIGvIbNY2IHaARpz7FSGWSozcsN2RegNQ4iVMnMVNoUQ6tXBBBcp3eJl4KoaPXjaZRAM4TsyR1UKLj6B+WF+I4IFxhhqQcmkak7AQJRTEjac2KJYkQnqARGWaUo4BIOyneSWEjUzzohyIrrmCh/pxIUCDlNHCzzvxO+dfLxf+8Yaz8UzuhPIoV4Xi2yI8ZVCHMs4EeFQQrNs0IwoJmt0I8RgJhlSVYK0I4y3H8/fI86R02zaNm66pVb5+XcVTBHtgHB8AEJ6ANLkEHdAEG9+ARPIMX7UF70l61t1lrRStndsEvaO9fjuucjQ==</latexit>
u2<latexit sha1_base64="Pm48/PPv93nEVMYQDi7yld7eDYw=">AAAB+XicbVDLSsNAFJ34rPUVdelmsAiuSlKLj13RjcsK9gFtCJPJpB06mQkzk0IJ/RM3LhRx65+482+cpEHUemDgcM693DMnSBhV2nE+rZXVtfWNzcpWdXtnd2/fPjjsKpFKTDpYMCH7AVKEUU46mmpG+okkKA4Y6QWT29zvTYlUVPAHPUuIF6MRpxHFSBvJt+1hIFgYIz3O0rmfNea+XXPqTgG4TNyS1ECJtm9/DEOB05hwjRlSauA6ifYyJDXFjMyrw1SRBOEJGpGBoRzFRHlZkXwOT40SwkhI87iGhfpzI0OxUrM4MJN5RvXXy8X/vEGqoysvozxJNeF4cShKGdQC5jXAkEqCNZsZgrCkJivEYyQR1qasalHCdY6L7y8vk26j7p7Xm/fNWuumrKMCjsEJOAMuuAQtcAfaoAMwmIJH8AxerMx6sl6tt8XoilXuHIFfsN6/ACNylCA=</latexit>
u3<latexit sha1_base64="w2BS8kqWqIp26xG7B4vB81cBpaY=">AAAB+XicbVDLSsNAFJ3UV62vqEs3g0VwVRItPnZFNy4r2Ae0IUwmk3boZCbMTAol9E/cuFDErX/izr9xkgZR64GBwzn3cs+cIGFUacf5tCorq2vrG9XN2tb2zu6evX/QVSKVmHSwYEL2A6QIo5x0NNWM9BNJUBww0gsmt7nfmxKpqOAPepYQL0YjTiOKkTaSb9vDQLAwRnqcpXM/O5/7dt1pOAXgMnFLUgcl2r79MQwFTmPCNWZIqYHrJNrLkNQUMzKvDVNFEoQnaEQGhnIUE+VlRfI5PDFKCCMhzeMaFurPjQzFSs3iwEzmGdVfLxf/8wapjq68jPIk1YTjxaEoZVALmNcAQyoJ1mxmCMKSmqwQj5FEWJuyakUJ1zkuvr+8TLpnDfe80bxv1ls3ZR1VcASOwSlwwSVogTvQBh2AwRQ8gmfwYmXWk/VqvS1GK1a5cwh+wXr/AiT3lCE=</latexit>
9 / 35
Geometric Embeddings
Use partial eigendecomposition 3 to extract node embeddings:
1. Create the normalized Laplacian matrix L ∈ R^{n×n}
2. Perform partial eigendecomposition to obtain U
3. Use the d eigenvectors with the smallest eigenvalues
L (n×n) → partial eigendecomposition → U Λ Uᵀ (n×d, d×d, d×n); keep the d eigenvectors with the smallest eigenvalues

Adjacency matrix      Degree matrix      Laplacian matrix (= Degree − Adjacency)
   A  B  C               A  B  C            A   B   C
A  0  1  1            A  2  0  0         A   2  −1  −1
B  1  0  0            B  0  1  0         B  −1   1   0
C  1  0  0            C  0  0  1         C  −1   0   1

Then normalize.
Figure: Example of obtaining U
3 Time complexity: linear in the number of graph edges (the presenters are unsure why).
10 / 35
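To make this step concrete, here is a minimal sketch in Python under our own assumptions (SciPy as the eigensolver, the tiny A-B-C graph from the figure); it is an illustration, not the authors' implementation:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import eigsh

# Adjacency matrix of the example graph (edges A-B and A-C)
A = csr_matrix(np.array([[0., 1., 1.],
                         [1., 0., 0.],
                         [1., 0., 0.]]))

# Normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
L = laplacian(A, normed=True)

# Partial eigendecomposition: keep the d eigenvectors with the
# smallest eigenvalues (shift-invert is the usual trick for large
# sparse graphs; which='SM' suffices for a toy example).
d = 2
eigvals, U = eigsh(L, k=d, which='SM')
print(U.shape)  # (n, d): one d-dimensional embedding per node
```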
Transportation Distance [1]
Earth Mover’s Distance (EMD): measure of dissimilarity
EMD(Gx, Gy) := min_{T ∈ R_+^{n_x × n_y}} ⟨D, T⟩   s.t.  T 1 = t(Gx),  Tᵀ 1 = t(Gy)
▶ Linear programming problem
▶ Flow matrix T
- Tij : how much of vi in Gx travels to vj in Gy
▶ G_x → U_x = {u_1^x, u_2^x, …, u_{n_x}^x}
▶ G_y → U_y = {u_1^y, u_2^y, …, u_{n_y}^y}
▶ Transport cost matrix D
- D_ij = ∥u_i^x − u_j^y∥_2
11 / 35
Transportation Distance [1]
Earth Mover’s Distance (EMD): measure of dissimilarity
EMD(Gx, Gy) := min_{T ∈ R_+^{n_x × n_y}} ⟨D, T⟩   s.t.  T 1 = t(Gx),  Tᵀ 1 = t(Gy)
▶ Node v_i has c_i outgoing edges
▶ Normalized bag-of-words (nBOW) weight: t_i = c_i / Σ_{j=1}^n c_j ∈ R
12 / 35
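Here is a minimal sketch of this EMD as the linear program above, using scipy.optimize.linprog; the helper name emd, the toy embeddings, and the out-degree counts are our own illustration:

```python
import numpy as np
from scipy.optimize import linprog

def emd(Ux, Uy, tx, ty):
    """EMD(Gx, Gy): min <D, T> s.t. T @ 1 = tx, T.T @ 1 = ty, T >= 0."""
    nx, ny = len(tx), len(ty)
    # Ground distances D_ij = ||u_i^x - u_j^y||_2
    D = np.linalg.norm(Ux[:, None, :] - Uy[None, :, :], axis=2)
    # Equality constraints on the flattened flow matrix T (row-major)
    A_eq = np.zeros((nx + ny, nx * ny))
    for i in range(nx):
        A_eq[i, i * ny:(i + 1) * ny] = 1.0   # sum_j T_ij = tx_i
    for j in range(ny):
        A_eq[nx + j, j::ny] = 1.0            # sum_i T_ij = ty_j
    res = linprog(D.ravel(), A_eq=A_eq, b_eq=np.concatenate([tx, ty]),
                  bounds=(0, None))
    return res.fun

# nBOW weights t_i = c_i / sum_j c_j from out-degree counts c
cx = np.array([2.0, 1.0, 1.0]); tx = cx / cx.sum()
cy = np.array([1.0, 1.0]);      ty = cy / cy.sum()
Ux = np.random.rand(3, 2)       # node embeddings of Gx (n_x x d)
Uy = np.random.rand(2, 2)       # node embeddings of Gy (n_y x d)
print(emd(Ux, Uy, tx, ty))
```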
Transportation Distance: Example
Figure: EMD example — the nodes A, B, C of Gx are matched to the nodes a, b, c of Gy through the flow matrix T
▶ EMD focuses on the node sizes and outgoing edges of each graph
13 / 35
Straightforward way to define a kernel: high cost
EMD-based kernel: K = −(1/2) J D_emd J, where J = I − (1/N) 1 1ᵀ
▶ Not necessarily positive definite
▶ Time complexity: O(N² n³ log n), space complexity: O(N²)
Distance matrix D_emd over graphs A, B, C:

     A          B          C
A  EMD(A,A)  EMD(A,B)  EMD(A,C)
B  EMD(B,A)  EMD(B,B)  EMD(B,C)
C  EMD(C,A)  EMD(C,B)  EMD(C,C)
Figure: Straightforward kernel based on EMD
14 / 35
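For reference, a short sketch (our own, assuming the pairwise EMD matrix D_emd has already been computed) of this double-centering construction; checking the smallest eigenvalue shows why the result is not necessarily a valid kernel:

```python
import numpy as np

def emd_based_kernel(D_emd):
    """K = -(1/2) * J @ D_emd @ J with J = I - (1/N) 1 1^T."""
    N = D_emd.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N
    K = -0.5 * J @ D_emd @ J
    # A negative eigenvalue here means K is indefinite,
    # i.e. not a positive definite kernel.
    print("smallest eigenvalue:", np.linalg.eigvalsh(K).min())
    return K
```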
Global Alignment Graph Kernel
Using EMD and Random feature (RF)
Proposed kernel: 4

k(Gx, Gy) := ∫ p(Gω) φ_{Gω}(Gx) φ_{Gω}(Gy) dGω,  where φ_{Gω}(Gx) := exp(−γ EMD(Gx, Gω))

▶ Gω: random graph with node embeddings W = {w_i}_{i=1}^D
▶ each w_i is sampled from V ⊆ R^d
▶ p(Gω) is a distribution over the space of all random graphs of variable sizes Ω := ∪_{D=1}^{Dmax} V^D
4 I intended to go into the details of the random graphs, but it got quite involved and Hagawa gave up; if you are curious, see the paper. (I really don't get probability.)
15 / 35
Global Alignment Graph Kernel Using EMD and RF
Approximation5:

k̃(Gx, Gy) = (1/R) Σ_{i=1}^R φ_{Gωi}(Gx) φ_{Gωi}(Gy) → k(Gx, Gy)  as R → ∞
𝜙"#
(𝐺&)
Random Graphs
𝐺(
𝐺&
𝜙"#
(𝐺))
𝐺)
5 The paper proves uniform convergence of the approximate kernel to the exact kernel.
16 / 35
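A minimal sketch of this Monte Carlo estimate, reusing the emd helper from the earlier sketch; graphs here are (U, t) pairs of node embeddings and nBOW weights, and gamma and the random-graph list are placeholders of our own:

```python
import numpy as np

def phi(G, G_omega, gamma):
    """Feature against one random graph: exp(-gamma * EMD(G, G_omega))."""
    (U, t), (W, tw) = G, G_omega
    return np.exp(-gamma * emd(U, W, t, tw))

def k_tilde(Gx, Gy, random_graphs, gamma):
    """(1/R) * sum_i phi_i(Gx) * phi_i(Gy); converges to k(Gx, Gy) as R grows."""
    R = len(random_graphs)
    return sum(phi(Gx, Gw, gamma) * phi(Gy, Gw, gamma)
               for Gw in random_graphs) / R
```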
Algorithm
Set data and hyperparameters
▶ Node embedding size (dimension): d
▶ Max size of random graphs: Dmax
▶ Graph embedding size: R

Figure: data graphs are embedded against R random graphs of up to Dmax nodes in d dimensions
Algorithm 1 Random Graph Embedding
Input: Data graphs {G_i}_{i=1}^N, node embedding size d, maximum size of random graphs Dmax, graph embedding size R.
Output: Feature matrix Z^{N×R} for data graphs
1: Compute nBOW weight vectors {t(G_i)}_{i=1}^N of the normalized Laplacian L of all graphs
2: Obtain node embedding vectors {u_i}_{i=1}^n by computing the d smallest eigenvectors of L
3: for j = 1, …, R do
4:   Draw D_j uniformly from [1, Dmax].
5:   Generate a random graph G_{ωj} with D_j node embeddings W from Algorithm 2.
6:   Compute a feature vector Z_j = φ_{G_{ωj}}({G_i}_{i=1}^N) using EMD or another optimal transportation distance in Equation (3).
7: end for
8: Return feature matrix Z({G_i}_{i=1}^N) = (1/√R) {Z_i}_{i=1}^R
17 / 35
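Putting the pieces together, a sketch of Algorithm 1 under the same assumptions as the earlier snippets (emd from the EMD sketch; the random-graph generator here simply samples node embeddings uniformly from the unit cube, standing in for Algorithm 2, which the slides detail later):

```python
import numpy as np

def random_graph_embedding(graphs, d, D_max, R, gamma, seed=0):
    """Return the N x R feature matrix Z of Algorithm 1.

    Each graph is a pair (U, t): node embeddings (n x d) and nBOW weights (n,).
    """
    rng = np.random.default_rng(seed)
    N = len(graphs)
    Z = np.zeros((N, R))
    for j in range(R):
        Dj = int(rng.integers(1, D_max + 1))   # step 4: D_j ~ Uniform[1, D_max]
        W = rng.random((Dj, d))                # step 5: random node embeddings
        tw = np.full(Dj, 1.0 / Dj)             # uniform weights on the random graph
        for i, (U, t) in enumerate(graphs):    # step 6: feature vector Z_j
            Z[i, j] = np.exp(-gamma * emd(U, W, t, tw))
    return Z / np.sqrt(R)                      # step 8: scale by 1/sqrt(R)
```

With this Z, the approximate kernel matrix is simply Z @ Z.T, which is positive semi-definite by construction.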
Compute {t(G_i)}_{i=1}^N and the Laplacian matrix L

Laplacian matrix
   A   B   C
A   2  −1  −1
B  −1   1   0
C  −1   0   1

t(Gx) = (1/2, 1/4, 1/4)

→ For all graphs
18 / 35
Obtain node embedding vectors
L (n×n) → partial eigendecomposition → U Λ Uᵀ; the d eigenvectors with the smallest eigenvalues give the node embeddings u_1, u_2, u_3, … ∈ R^d

→ For all graphs G_i
19 / 35
Generate random graph 6
Example: D_j = 2 ← Rand(1, Dmax)
W ∈ R^{2×d} ← Generate_random_graph(2, d)

Figure: example of a 2-node random graph with embeddings u_1, u_2 ∈ R^d
6 A later section shows two ways to generate random graphs.
20 / 35
Compute a feature vector Z_j

z_j = (Z_j1, …, Z_jN)ᵀ, where Z_ji = φ_{Gωj}(G_i) := exp(−γ EMD(G_i, G_ωj))
21 / 35
Generate random graphs R times

z_1 = (Z_11, …, Z_1N)ᵀ,  z_2 = (Z_21, …, Z_2N)ᵀ,  …,  z_R = (Z_R1, …, Z_RN)ᵀ
(each column is computed with φ_{Gωj}(G_i) against a fresh random graph)
7 R: number of random graphs
22 / 35
Output: N × R matrix Z

Z = (1/√R) [ z_1  z_2  ⋯  z_R ] ∈ R^{N×R}, where column j is z_j = (Z_j1, …, Z_jN)ᵀ
23 / 35
How to generate Random Graph
Data-independent and Data-dependent Distributions
Data-dependent 8
Random Graph Embedding (Anchor Sub-Graphs (ASG)):
1. Pick a graph Gk from the data set
2. Uniformly draw Dj nodes
3. {w_i}_{i=1}^{D_j} = {u_{n_1}, u_{n_2}, …, u_{n_{D_j}}}
Incorporating label information:
▶ d(u_i, u_j) = max(∥u_i − u_j∥_2, √d) if v_i and v_j have different node labels
▶ Pushes apart nodes with different labels
▶ √d is the largest distance in a d-dimensional unit hypercube
8 For the data-independent variant, see the appendix.
24 / 35
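A sketch (our own reading of the ASG procedure and of the label-aware ground distance above; the data layout matches the earlier snippets):

```python
import numpy as np

def sample_anchor_subgraph(embeddings_list, D_max, rng):
    """Data-dependent random graph: node embeddings drawn from a real data graph."""
    Gk = embeddings_list[rng.integers(len(embeddings_list))]  # 1. pick G_k
    Dj = int(rng.integers(1, D_max + 1))
    idx = rng.choice(len(Gk), size=min(Dj, len(Gk)), replace=False)  # 2. draw D_j nodes
    return Gk[idx]                                            # 3. their embeddings W

def label_aware_distance(ui, uj, label_i, label_j, d):
    """Ground distance that pushes apart nodes with different labels;
    sqrt(d) is the largest distance in the d-dimensional unit hypercube."""
    dist = np.linalg.norm(ui - uj)
    return max(dist, np.sqrt(d)) if label_i != label_j else dist
```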
Complexity comparison (Left: Proposed, Right: Straightforward)
𝜙"#
(𝐺&)
Random Graphs
𝐺(
𝐺&
𝜙"#
(𝐺))
𝐺)
Figure: Proposed kernel
Graph A Graph CGraph B
A B C
A EMD(A,A) EMD(A,B) EMD(A,C)
B EMD(B,A) EMD(B,B) EMD(B,C)
C EMD(C,A) EMD(C,B) EMD(C,C)
Distance
Matrix
Figure: Straitforward kernel
Time complexity 9 (dmz is the partial eigendecomposition cost):
▶ Proposed: O(N R D² n log n + dmz)   ▶ Straightforward: O(N² n³ log n + dmz)
※ R is the number of random graphs, D is the number of random-graph nodes (D < n)
Space complexity:
▶ Proposed: O(NR)   ▶ Straightforward: O(N²)
9 dmz is the partial eigendecomposition cost.
25 / 35
Experiments
Experimental setup
Classifier:
▶ Linear SVM (LIBLINEAR)
Data:
▶ 9 Datasets
Hyperparameters:
▶ γ (kernel) → [1e-3, 1e-2, 1e-1, 1, 10]
▶ Dmax (size of random graphs) → [3:3:30]
▶ SVM parameters (tuned on the training set)
Evaluation:
▶ 10-fold cross-validation
▶ Accuracy averaged over 10 repetitions
26 / 35
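A sketch of this protocol on top of the RGE features, using scikit-learn's LinearSVC (which wraps LIBLINEAR); the placeholder features and labels are our own:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

# Z: the N x R feature matrix from Algorithm 1; y: graph labels
Z = np.random.rand(100, 32)             # placeholder features
y = np.random.randint(0, 2, size=100)   # placeholder binary labels

# 10-fold cross-validation; in the paper this is repeated 10 times
# and the average accuracy is reported.
scores = cross_val_score(LinearSVC(C=1.0, max_iter=10000), Z, y, cv=10)
print(scores.mean(), scores.std())
```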
# of random graphs (R) vs. testing accuracy:
Figure 2: Test accuracies and runtime of three variants of RGE (RGE(RF), RGE(ASG), RGE(ASG)-NodeLab) with and without node labels when varying R. Accuracy panels: (a) ENZYMES, (b) NCI109, (c) IMDB-BINARY, (d) COLLAB; runtime panels: (e) ENZYMES, (f) NCI109, (g) IMDB-BINARY, (h) COLLAB.
▶ Converges very rapidly as R increases
# of random graphs (R) vs. runtime:
▶ Shows quasi-linear scalability with respect to R
27 / 35
Figure 3: Runtime of RGE (eigendecomposition time, feature-generation time, total runtime) with linear and quadratic reference lines. (a) Varying number of graphs N. (b) Varying size of graph n.
▶ Shows linear scalability with respect to N (panel a)
▶ Shows quasi-linear scalability with respect to n (panel b)
28 / 35
Classification accuracy:
Table 1: Comparison of classification accuracy against graph kernel methods without node labels.
Datasets MUTAG PTC-MR ENZYMES NCI1 NCI109
RGE(RF) 86.33 ± 1.39(1s) 59.82 ± 1.42(1s) 35.98 ± 0.89(38s) 74.70 ± 0.56(727s) 72.50 ± 0.32(865s)
RGE(ASG) 85.56 ± 0.91(2s) 59.97 ± 1.65 (1s) 38.52 ± 0.91(18s) 74.30 ± 0.45(579s) 72.70 ± 0.42(572s)
EMD 84.66 ± 2.69 (7s) 57.65 ± 0.59 (46s) 35.45 ± 0.93 (216s) 72.65 ± 0.34 (8359s) 70.84 ± 0.18 (8281s)
PM 83.83 ± 2.86 59.41 ± 0.68 28.17 ± 0.37 69.73 ± 0.11 68.37 ± 0.14
Lo- 82.58 ± 0.79 55.21 ± 0.72 26.5 ± 0.54 62.28 ± 0.34 62.52 ± 0.29
OA-E (A) 79.89 ± 0.98 56.77 ± 0.85 36.12 ± 0.81 67.99 ± 0.28 67.14 ± 0.26
RW 77.78 ± 0.98 56.18 ± 1.12 20.17 ± 0.83 56.89 ± 0.34 56.13 ± 0.31
GL 66.11 ± 1.31 57.05 ± 0.83 18.16 ± 0.47 47.37 ± 0.15 48.39 ± 0.18
SP 82.22 ± 1.14 56.18 ± 0.56 28.17 ± 0.64 62.02 ± 0.17 61.41 ± 0.32
Table 2: Comparison of classification accuracy against graph kernel methods with node labels or the WL technique.
Datasets PTC-MR ENZYMES PROTEINS NCI1 NCI109
RGE(ASG) 61.5 ± 2.34(1s) 48.27 ± 0.99(28s) 75.98 ± 0.71(20s) 76.46 ± 0.45(379s) 74.42 ± 0.30(526s)
EMD 57.67 ± 2.11 (42s) 42.85 ± 0.72 (296s) 76.03 ± 0.28 (1936s) 75.89 ± 0.16 (7942s) 73.63 ± 0.33 (8073s)
PM 60.38 ± 0.86 40.33 ± 0.34 74.39 ± 0.45 72.91 ± 0.53 71.97 ± 0.15
OA-E (A) 58.76 ± 0.92 43.56 ± 0.66 — 69.83 ± 0.30 68.96 ± 0.35
V-OA 56.4 ± 1.8 35.1 ± 1.1 73.8 ± 0.5 65.6 ± 0.4 65.1 ± 0.4
RW 57.06 ± 0.86 19.33 ± 0.62 71.67 ± 0.78 63.34 ± 0.27 63.51 ± 0.18
GL 59.41 ± 0.94 32.70 ± 1.20 71.63 ± 0.33 66.00 ± 0.07 66.59 ± 0.08
SP 60.00 ± 0.72 41.68 ± 1.79 73.32 ± 0.45 73.47 ± 0.11 73.07 ± 0.11
WL-RGE(ASG) 62.20 ± 1.67(1s) 57.97 ± 1.16(38s) 76.63 ± 0.82(30s) 85.85 ± 0.42(401s) 85.32 ± 0.29(798s)
WL-ST 57.64 ± 0.68 52.22 ± 0.71 72.92 ± 0.67 82.19 ± 0.18 82.46 ± 0.24
▶ RGE is much faster than EMD
29 / 35
Table 2: Comparison of classification accuracy against graph kernel methods with node labels or the WL technique.
Datasets PTC-MR ENZYMES PROTEINS NCI1 NCI109
RGE(ASG) 61.5 ± 2.34(1s) 48.27 ± 0.99(28s) 75.98 ± 0.71(20s) 76.46 ± 0.45(379s) 74.42 ± 0.30(526s)
EMD 57.67 ± 2.11 (42s) 42.85 ± 0.72 (296s) 76.03 ± 0.28 (1936s) 75.89 ± 0.16 (7942s) 73.63 ± 0.33 (8073s)
PM 60.38 ± 0.86 40.33 ± 0.34 74.39 ± 0.45 72.91 ± 0.53 71.97 ± 0.15
OA-E (A) 58.76 ± 0.92 43.56 ± 0.66 — 69.83 ± 0.30 68.96 ± 0.35
V-OA 56.4 ± 1.8 35.1 ± 1.1 73.8 ± 0.5 65.6 ± 0.4 65.1 ± 0.4
RW 57.06 ± 0.86 19.33 ± 0.62 71.67 ± 0.78 63.34 ± 0.27 63.51 ± 0.18
GL 59.41 ± 0.94 32.70 ± 1.20 71.63 ± 0.33 66.00 ± 0.07 66.59 ± 0.08
SP 60.00 ± 0.72 41.68 ± 1.79 73.32 ± 0.45 73.47 ± 0.11 73.07 ± 0.11
WL-RGE(ASG) 62.20 ± 1.67(1s) 57.97 ± 1.16(38s) 76.63 ± 0.82(30s) 85.85 ± 0.42(401s) 85.32 ± 0.29(798s)
WL-ST 57.64 ± 0.68 52.22 ± 0.71 72.92 ± 0.67 82.19 ± 0.18 82.46 ± 0.24
WL-SP 56.76 ± 0.78 59.05 ± 1.05 74.49 ± 0.74 84.55 ± 0.36 83.53 ± 0.30
WL-OA-E (A) 59.72 ± 1.10 53.76 ± 0.82 — 84.75 ± 0.21 84.23 ± 0.19
Table 3: Comparison of classification accuracy against recent deep learning models on graphs.
Datasets PTC-MR PROTEINS NCI1 IMDB-B IMDB-M COLLAB
(WL-)RGE(ASG) 62.20 ± 1.67 76.63 ± 0.82 85.85 ± 0.42 71.48 ± 1.01 47.26 ± 0.89 76.85 ± 0.34
DGCNN 58.59 ± 2.47 75.54 ± 0.94 74.44 ± 0.47 70.03 ± 0.86 47.83 ± 0.85 73.76 ± 0.49
PSCN 62.30 ± 5.70 75.00 ± 2.51 76.34 ± 1.68 71.00 ± 2.29 45.23 ± 2.84 72.60 ± 2.15
DCNN 56.6 ± 1.20 61.29 ± 1.60 56.61 ± 1.04 49.06 ± 1.37 33.49 ± 1.42 52.11 ± 0.53
DGK 57.32 ± 1.13 71.68 ± 0.50 62.48 ±0.25 66.96 ± 0.56 44.55 ± 0.52 73.09 ± 0.25
(Fragment from the paper: when generating random adjacency matrices, the number of edges is set to twice the number of nodes; Fig. 3(a) shows the linear scalability of RGE in the number of graphs N, confirming the complexity analysis. Tables 1, 2, and 3 show that RGE consistently outperforms or matches other state-of-the-art graph kernels and deep learning approaches in classification accuracy.)
▶ Outperforms other graph kernels and deep learning approaches
▶ RGE is much faster than EMD
▶ The WL technique yields good performance
30 / 35
Conclusion
Proposed a good graph kernel!
▶ Scalable
▶ Takes global graph properties into account
Thank you.
31 / 35
Appendix I
▶ If two graphs are isomorphic, the eigenvalues of their adjacency matrices coincide, but the converse does not hold
Normalized Laplacian matrix:

L_{i,j} := 1                          if i = j and deg(v_i) ≠ 0
L_{i,j} := −1 / √(deg(v_i) deg(v_j))  if i ≠ j and v_i is adjacent to v_j
L_{i,j} := 0                          otherwise

deg(v): degree of node (vertex) v
32 / 35
Appendix II
33 / 35
Appendix III Table 4: Properties of the datasets.
Dataset MUTAG PTC ENZYMES PROTEINS NCI1 NCI109 IMDB-B IMDB-M COLLAB
Max # Nodes 28 109 126 620 111 111 136 89 492
Min # Nodes 10 2 2 4 3 4 12 7 32
Ave # Nodes 17.9 25.6 32.6 39.05 29.9 29.7 19.77 13.0 74.49
Max # Edges 33 108 149 1049 119 119 1249 1467 40119
Min # Edges 10 1 1 5 2 3 26 12 60
Ave # Edges 19.8 26.0 62.1 72.81 32.3 32.1 96.53 65.93 2457.34
# Graph 188 344 600 1113 4110 4127 1000 1500 5000
# Graph Labels 2 2 6 2 2 2 2 3 3
# Node Labels 7 19 3 3 37 38 — — —
(Fragments from the paper's experimental details: random adjacency matrices have twice as many edges as nodes; node embedding size d = 6 (4, 6, or 8 depending on the dataset, the same for all RGE variants on a dataset); DMax = 10 and R = 128 for the scalability runs; a linear SVM implemented in LIBLINEAR is used to faithfully measure the effectiveness of the feature representation; experiments are repeated ten times (100 runs per dataset) and average prediction accuracies with standard deviations are reported; the ranges of γ and D_max are [1e-3, 1e-2, 1e-1, 1, 10] and [3:3:30]; all SVM and method hyperparameters are optimized only on the training set; baseline numbers are taken from the papers except EMD, which was rerun for accuracy and runtime comparisons since RGE, EMD, and PM are built on the same node embeddings.)
Terms
WL test:
▶ Technique to improve kernel with node labels
RGE(ASG)-NodeLab:
▶ Data-dependent random graph + Incorporating Label information
WL-RGE:
▶ Data-dependent random graph + WL test
34 / 35
References I
[1] Giannis Nikolentzos, Polykarpos Meladianos, and Michalis Vazirgiannis. Matching node embeddings for graph similarity. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.
35 / 35
Ad

Recommended

parameterized complexity for graph Motif
parameterized complexity for graph Motif
AMR koura
 
Information-theoretic clustering with applications
Information-theoretic clustering with applications
Frank Nielsen
 
Visualizing Data Using t-SNE
Visualizing Data Using t-SNE
David Khosid
 
High Dimensional Data Visualization using t-SNE
High Dimensional Data Visualization using t-SNE
Kai-Wen Zhao
 
Triangle counting handout
Triangle counting handout
csedays
 
New Classes of Odd Graceful Graphs
New Classes of Odd Graceful Graphs
graphhoc
 
SIAM - Minisymposium on Guaranteed numerical algorithms
SIAM - Minisymposium on Guaranteed numerical algorithms
Jagadeeswaran Rathinavel
 
Lecture 11 (Digital Image Processing)
Lecture 11 (Digital Image Processing)
VARUN KUMAR
 
Lecture 3 image sampling and quantization
Lecture 3 image sampling and quantization
VARUN KUMAR
 
Optimal interval clustering: Application to Bregman clustering and statistica...
Optimal interval clustering: Application to Bregman clustering and statistica...
Frank Nielsen
 
Litvinenko low-rank kriging +FFT poster
Litvinenko low-rank kriging +FFT poster
Alexander Litvinenko
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
The Statistical and Applied Mathematical Sciences Institute
 
201707 SER332 Lecture 23
201707 SER332 Lecture 23
Javier Gonzalez-Sanchez
 
Graph Edit Distance: Basics & Trends
Graph Edit Distance: Basics & Trends
Luc Brun
 
QMC Error SAMSI Tutorial Aug 2017
QMC Error SAMSI Tutorial Aug 2017
Fred J. Hickernell
 
Tucker tensor analysis of Matern functions in spatial statistics
Tucker tensor analysis of Matern functions in spatial statistics
Alexander Litvinenko
 
CLIM Fall 2017 Course: Statistics for Climate Research, Spatial Data: Models ...
CLIM Fall 2017 Course: Statistics for Climate Research, Spatial Data: Models ...
The Statistical and Applied Mathematical Sciences Institute
 
Tailored Bregman Ball Trees for Effective Nearest Neighbors
Tailored Bregman Ball Trees for Effective Nearest Neighbors
Frank Nielsen
 
Presentation 2(power point presentation) dis2016
Presentation 2(power point presentation) dis2016
Daniel Omunting
 
ikh323-05
ikh323-05
Anung Ariwibowo
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
The Statistical and Applied Mathematical Sciences Institute
 
Novel Performance Analysis of Network Coded Communications in Single-Relay Ne...
Novel Performance Analysis of Network Coded Communications in Single-Relay Ne...
Communication Systems & Networks
 
Efficient Technique for Image Stenography Based on coordinates of pixels
Efficient Technique for Image Stenography Based on coordinates of pixels
IOSR Journals
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
The Statistical and Applied Mathematical Sciences Institute
 
Efficient end-to-end learning for quantizable representations
Efficient end-to-end learning for quantizable representations
NAVER Engineering
 
Pixelrelationships
Pixelrelationships
Harshavardhan Reddy
 
Clustering lect
Clustering lect
Shadi Nabil Albarqouni
 
Need for Controllers having Integer Coefficients in Homomorphically Encrypted D...
Need for Controllers having Integer Coefficients in Homomorphically Encrypted D...
CDSL_at_SNU
 
A survey on graph kernels
A survey on graph kernels
vincyy
 
Convolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernels
tuxette
 

More Related Content

What's hot (20)

Lecture 3 image sampling and quantization
Lecture 3 image sampling and quantization
VARUN KUMAR
 
Optimal interval clustering: Application to Bregman clustering and statistica...
Optimal interval clustering: Application to Bregman clustering and statistica...
Frank Nielsen
 
Litvinenko low-rank kriging +FFT poster
Litvinenko low-rank kriging +FFT poster
Alexander Litvinenko
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
The Statistical and Applied Mathematical Sciences Institute
 
201707 SER332 Lecture 23
201707 SER332 Lecture 23
Javier Gonzalez-Sanchez
 
Graph Edit Distance: Basics & Trends
Graph Edit Distance: Basics & Trends
Luc Brun
 
QMC Error SAMSI Tutorial Aug 2017
QMC Error SAMSI Tutorial Aug 2017
Fred J. Hickernell
 
Tucker tensor analysis of Matern functions in spatial statistics
Tucker tensor analysis of Matern functions in spatial statistics
Alexander Litvinenko
 
CLIM Fall 2017 Course: Statistics for Climate Research, Spatial Data: Models ...
CLIM Fall 2017 Course: Statistics for Climate Research, Spatial Data: Models ...
The Statistical and Applied Mathematical Sciences Institute
 
Tailored Bregman Ball Trees for Effective Nearest Neighbors
Tailored Bregman Ball Trees for Effective Nearest Neighbors
Frank Nielsen
 
Presentation 2(power point presentation) dis2016
Presentation 2(power point presentation) dis2016
Daniel Omunting
 
ikh323-05
ikh323-05
Anung Ariwibowo
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
The Statistical and Applied Mathematical Sciences Institute
 
Novel Performance Analysis of Network Coded Communications in Single-Relay Ne...
Novel Performance Analysis of Network Coded Communications in Single-Relay Ne...
Communication Systems & Networks
 
Efficient Technique for Image Stenography Based on coordinates of pixels
Efficient Technique for Image Stenography Based on coordinates of pixels
IOSR Journals
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
The Statistical and Applied Mathematical Sciences Institute
 
Efficient end-to-end learning for quantizable representations
Efficient end-to-end learning for quantizable representations
NAVER Engineering
 
Pixelrelationships
Pixelrelationships
Harshavardhan Reddy
 
Clustering lect
Clustering lect
Shadi Nabil Albarqouni
 
Need for Controllers having Integer Coefficients in Homomorphically Encrypted D...
Need for Controllers having Integer Coefficients in Homomorphically Encrypted D...
CDSL_at_SNU
 
Lecture 3 image sampling and quantization
Lecture 3 image sampling and quantization
VARUN KUMAR
 
Optimal interval clustering: Application to Bregman clustering and statistica...
Optimal interval clustering: Application to Bregman clustering and statistica...
Frank Nielsen
 
Litvinenko low-rank kriging +FFT poster
Litvinenko low-rank kriging +FFT poster
Alexander Litvinenko
 
Graph Edit Distance: Basics & Trends
Graph Edit Distance: Basics & Trends
Luc Brun
 
QMC Error SAMSI Tutorial Aug 2017
QMC Error SAMSI Tutorial Aug 2017
Fred J. Hickernell
 
Tucker tensor analysis of Matern functions in spatial statistics
Tucker tensor analysis of Matern functions in spatial statistics
Alexander Litvinenko
 
Tailored Bregman Ball Trees for Effective Nearest Neighbors
Tailored Bregman Ball Trees for Effective Nearest Neighbors
Frank Nielsen
 
Presentation 2(power point presentation) dis2016
Presentation 2(power point presentation) dis2016
Daniel Omunting
 
Novel Performance Analysis of Network Coded Communications in Single-Relay Ne...
Novel Performance Analysis of Network Coded Communications in Single-Relay Ne...
Communication Systems & Networks
 
Efficient Technique for Image Stenography Based on coordinates of pixels
Efficient Technique for Image Stenography Based on coordinates of pixels
IOSR Journals
 
Efficient end-to-end learning for quantizable representations
Efficient end-to-end learning for quantizable representations
NAVER Engineering
 
Need for Controllers having Integer Coefficients in Homomorphically Encrypted D...
Need for Controllers having Integer Coefficients in Homomorphically Encrypted D...
CDSL_at_SNU
 

Similar to Scalable Global Alignment Graph Kernel Using Random Features: From Node Embedding to Graph Embedding (20)

A survey on graph kernels
A survey on graph kernels
vincyy
 
Convolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernels
tuxette
 
Representation Learning in Large Attributed Graphs
Representation Learning in Large Attributed Graphs
Nesreen K. Ahmed
 
Graph Machine Learning - Past, Present, and Future -
Graph Machine Learning - Past, Present, and Future -
kashipong
 
Graph Neural Networks for Recommendations
Graph Neural Networks for Recommendations
WQ Fan
 
High-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and Modeling
Nesreen K. Ahmed
 
Workshop Tel Aviv - Graph Data Science
Workshop Tel Aviv - Graph Data Science
Neo4j
 
240325_JW_labseminar[node2vec: Scalable Feature Learning for Networks].pptx
240325_JW_labseminar[node2vec: Scalable Feature Learning for Networks].pptx
thanhdowork
 
An experimental evaluation of similarity-based and embedding-based link predi...
An experimental evaluation of similarity-based and embedding-based link predi...
IJDKP
 
An Experimental Evaluation of Similarity-Based and Embedding-Based Link Predi...
An Experimental Evaluation of Similarity-Based and Embedding-Based Link Predi...
IJDKP
 
kdd_talk.pdf
kdd_talk.pdf
ssuser6d9950
 
kdd_talk.pdf
kdd_talk.pdf
ssuser6d9950
 
19EC4073_PR_CO3 PPdcdfvsfgfvgfdgbtvfT.pptx
19EC4073_PR_CO3 PPdcdfvsfgfvgfdgbtvfT.pptx
lingaswamy16
 
Colloquium.pptx
Colloquium.pptx
Mythili680896
 
A Local Branching Heuristic For Solving A Graph Edit Distance Problem
A Local Branching Heuristic For Solving A Graph Edit Distance Problem
Robin Beregovska
 
Chapter2 NEAREST NEIGHBOURHOOD ALGORITHMS.pdf
Chapter2 NEAREST NEIGHBOURHOOD ALGORITHMS.pdf
PRABHUCECC
 
Grl book
Grl book
HibaRamadan4
 
3a-knn.pptxhggmtdu0lphm0kultkkkkkkkkkkkk
3a-knn.pptxhggmtdu0lphm0kultkkkkkkkkkkkk
Pluto62
 
Health-e-Child CaseReasoner
Health-e-Child CaseReasoner
GaborRendes
 
Presentation on Graph Clustering (vldb 09)
Presentation on Graph Clustering (vldb 09)
Waqas Nawaz
 
A survey on graph kernels
A survey on graph kernels
vincyy
 
Convolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernels
tuxette
 
Representation Learning in Large Attributed Graphs
Representation Learning in Large Attributed Graphs
Nesreen K. Ahmed
 
Graph Machine Learning - Past, Present, and Future -
Graph Machine Learning - Past, Present, and Future -
kashipong
 
Graph Neural Networks for Recommendations
Graph Neural Networks for Recommendations
WQ Fan
 
High-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and Modeling
Nesreen K. Ahmed
 
Workshop Tel Aviv - Graph Data Science
Workshop Tel Aviv - Graph Data Science
Neo4j
 
240325_JW_labseminar[node2vec: Scalable Feature Learning for Networks].pptx
240325_JW_labseminar[node2vec: Scalable Feature Learning for Networks].pptx
thanhdowork
 
An experimental evaluation of similarity-based and embedding-based link predi...
An experimental evaluation of similarity-based and embedding-based link predi...
IJDKP
 
An Experimental Evaluation of Similarity-Based and Embedding-Based Link Predi...
An Experimental Evaluation of Similarity-Based and Embedding-Based Link Predi...
IJDKP
 
19EC4073_PR_CO3 PPdcdfvsfgfvgfdgbtvfT.pptx
19EC4073_PR_CO3 PPdcdfvsfgfvgfdgbtvfT.pptx
lingaswamy16
 
A Local Branching Heuristic For Solving A Graph Edit Distance Problem
A Local Branching Heuristic For Solving A Graph Edit Distance Problem
Robin Beregovska
 
Chapter2 NEAREST NEIGHBOURHOOD ALGORITHMS.pdf
Chapter2 NEAREST NEIGHBOURHOOD ALGORITHMS.pdf
PRABHUCECC
 
3a-knn.pptxhggmtdu0lphm0kultkkkkkkkkkkkk
3a-knn.pptxhggmtdu0lphm0kultkkkkkkkkkkkk
Pluto62
 
Health-e-Child CaseReasoner
Health-e-Child CaseReasoner
GaborRendes
 
Presentation on Graph Clustering (vldb 09)
Presentation on Graph Clustering (vldb 09)
Waqas Nawaz
 
Ad

Recently uploaded (20)

PyCon SG 25 - Firecracker Made Easy with Python.pdf
PyCon SG 25 - Firecracker Made Easy with Python.pdf
Muhammad Yuga Nugraha
 
2025_06_18 - OpenMetadata Community Meeting.pdf
2025_06_18 - OpenMetadata Community Meeting.pdf
OpenMetadata
 
AI Agents and FME: A How-to Guide on Generating Synthetic Metadata
AI Agents and FME: A How-to Guide on Generating Synthetic Metadata
Safe Software
 
Security Tips for Enterprise Azure Solutions
Security Tips for Enterprise Azure Solutions
Michele Leroux Bustamante
 
Cluster-Based Multi-Objective Metamorphic Test Case Pair Selection for Deep N...
Cluster-Based Multi-Objective Metamorphic Test Case Pair Selection for Deep N...
janeliewang985
 
AI VIDEO MAGAZINE - June 2025 - r/aivideo
AI VIDEO MAGAZINE - June 2025 - r/aivideo
1pcity Studios, Inc
 
OWASP Barcelona 2025 Threat Model Library
OWASP Barcelona 2025 Threat Model Library
PetraVukmirovic
 
EIS-Webinar-Engineering-Retail-Infrastructure-06-16-2025.pdf
EIS-Webinar-Engineering-Retail-Infrastructure-06-16-2025.pdf
Earley Information Science
 
The Future of Technology: 2025-2125 by Saikat Basu.pdf
The Future of Technology: 2025-2125 by Saikat Basu.pdf
Saikat Basu
 
Tech-ASan: Two-stage check for Address Sanitizer - Yixuan Cao.pdf
Tech-ASan: Two-stage check for Address Sanitizer - Yixuan Cao.pdf
caoyixuan2019
 
Using the SQLExecutor for Data Quality Management: aka One man's love for the...
Using the SQLExecutor for Data Quality Management: aka One man's love for the...
Safe Software
 
Securing AI - There Is No Try, Only Do!.pdf
Securing AI - There Is No Try, Only Do!.pdf
Priyanka Aash
 
Raman Bhaumik - Passionate Tech Enthusiast
Raman Bhaumik - Passionate Tech Enthusiast
Raman Bhaumik
 
"Database isolation: how we deal with hundreds of direct connections to the d...
"Database isolation: how we deal with hundreds of direct connections to the d...
Fwdays
 
UserCon Belgium: Honey, VMware increased my bill
UserCon Belgium: Honey, VMware increased my bill
stijn40
 
"Scaling in space and time with Temporal", Andriy Lupa.pdf
"Scaling in space and time with Temporal", Andriy Lupa.pdf
Fwdays
 
WebdriverIO & JavaScript: The Perfect Duo for Web Automation
WebdriverIO & JavaScript: The Perfect Duo for Web Automation
digitaljignect
 
Enhance GitHub Copilot using MCP - Enterprise version.pdf
Enhance GitHub Copilot using MCP - Enterprise version.pdf
Nilesh Gule
 
Connecting Data and Intelligence: The Role of FME in Machine Learning
Connecting Data and Intelligence: The Role of FME in Machine Learning
Safe Software
 
Coordinated Disclosure for ML - What's Different and What's the Same.pdf
Coordinated Disclosure for ML - What's Different and What's the Same.pdf
Priyanka Aash
 
PyCon SG 25 - Firecracker Made Easy with Python.pdf
PyCon SG 25 - Firecracker Made Easy with Python.pdf
Muhammad Yuga Nugraha
 
2025_06_18 - OpenMetadata Community Meeting.pdf
2025_06_18 - OpenMetadata Community Meeting.pdf
OpenMetadata
 
AI Agents and FME: A How-to Guide on Generating Synthetic Metadata
AI Agents and FME: A How-to Guide on Generating Synthetic Metadata
Safe Software
 
Security Tips for Enterprise Azure Solutions
Security Tips for Enterprise Azure Solutions
Michele Leroux Bustamante
 
Cluster-Based Multi-Objective Metamorphic Test Case Pair Selection for Deep N...
Cluster-Based Multi-Objective Metamorphic Test Case Pair Selection for Deep N...
janeliewang985
 
AI VIDEO MAGAZINE - June 2025 - r/aivideo
AI VIDEO MAGAZINE - June 2025 - r/aivideo
1pcity Studios, Inc
 
OWASP Barcelona 2025 Threat Model Library
OWASP Barcelona 2025 Threat Model Library
PetraVukmirovic
 
EIS-Webinar-Engineering-Retail-Infrastructure-06-16-2025.pdf
EIS-Webinar-Engineering-Retail-Infrastructure-06-16-2025.pdf
Earley Information Science
 
The Future of Technology: 2025-2125 by Saikat Basu.pdf
The Future of Technology: 2025-2125 by Saikat Basu.pdf
Saikat Basu
 
Tech-ASan: Two-stage check for Address Sanitizer - Yixuan Cao.pdf
Tech-ASan: Two-stage check for Address Sanitizer - Yixuan Cao.pdf
caoyixuan2019
 
Using the SQLExecutor for Data Quality Management: aka One man's love for the...
Using the SQLExecutor for Data Quality Management: aka One man's love for the...
Safe Software
 
Securing AI - There Is No Try, Only Do!.pdf
Securing AI - There Is No Try, Only Do!.pdf
Priyanka Aash
 
Raman Bhaumik - Passionate Tech Enthusiast
Raman Bhaumik - Passionate Tech Enthusiast
Raman Bhaumik
 
"Database isolation: how we deal with hundreds of direct connections to the d...
"Database isolation: how we deal with hundreds of direct connections to the d...
Fwdays
 
UserCon Belgium: Honey, VMware increased my bill
UserCon Belgium: Honey, VMware increased my bill
stijn40
 
"Scaling in space and time with Temporal", Andriy Lupa.pdf
"Scaling in space and time with Temporal", Andriy Lupa.pdf
Fwdays
 
WebdriverIO & JavaScript: The Perfect Duo for Web Automation
WebdriverIO & JavaScript: The Perfect Duo for Web Automation
digitaljignect
 
Enhance GitHub Copilot using MCP - Enterprise version.pdf
Enhance GitHub Copilot using MCP - Enterprise version.pdf
Nilesh Gule
 
Connecting Data and Intelligence: The Role of FME in Machine Learning
Connecting Data and Intelligence: The Role of FME in Machine Learning
Safe Software
 
Coordinated Disclosure for ML - What's Different and What's the Same.pdf
Coordinated Disclosure for ML - What's Different and What's the Same.pdf
Priyanka Aash
 
Ad

Scalable Global Alignment Graph Kernel Using Random Features: From Node Embedding to Graph Embedding

  • 1. Scalable Global Alignment Graph Kernel Using Random Features: From Node Embedding to Graph Embedding KDD2019 Lingfei Wu, Ian En-Hsu Yen, Zhen Zhang †, Kun Xu, Liang Zhao, Xi Peng, Yinglong Xia, Charu Aggarwal Presenter: Hagawa, Nishi, Eugene 2019.11.11 1 / 35
  • 2. Problem Setup Goal: ▶ Create a good kernel to measure Graph similarity ▶ Less computational complexity ▶ Take into account global and local graph property ▶ Have positive definite ▶ Leads to good classifier Application: ▶ Kernel SVM (input: graph, output: binary) ▶ Kernel PCA ▶ Kernel Ridge Regression ▶ . . . How similar? 𝑘( ) = 0.5, 2 / 35
  • 3. Difficulty : Graph isomorphism difficulty to define similarity between graphs ▶ 2 graphs : G1(V1, E1, ℓ1, L1), G2(V2, E2, ℓ2, L2) ▶ Bijection1 f exists, if and only if, G1 is isomorphism with G2 ▶ Bijection f : V1 → V2 s.t {va, vb} ∈ E1, va and vb are adjacent. ▶ Partial isomorphism is NP-complete 1 全単射 3 / 35
  • 4. Related Work 2 groups of recent graph kernel method Comparing sub-structure: ▶ The major difference is how to define and explore sub-structures - random walks, shortest paths, cycles, subtree patterns, graphlets... Geometric node embeddings: ▶ Capture global property ▶ Achieved state-of-the-art performance in the graph classification task Bad points of related works Comparing sub-structure: ▶ Do not take into account the global property Geometric node embeddings: ▶ Do not necessarily use positive definite for Kernel Poor scalability: 4 / 35
  • 5. Contribution ▶ Propose a Positive definite Kernel ▶ Reduce computational complexity ▶ From quadratic to (quasi-)linear 2 ▶ Propose an approximation of the kernel with convergence analysis ▶ Take into account global property ▶ Outperforms 12 state-of-the-art graph classification algorithms - Include graph kernels, deep graph neural networks 2 quasi-linear : n log n. Time and Space. 5 / 35
  • 6. Common kernel Compare directly 2 graphs using kernel Similarity 𝒌(・, ・) Figure: calculation of kernel value between 2 graphs 6 / 35
  • 7. Proposed kernel Compare directly 2 graphs using kernel Similarity 𝒌(・, ・) Random Graphs Similarity with 𝒌(・, ・) Figure: calculation of kernel value between 2 graphs 7 / 35
  • 8. Notation : Graph definition Graph: G = (V , E, ℓ) Node: V = {vi }n i=1 Edge: E = (V × V ) Assign label function: ℓ : V → Σ Size of node: n # of edge: m Node label: l # of graphs: N G<latexit sha1_base64="QLLEFqFGXJzmcwbhRTcNSo8/+r8=">AAAB6HicbVDLSsNAFJ3UV62vqks3g0VwVRItPnZFF7pswT6gDWUyvWnHTiZhZiKU0C9w40IRt36SO//GSRpErQcuHM65l3vv8SLOlLbtT6uwtLyyulZcL21sbm3vlHf32iqMJYUWDXkoux5RwJmAlmaaQzeSQAKPQ8ebXKd+5wGkYqG409MI3ICMBPMZJdpIzZtBuWJX7Qx4kTg5qaAcjUH5oz8MaRyA0JQTpXqOHWk3IVIzymFW6scKIkInZAQ9QwUJQLlJdugMHxlliP1QmhIaZ+rPiYQESk0Dz3QGRI/VXy8V//N6sfYv3ISJKNYg6HyRH3OsQ5x+jYdMAtV8agihkplbMR0TSag22ZSyEC5TnH2/vEjaJ1XntFpr1ir1qzyOIjpAh+gYOegc1dEtaqAWogjQI3pGL9a99WS9Wm/z1oKVz+yjX7DevwC1D40D</latexit> v1<latexit sha1_base64="6r48FeRijmeRwM0ce/9YOgxnVX0=">AAAB7HicbVBNS8NAEJ3Ur1q/qh69LBbBU0m0+HErevFYwbSFNpTNdtMu3WzC7qZQQn+DFw+KePUHefPfuEmDqPXBwOO9GWbm+TFnStv2p1VaWV1b3yhvVra2d3b3qvsHbRUlklCXRDySXR8rypmgrmaa024sKQ59Tjv+5DbzO1MqFYvEg57F1AvxSLCAEayN5E4HqTMfVGt23c6BlolTkBoUaA2qH/1hRJKQCk04Vqrn2LH2Uiw1I5zOK/1E0RiTCR7RnqECh1R5aX7sHJ0YZYiCSJoSGuXqz4kUh0rNQt90hliP1V8vE//zeokOrryUiTjRVJDFoiDhSEco+xwNmaRE85khmEhmbkVkjCUm2uRTyUO4znDx/fIyaZ/VnfN6475Ra94UcZThCI7hFBy4hCbcQQtcIMDgEZ7hxRLWk/VqvS1aS1Yxcwi/YL1/AeZQjuI=</latexit> v2<latexit sha1_base64="HvFip7AjDkPR91+3+J6CugKM0SQ=">AAAB7HicbVBNS8NAEJ34WetX1aOXxSJ4KkktftyKXjxWMG2hDWWz3bRLN5uwuymU0N/gxYMiXv1B3vw3btIgan0w8Hhvhpl5fsyZ0rb9aa2srq1vbJa2yts7u3v7lYPDtooSSahLIh7Jro8V5UxQVzPNaTeWFIc+px1/cpv5nSmVikXiQc9i6oV4JFjACNZGcqeDtD4fVKp2zc6BlolTkCoUaA0qH/1hRJKQCk04Vqrn2LH2Uiw1I5zOy/1E0RiTCR7RnqECh1R5aX7sHJ0aZYiCSJoSGuXqz4kUh0rNQt90hliP1V8vE//zeokOrryUiTjRVJDFoiDhSEco+xwNmaRE85khmEhmbkVkjCUm2uRTzkO4znDx/fIyaddrznmtcd+oNm+KOEpwDCdwBg5cQhPuoAUuEGDwCM/wYgnryXq13hatK1YxcwS/YL1/AefVjuM=</latexit> v3<latexit sha1_base64="+XpoULfOHqCHvyZwfk/DV8G7sg0=">AAAB7HicbVBNS8NAEJ3Ur1q/qh69LBbBU0m0+HErevFYwbSFNpTNdtMu3WzC7qZQQn+DFw+KePUHefPfuEmDqPXBwOO9GWbm+TFnStv2p1VaWV1b3yhvVra2d3b3qvsHbRUlklCXRDySXR8rypmgrmaa024sKQ59Tjv+5DbzO1MqFYvEg57F1AvxSLCAEayN5E4H6fl8UK3ZdTsHWiZOQWpQoDWofvSHEUlCKjThWKmeY8faS7HUjHA6r/QTRWNMJnhEe4YKHFLlpfmxc3RilCEKImlKaJSrPydSHCo1C33TGWI9Vn+9TPzP6yU6uPJSJuJEU0EWi4KEIx2h7HM0ZJISzWeGYCKZuRWRMZaYaJNPJQ/hOsPF98vLpH1Wd87rjftGrXlTxFGGIziGU3DgEppwBy1wgQCDR3iGF0tYT9ar9bZoLVnFzCH8gvX+BelajuQ=</latexit> V = {v1, v2, v3}<latexit sha1_base64="5/LCIMtGZ5h5wVQCMmTMg/5dkCc=">AAAB/3icbVDLSsNAFJ34rPUVFdy4GSyCCylJW3wshKIblxXsA5oQJtNJO3QyCTOTQold+CtuXCji1t9w5984SYuo9cBcDufcy71z/JhRqSzr01hYXFpeWS2sFdc3Nre2zZ3dlowSgUkTRywSHR9JwignTUUVI51YEBT6jLT94XXmt0dESBrxOzWOiRuiPqcBxUhpyTP3W/ASOikcefaJLpWsVJ2JZ5asspUDzhN7RkpghoZnfji9CCch4QozJGXXtmLlpkgoihmZFJ1EkhjhIeqTrqYchUS6aX7/BB5ppQeDSOjHFczVnxMpCqUch77uDJEayL9eJv7ndRMVnLsp5XGiCMfTRUHCoIpgFgbsUUGwYmNNEBZU3wrxAAmElY6smIdwkeH0+8vzpFUp29Vy7bZWql/N4iiAA3AIjoENzkAd3IAGaAIM7sEjeAYvxoPxZLwab9PWBWM2swd+wXj/AprvlA8=</latexit> ⌃ = { , }<latexit sha1_base64="ZY89SR6jHBd25PoJ2nDrWsihEs4=">AAAB/XicbVDLSsNAFJ34rPUVHzs3g0VwISXR4mMhFN24rGgf0IQymU7aoTNJmJkINbT+ihsXirj1P9z5N07SIGo9cOFwzr3ce48XMSqVZX0aM7Nz8wuLhaXi8srq2rq5sdmQYSwwqeOQhaLlIUkYDUhdUcVIKxIEcY+Rpje4TP3mHRGShsGtGkbE5agXUJ9ipLTUMbedG9rjCJ5DJxmPxwe6nFHHLFllKwOcJnZOSiBHrWN+ON0Qx5wECjMkZdu2IuUmSCiKGRkVnViSCOEB6pG2pgHiRLpJdv0I7mmlC/1Q6AoUzNSfEwniUg65pzs5Un3510vF/7x2rPxTN6FBFCsS4MkiP2ZQhTCNAnapIFixoSYIC6pvhbiPBMJKB1bMQjhLcfz98jRpHJbto3LlulKqXuRxFMAO2AX7wAYnoAquQA3UAQb34BE8gxfjwXgyXo23SeuMkc9sgV8w3r8ASdWVRQ==</latexit> 8 / 35
  • 9. Notation Set of graphs: G = {Gi }N i=1 Set of graph lebels: Y = {Yi }N i=1 Set of geometric embeddings (each graph): U = {ui }n i=1 ∈ Rn×d Latent node embedding space (each node): u ∈ Rd 𝐺" ・・・ 𝑁 𝑛Latent node ↑ embedding Node size→ # of graphs→ 𝑌"Graph label→ 𝑌& 𝐺& u1 2 Rd <latexit sha1_base64="FiX+xGGr4lrH54q+qBxUWlkIUrA=">AAACDnicbVDLSsNAFJ3UV62vqEs3g6XgqiRafOyKblxWsQ9oYphMJu3QySTMTIQS8gVu/BU3LhRx69qdf2OSBlHrgQuHc+7l3nvciFGpDONTqywsLi2vVFdra+sbm1v69k5PhrHApItDFoqBiyRhlJOuooqRQSQIClxG+u7kIvf7d0RIGvIbNY2IHaARpz7FSGWSozcsN2RegNQ4iVMnMVNoUQ6tXBBBcp3eJl4KoaPXjaZRAM4TsyR1UKLj6B+WF+I4IFxhhqQcmkak7AQJRTEjac2KJYkQnqARGWaUo4BIOyneSWEjUzzohyIrrmCh/pxIUCDlNHCzzvxO+dfLxf+8Yaz8UzuhPIoV4Xi2yI8ZVCHMs4EeFQQrNs0IwoJmt0I8RgJhlSVYK0I4y3H8/fI86R02zaNm66pVb5+XcVTBHtgHB8AEJ6ANLkEHdAEG9+ARPIMX7UF70l61t1lrRStndsEvaO9fjuucjQ==</latexit> u2<latexit sha1_base64="Pm48/PPv93nEVMYQDi7yld7eDYw=">AAAB+XicbVDLSsNAFJ34rPUVdelmsAiuSlKLj13RjcsK9gFtCJPJpB06mQkzk0IJ/RM3LhRx65+482+cpEHUemDgcM693DMnSBhV2nE+rZXVtfWNzcpWdXtnd2/fPjjsKpFKTDpYMCH7AVKEUU46mmpG+okkKA4Y6QWT29zvTYlUVPAHPUuIF6MRpxHFSBvJt+1hIFgYIz3O0rmfNea+XXPqTgG4TNyS1ECJtm9/DEOB05hwjRlSauA6ifYyJDXFjMyrw1SRBOEJGpGBoRzFRHlZkXwOT40SwkhI87iGhfpzI0OxUrM4MJN5RvXXy8X/vEGqoysvozxJNeF4cShKGdQC5jXAkEqCNZsZgrCkJivEYyQR1qasalHCdY6L7y8vk26j7p7Xm/fNWuumrKMCjsEJOAMuuAQtcAfaoAMwmIJH8AxerMx6sl6tt8XoilXuHIFfsN6/ACNylCA=</latexit> u3<latexit sha1_base64="w2BS8kqWqIp26xG7B4vB81cBpaY=">AAAB+XicbVDLSsNAFJ3UV62vqEs3g0VwVRItPnZFNy4r2Ae0IUwmk3boZCbMTAol9E/cuFDErX/izr9xkgZR64GBwzn3cs+cIGFUacf5tCorq2vrG9XN2tb2zu6evX/QVSKVmHSwYEL2A6QIo5x0NNWM9BNJUBww0gsmt7nfmxKpqOAPepYQL0YjTiOKkTaSb9vDQLAwRnqcpXM/O5/7dt1pOAXgMnFLUgcl2r79MQwFTmPCNWZIqYHrJNrLkNQUMzKvDVNFEoQnaEQGhnIUE+VlRfI5PDFKCCMhzeMaFurPjQzFSs3iwEzmGdVfLxf/8wapjq68jPIk1YTjxaEoZVALmNcAQyoJ1mxmCMKSmqwQj5FEWJuyakUJ1zkuvr+8TLpnDfe80bxv1ls3ZR1VcASOwSlwwSVogTvQBh2AwRQ8gmfwYmXWk/VqvS1GK1a5cwh+wXr/AiT3lCE=</latexit> 9 / 35
  • 10. Geometric Embeddings Use partial eigendecomposition 3 to extract node embeddings: 1. Create normalized Laplacian matrix L ∈ Rn×n 2. Do partial eigendecomposition and obtaining U 3. Use the smallest d eigenvectors Normalized Laplacian matrix → Partial Eigendecomposition 𝑈Λ𝑈# 𝑛×𝑑 𝑑×𝑑 𝑑×𝑛 The smallest 𝑑 eigenvectors L𝑛×𝑛 A B C A B C A 0 1 1 B 1 0 0 C 1 0 0 Adjacency matrix A B C A 2 0 0 B 0 1 0 C 0 0 1 Degree matrix A B C A 2 -1 -1 B -1 1 0 C -1 0 1 Laplacian matrix -= Normalize Figure: Example obtaining U 3 Time complexity: Linear (# of graph edge) (...I don’t know how.) 10 / 35
  • 11. Transportation Distance [1] Earth Mover’s Distance (EMD): measure of dissimilarity EMD (Gx , Gy ) := min T ∈R nx ×ny + ⟨D, T ⟩ s.t.T 1 = t(Gx ) , T T 1 = t(Gy ) ▶ Linear programming problem ▶ Flow matrix T - Tij : how much of vi in Gx travels to vj in Gy ▶ GX → UX = {ux 1, ux 2, · · · , ux nx } ▶ GY → UY = {uy 1, uy 2, · · · , uy ny } ▶ Transport cost matrix D - Dij = ∥ux i − uy j ∥2 11 / 35
  • 12. Transportation Distance [1] Earth Mover’s Distance (EMD): measure of dissimilarity EMD (Gx , Gy ) := min T ∈R nx ×ny + ⟨D, T ⟩ s.t.T 1 = t(Gx ) , T T 1 = t(Gy ) ▶ Node vi has ci outgoing edges ▶ Normalized bog-of-words (nBOW): ti = ci / ∑n j=1 cj ∈ R 12 / 35
  • 13. Transportation Distance: Example
Figure: EMD example — transport flows between nodes A, B, C of one graph and nodes a, b, c of the other
▶ EMD focuses on the node sizes and the outgoing edges of each graph
13 / 35
  • 14. Straightforward way to define a kernel — it's high cost
EMD-based kernel: K = −(1/2) J D_emd J, with J = I − (1/N) 11ᵀ
▶ Not necessarily positive definite
▶ Time complexity: O(N² n³ log(n)), Space complexity: O(N²)
Figure: straightforward kernel based on EMD — an N×N distance matrix D_emd holding EMD(·, ·) for every pair of data graphs
14 / 35
  • 15. Global Alignment Graph Kernel Using EMD and Random Features (RF)
Proposed kernel: 4
k(Gx, Gy) := ∫ p(Gω) φ_{Gω}(Gx) φ_{Gω}(Gy) dGω, where φ_{Gω}(Gx) := exp(−γ EMD(Gx, Gω))
▶ Gω: random graph with node embeddings W = {wi}_{i=1}^D
▶ wi is sampled from the latent node embedding space V ⊆ R^d
▶ p(Gω) is a distribution over the space of all random graphs of variable sizes Ω := ∪_{D=1}^{Dmax} V^D
4 I meant to dig into the details of the random graphs, but the discussion is quite involved and Hagawa gave up; if you are curious, see the linked reference. I really don't get probability.
15 / 35
  • 16. Global Alignment Graph Kernel Using EMD and RF
Approximation: 5
k̃(Gx, Gy) = (1/R) ∑_{i=1}^R φ_{Gωi}(Gx) φ_{Gωi}(Gy) → k(Gx, Gy), as R → ∞
Figure: both data graphs Gx and Gy are compared against the same set of random graphs, giving feature vectors φ_{Gω}(Gx) and φ_{Gω}(Gy)
(A code sketch of this approximation follows.)
5 Uniform convergence of the approximate kernel to the proposed kernel holds.
16 / 35
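A minimal sketch of the random-feature approximation, assuming some EMD implementation `emd_fn` (e.g. the linear-program sketch above) and a list of pre-generated random graphs; all names here are mine:

```python
import numpy as np

def phi(G, random_graphs, gamma, emd_fn):
    """Feature map: one coordinate phi_{G_omega}(G) = exp(-gamma * EMD(G, G_omega))
    per random graph G_omega."""
    return np.array([np.exp(-gamma * emd_fn(G, G_w)) for G_w in random_graphs])

def approx_kernel(Gx, Gy, random_graphs, gamma, emd_fn):
    """k~(Gx, Gy) = (1/R) sum_i phi_i(Gx) * phi_i(Gy), which converges to
    k(Gx, Gy) as the number of random graphs R grows."""
    zx = phi(Gx, random_graphs, gamma, emd_fn)
    zy = phi(Gy, random_graphs, gamma, emd_fn)
    return zx @ zy / len(random_graphs)
```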
  • 17. Algorithm
Set data and hyperparameters:
▶ Node embedding size (dimension): d
▶ Max size of random graphs: Dmax
▶ Graph embedding size: R
Algorithm 1 Random Graph Embedding
Input: Data graphs {Gi}_{i=1}^N, node embedding size d, maximum size of random graphs Dmax, graph embedding size R.
Output: Feature matrix Z ∈ R^{N×R} for the data graphs
1: Compute nBOW weight vectors {t(Gi)}_{i=1}^N of the normalized Laplacian L of all graphs
2: Obtain node embedding vectors {ui}_{i=1}^n by computing the d smallest eigenvectors of L
3: for j = 1, . . . , R do
4:   Draw Dj uniformly from [1, Dmax].
5:   Generate a random graph Gωj with Dj node embeddings W from Algorithm 2.
6:   Compute a feature vector Zj = φ_{Gωj}({Gi}_{i=1}^N) using EMD or another optimal transportation distance in Equation (3).
7: end for
8: Return feature matrix Z({Gi}_{i=1}^N) = (1/√R) {Zi}_{i=1}^R
17 / 35
  • 18. Compute {t(Gi)}_{i=1}^N and the Laplacian matrix L (Algorithm 1, step 1)
Example (graph with nodes A, B, C):
Laplacian matrix:
  A:  2 −1 −1
  B: −1  1  0
  C: −1  0  1
t(Gx) = (1/2, 1/4, 1/4)
→ For all graphs (a code sketch follows this slide)
18 / 35
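A tiny sketch of the nBOW weights from the slide's example; the helper name is mine:

```python
import numpy as np

def nbow_weights(A):
    """nBOW weight t_i = c_i / sum_j c_j, where c_i is the number of
    outgoing edges of node v_i (row sums of the adjacency matrix A)."""
    c = A.sum(axis=1)
    return c / c.sum()

# The 3-node example above: A has 2 edges, B and C have 1 each.
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
print(nbow_weights(A))  # [0.5, 0.25, 0.25]
```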
  • 19. Obtain node embedding vectors (Algorithm 1, step 2)
Normalized Laplacian matrix L (n×n) → partial eigendecomposition UΛUᵀ (U: n×d, Λ: d×d, Uᵀ: d×n) → keep the eigenvectors of the d smallest eigenvalues
→ For all graphs
Figure: each graph Gi yields node embeddings u1, u2, u3 ∈ R^d
19 / 35
  • 20. Generate a random graph 6 (Algorithm 1, steps 4–5; a code sketch follows this slide)
Dj ← Rand(1, Dmax)   (here Dj = 2)
W (Dj×d) ← Generate_random_graph(Dj, d)
Figure: example of a 2-node random graph with embeddings u1, u2 ∈ R^d
6 In a later section, I show 2 ways to generate random graphs.
20 / 35
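A sketch of random-graph generation; the uniform sampling of node embeddings is an assumption for illustration of the data-independent variant — the paper's Algorithm 2 defines the actual sampling schemes (see also slide 24):

```python
import numpy as np

def random_graph_nodes(d, d_max, rng=None):
    """Draw a random graph: pick its size D uniformly from [1, D_max],
    then sample D node embeddings W (D x d). A random graph is fully
    described by its node embeddings; uniform weights 1/D play the
    role of its nBOW vector in the EMD computation."""
    if rng is None:
        rng = np.random.default_rng()
    D = rng.integers(1, d_max + 1)
    W = rng.uniform(0.0, 1.0, size=(D, d))  # assumed latent space [0, 1]^d
    t = np.full(D, 1.0 / D)
    return W, t
```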
  • 21. Compute a feature vector Zj (Algorithm 1, step 6)
Zji = φ_{Gω}(Gi) := exp(−γ EMD(Gi, Gω))
zj = (Zj1, . . . , ZjN)ᵀ: the j-th column holds the similarity of every data graph to the j-th random graph
21 / 35
  • 22. Generate random graphs R times 7
z1 = (Z11, . . . , Z1N)ᵀ, z2 = (Z21, . . . , Z2N)ᵀ, · · · , zR = (ZR1, . . . , ZRN)ᵀ
Each random graph Gωj contributes one feature φ_{Gωj}(Gi) per data graph.
7 R: number of random graphs
22 / 35
  • 23. Output: N × R matrix Z (Algorithm 1, step 8; a full pipeline sketch follows this slide)
Z = (1/√R) [Zij], with rows i = 1, . . . , N indexing data graphs and columns j = 1, . . . , R indexing random graphs
23 / 35
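Putting the pieces together, here is a minimal sketch of Algorithm 1; it assumes the graphs are already represented as (U, t) pairs of node embeddings and nBOW weights (earlier sketches), and takes any EMD implementation `emd_fn` — all function names are mine:

```python
import numpy as np

def random_graph_embedding(graphs, d, d_max, R, gamma, emd_fn, rng=None):
    """Sketch of Algorithm 1: map N data graphs to an N x R feature
    matrix Z so that Z @ Z.T approximates the proposed kernel matrix.

    graphs: list of (U, t) pairs -- node embeddings U (n x d) and
            nBOW weights t (n,).
    emd_fn((U1, t1), (U2, t2)): any EMD implementation.
    """
    if rng is None:
        rng = np.random.default_rng()
    N = len(graphs)
    Z = np.empty((N, R))
    for j in range(R):
        # Steps 4-5: draw a random graph (see the earlier sketch).
        W, t_w = random_graph_nodes(d, d_max, rng)
        # Step 6: one feature per data graph.
        for i, G in enumerate(graphs):
            Z[i, j] = np.exp(-gamma * emd_fn(G, (W, t_w)))
    # Step 8: scale so that Z @ Z.T averages over the R random graphs.
    return Z / np.sqrt(R)
```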
  • 24. How to generate a Random Graph
Data-independent and data-dependent distributions.
Data-dependent 8 Random Graph Embedding (Anchor Sub-Graphs (ASG)):
1. Pick a graph Gk from the data set
2. Uniformly draw Dj of its nodes
3. {wi}_{i=1}^{Dj} = {u_{n1}, u_{n2}, · · · , u_{nDj}}
Incorporating label information (see the sketch after this list):
▶ d(ui, uj) = max(∥ui − uj∥₂, √d) if vi and vj have different node labels
▶ Enforces a minimum distance between nodes with different labels
▶ √d is the largest distance in a d-dimensional unit hypercube
8 For the data-independent case, see the appendix.
24 / 35
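A minimal sketch of the label-aware distance used to build the transport cost matrix D; the function name is mine:

```python
import numpy as np

def label_aware_distance(u_i, u_j, label_i, label_j):
    """Cost-matrix entry when node labels are available: nodes with
    different labels are pushed at least sqrt(d) apart, the diameter
    of the unit hypercube [0, 1]^d."""
    dist = np.linalg.norm(u_i - u_j)
    if label_i != label_j:
        dist = max(dist, np.sqrt(len(u_i)))
    return dist
```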
  • 25. Complexity comparison (left: proposed, right: straightforward)
Figure: proposed kernel — compare all data graphs against R random graphs
Figure: straightforward kernel — N×N matrix of pairwise EMD values
Time complexity 9:
▶ Proposed: O(NRD²n log(n) + dmz)
▶ Straightforward: O(N²n³ log(n) + dmz)
※ R is the # of random graphs, D is the # of random graph nodes (D < n)
Space complexity:
▶ Proposed: O(NR)
▶ Straightforward: O(N²)
9 dmz is the partial eigendecomposition cost.
25 / 35
  • 26. Experiments
Experimental setup
Classifier:
▶ Linear SVM (LIBLINEAR)
Data:
▶ 9 datasets
Hyperparameters:
▶ γ (kernel) ∈ [1e-3, 1e-2, 1e-1, 1, 10]
▶ Dmax (size of random graphs) ∈ [3:3:30] (3 to 30 in steps of 3)
▶ SVM parameters (tuned on the training data only)
Evaluation (see the sketch after this list):
▶ 10-fold cross-validation
▶ Accuracy averaged over 10 repetitions
26 / 35
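A minimal sketch of the evaluation protocol; scikit-learn's LinearSVC (which wraps LIBLINEAR) stands in for the LIBLINEAR binary, and the helper name is mine:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

def evaluate(Z, y, C=1.0, folds=10, repeats=10):
    """Z: N x R feature matrix from Algorithm 1; y: graph labels.
    Runs repeated k-fold cross-validation with a linear SVM and
    returns mean accuracy and its standard deviation."""
    accs = []
    for _ in range(repeats):
        clf = LinearSVC(C=C)
        accs.extend(cross_val_score(clf, Z, y, cv=folds))
    return np.mean(accs), np.std(accs)
```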
  • 27. # of Random Graphs (R) and Testing accuracy:
Figure 2 (top row): test accuracies of RGE(RF), RGE(ASG), and RGE(ASG)-NodeLab vs R on (a) ENZYMES, (b) NCI109, (c) IMDB-BINARY, (d) COLLAB
▶ Accuracy converges very rapidly when increasing R
# of Random Graphs (R) and Runtime:
Figure 2 (bottom row): total runtime of the same variants vs R on (e) ENZYMES, (f) NCI109, (g) IMDB-BINARY, (h) COLLAB
▶ Shows quasi-linear scalability with respect to R
27 / 35
  • 28. Scalability in N and n
Figure: (a) runtime vs number of graphs N, (b) runtime vs graph size n — eigendecomposition time, feature-generation time, and total runtime of RGE, plotted against linear and quadratic reference curves
▶ (a) shows linear scalability with respect to N
▶ (b) shows quasi-linear scalability with respect to n
28 / 35
  • 29. Classification accuracy:
Table 1: Comparison of classification accuracy against graph kernel methods without node labels.
Datasets  | MUTAG             | PTC-MR            | ENZYMES            | NCI1                | NCI109
RGE(RF)   | 86.33 ± 1.39 (1s) | 59.82 ± 1.42 (1s) | 35.98 ± 0.89 (38s) | 74.70 ± 0.56 (727s) | 72.50 ± 0.32 (865s)
RGE(ASG)  | 85.56 ± 0.91 (2s) | 59.97 ± 1.65 (1s) | 38.52 ± 0.91 (18s) | 74.30 ± 0.45 (579s) | 72.70 ± 0.42 (572s)
EMD       | 84.66 ± 2.69 (7s) | 57.65 ± 0.59 (46s)| 35.45 ± 0.93 (216s)| 72.65 ± 0.34 (8359s)| 70.84 ± 0.18 (8281s)
PM        | 83.83 ± 2.86      | 59.41 ± 0.68      | 28.17 ± 0.37       | 69.73 ± 0.11        | 68.37 ± 0.14
Lo-       | 82.58 ± 0.79      | 55.21 ± 0.72      | 26.5 ± 0.54        | 62.28 ± 0.34        | 62.52 ± 0.29
OA-E (A)  | 79.89 ± 0.98      | 56.77 ± 0.85      | 36.12 ± 0.81       | 67.99 ± 0.28        | 67.14 ± 0.26
RW        | 77.78 ± 0.98      | 56.18 ± 1.12      | 20.17 ± 0.83       | 56.89 ± 0.34        | 56.13 ± 0.31
GL        | 66.11 ± 1.31      | 57.05 ± 0.83      | 18.16 ± 0.47       | 47.37 ± 0.15        | 48.39 ± 0.18
SP        | 82.22 ± 1.14      | 56.18 ± 0.56      | 28.17 ± 0.64       | 62.02 ± 0.17        | 61.41 ± 0.32
(Table 2, with node labels, is shown in full on the next slide.)
▶ RGE is much faster than EMD
29 / 35
  • 30. Classification accuracy with node labels / WL, and deep learning baselines:
Table 2: Comparison of classification accuracy against graph kernel methods with node labels or the WL technique.
Datasets     | PTC-MR            | ENZYMES            | PROTEINS            | NCI1                | NCI109
RGE(ASG)     | 61.5 ± 2.34 (1s)  | 48.27 ± 0.99 (28s) | 75.98 ± 0.71 (20s)  | 76.46 ± 0.45 (379s) | 74.42 ± 0.30 (526s)
EMD          | 57.67 ± 2.11 (42s)| 42.85 ± 0.72 (296s)| 76.03 ± 0.28 (1936s)| 75.89 ± 0.16 (7942s)| 73.63 ± 0.33 (8073s)
PM           | 60.38 ± 0.86      | 40.33 ± 0.34       | 74.39 ± 0.45        | 72.91 ± 0.53        | 71.97 ± 0.15
OA-E (A)     | 58.76 ± 0.92      | 43.56 ± 0.66       | —                   | 69.83 ± 0.30        | 68.96 ± 0.35
V-OA         | 56.4 ± 1.8        | 35.1 ± 1.1         | 73.8 ± 0.5          | 65.6 ± 0.4          | 65.1 ± 0.4
RW           | 57.06 ± 0.86      | 19.33 ± 0.62       | 71.67 ± 0.78        | 63.34 ± 0.27        | 63.51 ± 0.18
GL           | 59.41 ± 0.94      | 32.70 ± 1.20       | 71.63 ± 0.33        | 66.00 ± 0.07        | 66.59 ± 0.08
SP           | 60.00 ± 0.72      | 41.68 ± 1.79       | 73.32 ± 0.45        | 73.47 ± 0.11        | 73.07 ± 0.11
WL-RGE(ASG)  | 62.20 ± 1.67 (1s) | 57.97 ± 1.16 (38s) | 76.63 ± 0.82 (30s)  | 85.85 ± 0.42 (401s) | 85.32 ± 0.29 (798s)
WL-ST        | 57.64 ± 0.68      | 52.22 ± 0.71       | 72.92 ± 0.67        | 82.19 ± 0.18        | 82.46 ± 0.24
WL-SP        | 56.76 ± 0.78      | 59.05 ± 1.05       | 74.49 ± 0.74        | 84.55 ± 0.36        | 83.53 ± 0.30
WL-OA-E (A)  | 59.72 ± 1.10      | 53.76 ± 0.82       | —                   | 84.75 ± 0.21        | 84.23 ± 0.19
Table 3: Comparison of classification accuracy against recent deep learning models on graphs.
Datasets      | PTC-MR       | PROTEINS     | NCI1         | IMDB-B       | IMDB-M       | COLLAB
(WL-)RGE(ASG) | 62.20 ± 1.67 | 76.63 ± 0.82 | 85.85 ± 0.42 | 71.48 ± 1.01 | 47.26 ± 0.89 | 76.85 ± 0.34
DGCNN         | 58.59 ± 2.47 | 75.54 ± 0.94 | 74.44 ± 0.47 | 70.03 ± 0.86 | 47.83 ± 0.85 | 73.76 ± 0.49
PSCN          | 62.30 ± 5.70 | 75.00 ± 2.51 | 76.34 ± 1.68 | 71.00 ± 2.29 | 45.23 ± 2.84 | 72.60 ± 2.15
DCNN          | 56.6 ± 1.20  | 61.29 ± 1.60 | 56.61 ± 1.04 | 49.06 ± 1.37 | 33.49 ± 1.42 | 52.11 ± 0.53
DGK           | 57.32 ± 1.13 | 71.68 ± 0.50 | 62.48 ± 0.25 | 66.96 ± 0.56 | 44.55 ± 0.52 | 73.09 ± 0.25
▶ RGE consistently outperforms or matches other state-of-the-art graph kernels and deep learning approaches
▶ RGE is much faster than EMD
▶ The WL technique further improves performance
30 / 35
  • 31. Conclusion
We proposed a good graph kernel!
▶ Scalable (quasi-linear time and space)
▶ Takes the global graph properties into account
Thank you.
31 / 35
  • 32. Appendix I
▶ If two graphs are isomorphic, the eigenvalues of their adjacency matrices coincide; the converse does not hold.
Normalized Laplacian matrix:
L_{i,j} := 1                              if i = j and deg(vi) ≠ 0
L_{i,j} := −1/√(deg(vi) deg(vj))          if i ≠ j and vi is adjacent to vj
L_{i,j} := 0                              otherwise
deg(v): degree of node (vertex) v
32 / 35
  • 34. Appendix III
Table 4: Properties of the datasets.
Dataset        | MUTAG | PTC  | ENZYMES | PROTEINS | NCI1 | NCI109 | IMDB-B | IMDB-M | COLLAB
Max # Nodes    | 28    | 109  | 126     | 620      | 111  | 111    | 136    | 89     | 492
Min # Nodes    | 10    | 2    | 2       | 4        | 3    | 4      | 12     | 7      | 32
Ave # Nodes    | 17.9  | 25.6 | 32.6    | 39.05    | 29.9 | 29.7   | 19.77  | 13.0   | 74.49
Max # Edges    | 33    | 108  | 149     | 1049     | 119  | 119    | 1249   | 1467   | 40119
Min # Edges    | 10    | 1    | 1       | 5        | 2    | 3      | 26     | 12     | 60
Ave # Edges    | 19.8  | 26.0 | 62.1    | 72.81    | 32.3 | 32.1   | 96.53  | 65.93  | 2457.34
# Graphs       | 188   | 344  | 600     | 1113     | 4110 | 4127   | 1000   | 1500   | 5000
# Graph Labels | 2     | 2    | 6       | 2        | 2    | 2      | 2      | 3      | 3
# Node Labels  | 7     | 19   | 3       | 3        | 37   | 38     | —      | —      | —
Terms:
WL test:
▶ Technique to improve a kernel using node labels
RGE(ASG)-NodeLab:
▶ Data-dependent random graphs + incorporated label information
WL-RGE:
▶ Data-dependent random graphs + WL test
34 / 35
  • 35. References I
[1] Giannis Nikolentzos, Polykarpos Meladianos, and Michalis Vazirgiannis. Matching node embeddings for graph similarity. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.
35 / 35