8

Let's say I have a partly connected graph that represents members of many unrelated communities. I would like to predict possible friendships between members of the same community: on a sliding scale from 0 to 10, how much would they like each other? I have some characteristics of the members, e.g. whether they are Christian or like sports, and also some geographical features such as the distance between them.

The connections could be whether or not they are friends on a social media platform; within the networks, nodes are not necessarily connected by edges.

I am using pytorch_geometric to build a graph for each community and add edges for the connections on the social media platform, one edge for each direction, so the graph is bidirectional. Then I create Data() instances:

Data(x=x, edge_index=edge_index)

where x is an array with the node features and edge_index is the list of edges:

x = array([[ 0,  4,  6,  0,  0,  1],
   [ 1,  4,  6,  0,  0,  1],
   [ 2,  4,  6,  0,  0,  1],
   [ 3,  4,  6,  0,  1,  0],
   [ 4,  4,  6,  0,  1,  0],
   ...])

edge_index = [[0, 1],
 [0, 9],
 [0, 10],
 [0, 11],
 [1, 2],
 [1, 7],
 [1, 12],
 [2, 3],
 [2, 6],
 [2, 13],
 [3, 4],
 ...]
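
Roughly how I assemble one of these Data() objects (a small sketch with made-up rows; as I understand it, PyG wants edge_index as a [2, num_edges] tensor with both directions included):

import torch
from torch_geometric.data import Data

# a few made-up members with the feature columns described above
x = torch.tensor([[0, 4, 6, 0, 0, 1],
                  [1, 4, 6, 0, 0, 1],
                  [2, 4, 6, 0, 1, 0]], dtype=torch.float)

# friendship pairs from the platform; add the reverse of each pair so the
# graph is bidirectional, then transpose to the [2, num_edges] layout
pairs = torch.tensor([[0, 1], [1, 2]], dtype=torch.long)
edge_index = torch.cat([pairs, pairs.flip(1)], dim=0).t().contiguous()

data = Data(x=x, edge_index=edge_index)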

I am not sure what the best route is from here to train on and predict relationships. What is generally used in this case? There are a few options mentioned in the documentation: EdgeConv, DynamicEdgeConv, GCNConv. I am not sure what to try first. Is there anything available that is made for this kind of problem, or do I have to set up my own MessagePassing class?

Data() accepts an argument y for training on nodes. Can I actually use pytorch_geometric for this kind of problem, or do I have to go back to plain pytorch?

Soerendip
  • After browsing the examples I found dense_diff_pool, which also returns an "auxiliary link prediction objective". There is an example enzymes_diff_pool.py which demonstrates its use.

    https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html#torch_geometric.nn.dense.diff_pool.dense_diff_pool

    https://github.com/rusty1s/pytorch_geometric/blob/master/examples/enzymes_diff_pool.py

    – zbyte Oct 18 '19 at 22:40

2 Answers

1

It seems the easiest way to do this in pytorch geometric is to use an autoencoder model. In the examples folder there is an autoencoder.py which demonstrates its use. The gist of it is that it takes in a single graph and tries to predict the links between the nodes (see recon_loss) from an encoded latent space that it learns. The example uses one large graph; for my purposes I had multiple graphs, which meant each one got its edges split and was trained separately (roughly as sketched below).
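
A minimal sketch of that per-graph loop, assuming community_graphs is a list of Data objects and that model (a GAE as in the example) and optimizer already exist; not tested:

from torch_geometric.utils import train_test_split_edges

# each community graph gets its own train/val/test edge split
splits = [train_test_split_edges(g) for g in community_graphs]

for epoch in range(100):
    for d in splits:
        optimizer.zero_grad()
        z = model.encode(d.x, d.train_pos_edge_index)       # latent node embeddings
        loss = model.recon_loss(z, d.train_pos_edge_index)  # link reconstruction objective
        loss.backward()
        optimizer.step()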

zbyte
0

Here is a rough implementation of a solution (feedback welcome). To build the graph encoder, I followed the tutorial here: https://antoniolonga.github.io/Pytorch_geometric_tutorials/posts/post6.html

I start by making a graph on 100 nodes with a community structure, namely two strongly connected communities.

[figure: the generated graph, showing two densely connected communities]

To build this graph I used the following code:

import numpy as np
import torch
import networkx as nx
from matplotlib import pylab as plt
import torch.nn.functional as F
from sklearn.metrics import roc_auc_score

import torch_geometric.transforms as T
from torch_geometric.nn import GCNConv
from torch_geometric.utils import negative_sampling
from torch_geometric.utils import train_test_split_edges

from torch_geometric.nn import GAE
import torch_geometric.data as data
from torch_geometric.utils.convert import to_networkx
import torch_geometric

# set seed for reproducibility
torch.manual_seed(1234)
np.random.seed(1234)

n_nodes = 100
tup_c1 = (0, 50)
tup_c2 = (50, 100)
n_edges_inter = 100
n_edges_intra = 1000

# have first 50 nodes of one type and other 50 nodes of the other type
node_attr = torch.hstack([torch.zeros(50), torch.ones(50)])
node_attr = torch.reshape(node_attr, (n_nodes, 1))

# edges within cluster 1
rows_11 = np.random.choice([i for i in range(tup_c1[0], tup_c1[1])], n_edges_intra)
cols_11 = np.random.choice([i for i in range(tup_c1[0], tup_c1[1])], n_edges_intra)
edges_11 = torch.tensor([rows_11, cols_11])

# edges within cluster 2
rows_22 = np.random.choice([i for i in range(tup_c2[0], tup_c2[1])], n_edges_intra)
cols_22 = np.random.choice([i for i in range(tup_c2[0], tup_c2[1])], n_edges_intra)
edges_22 = torch.tensor([rows_22, cols_22])

# edges from 2-1
rows_21 = np.random.choice([i for i in range(tup_c2[0], tup_c2[1])], n_edges_inter)
cols_21 = np.random.choice([i for i in range(tup_c1[0], tup_c1[1])], n_edges_inter)
edges_21 = torch.tensor([rows_21, cols_21])

# edges from 1-2
rows_12 = np.random.choice([i for i in range(tup_c1[0], tup_c1[1])], n_edges_inter)
cols_12 = np.random.choice([i for i in range(tup_c2[0], tup_c2[1])], n_edges_inter)
edges_12 = torch.tensor([rows_12, cols_12])

# concatenate all edges
edges = torch.hstack([edges_11, edges_22, edges_21, edges_12])

# give edge weights, with inter-cluster edges given smaller weights by a factor
factor = 1.0
edges_attr = torch.tensor(np.hstack([np.random.rand(2 * n_edges_intra),
                                     factor * np.random.rand(2 * n_edges_inter)]))

I then define a dataset. The node features are just the identity matrix.

graph = data.Data(x=torch.eye(100), edge_index=edges, edge_attr=edges_attr)
data = train_test_split_edges(graph)

We then define a GAE and train it.

class GCNEncoder(torch.nn.Module):
    def __init__(self, in_channels, out_channels):
        super(GCNEncoder, self).__init__()
        self.conv1 = GCNConv(in_channels, 2 * out_channels, cached=True)  # cached only for transductive learning
        self.conv2 = GCNConv(2 * out_channels, out_channels, cached=True)  # cached only for transductive learning
        # cached is useful when you have only one graph. When you have many it is less useful.

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        return self.conv2(x, edge_index)

# parameters
out_channels = 2    # dimension of embedding space
num_features = 100  # identity matrix
epochs = 1000

# model
model = GAE(GCNEncoder(num_features, out_channels))

# move to GPU (if available)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
x = data.x.to(device)
train_pos_edge_index = data.train_pos_edge_index.to(device)

# initialize the optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

def train():
    model.train()
    optimizer.zero_grad()
    z = model.encode(x, train_pos_edge_index)
    loss = model.recon_loss(z, train_pos_edge_index)
    # if args.variational:
    #     loss = loss + (1 / data.num_nodes) * model.kl_loss()
    loss.backward()
    optimizer.step()
    return float(loss)

def test(pos_edge_index, neg_edge_index):
    model.eval()
    with torch.no_grad():
        z = model.encode(x, train_pos_edge_index)
    return model.test(z, pos_edge_index, neg_edge_index)

for epoch in range(1, epochs + 1):
    loss = train()
    auc, ap = test(data.test_pos_edge_index, data.test_neg_edge_index)
    if epoch % 100 == 0:
        print('Epoch: {:03d}, AUC: {:.4f}, AP: {:.4f}'.format(epoch, auc, ap))

# recompute the embeddings once training is done (z inside train() is local)
with torch.no_grad():
    z = model.encode(x, train_pos_edge_index)

plt.imshow((z @ z.t()).detach().cpu())
plt.colorbar()
plt.title("edges probability: z @ z.T")
plt.savefig("example_out.png")
plt.show()

By decoding the embedded space we get a similar community structure:

[figure: reconstructed edge probabilities (z @ z.T), showing the two-block community structure]
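
If you want scores for particular candidate pairs rather than the full matrix, the model's inner-product decoder can also be queried directly. A small sketch (candidate_pairs is a made-up [2, num_pairs] index tensor, and the rescaling to the 0 to 10 range from the question is just an illustration):

# probability of an edge for each candidate pair (sigmoid of the inner product z_i . z_j)
candidate_pairs = torch.tensor([[0, 10, 55],
                                [60, 70, 56]]).to(z.device)
probs = model.decode(z, candidate_pairs)   # values in [0, 1]
scores = 10 * probs                        # e.g. map to a 0-10 "how much would they like each other" scale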

RM-