8

Let's say I have a partly connected graph that represents members of many unrelated communities. I would like to predict possible friendships between members of the same community: on a sliding scale from 0 to 10, how much would they like each other? I have some characteristics of the members, e.g. whether they are Christian or like sports, and also some geographical features such as the distance between them.

The connections could be whether or not they are friends on a social media platform; within the networks, nodes are not necessarily connected by edges.

I am using pytorch_geometric to build a graph for each community and add edges for the connections on the social media platform, one edge for each direction, so the graph is bidirectional. Then I create Data() instances:

Data(x=x, edge_index=edge_index)

where x is an array with the node features and edge_index is the list of edges:

x = array([[ 0,  4,  6,  0,  0,  1],
   [ 1,  4,  6,  0,  0,  1],
   [ 2,  4,  6,  0,  0,  1],
   [ 3,  4,  6,  0,  1,  0],
   [ 4,  4,  6,  0,  1,  0],
   ...])

edge_index = [[0, 1],
 [0, 9],
 [0, 10],
 [0, 11],
 [1, 2],
 [1, 7],
 [1, 12],
 [2, 3],
 [2, 6],
 [2, 13],
 [3, 4],
 ...]
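
Roughly how I assemble one of these Data() objects (a small sketch with made-up rows; as I understand it, PyG wants edge_index as a [2, num_edges] tensor with both directions included):

import torch
from torch_geometric.data import Data

# a few made-up members with the feature columns described above
x = torch.tensor([[0, 4, 6, 0, 0, 1],
                  [1, 4, 6, 0, 0, 1],
                  [2, 4, 6, 0, 1, 0]], dtype=torch.float)

# friendship pairs from the platform; add the reverse of each pair so the
# graph is bidirectional, then transpose to the [2, num_edges] layout
pairs = torch.tensor([[0, 1], [1, 2]], dtype=torch.long)
edge_index = torch.cat([pairs, pairs.flip(1)], dim=0).t().contiguous()

data = Data(x=x, edge_index=edge_index)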

I am not sure what the best route is from here to train on and predict relationships. What is generally used in this case? There are a few options mentioned in the documentation: EdgeConv, DynamicEdgeConv, GCNConv. I am not sure what to try first. Is there anything available that is made for this kind of problem, or do I have to set up my own MessagePassing class?

Data() accepts an argument y for training on nodes. Can I actually use pytorch_geometric for this kind of problem, or do I have to go back to plain pytorch?

Soerendip
  • After browsing the examples I found dense_diff_pool, which also returns an "auxiliary link prediction objective". There is an example enzymes_diff_pool.py which demonstrates its use.

    https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html#torch_geometric.nn.dense.diff_pool.dense_diff_pool

    https://github.com/rusty1s/pytorch_geometric/blob/master/examples/enzymes_diff_pool.py

    – zbyte Oct 18 '19 at 22:40

2 Answers

1

It seems the easiest way to do this in pytorch geometric is to use an autoencoder model. In the examples folder there is an autoencoder.py which demonstrates its use. The gist of it is that it takes in a single graph and tries to predict the links between the nodes (see recon_loss) from an encoded latent space that it learns. The example uses one large graph; for my purposes I had multiple graphs, which meant each one got its edges split and was trained separately (roughly as sketched below).
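
A minimal sketch of that per-graph loop, assuming community_graphs is a list of Data objects and that model (a GAE as in the example) and optimizer already exist; not tested:

from torch_geometric.utils import train_test_split_edges

# each community graph gets its own train/val/test edge split
splits = [train_test_split_edges(g) for g in community_graphs]

for epoch in range(100):
    for d in splits:
        optimizer.zero_grad()
        z = model.encode(d.x, d.train_pos_edge_index)       # latent node embeddings
        loss = model.recon_loss(z, d.train_pos_edge_index)  # link reconstruction objective
        loss.backward()
        optimizer.step()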

zbyte
0

Here is a rough implementation of a solution (feedback welcome). To build the graph encoder, I followed the tutorial here: https://antoniolonga.github.io/Pytorch_geometric_tutorials/posts/post6.html

I start by making a graph on 100 nodes with a community structure, namely two strongly connected communities.

[figure: the generated graph, showing two densely connected communities]

To build this graph I used the following code:

import numpy as np
import torch
import networkx as nx
from matplotlib import pylab as plt
import torch.nn.functional as F
from sklearn.metrics import roc_auc_score

import torch_geometric.transforms as T
from torch_geometric.nn import GCNConv
from torch_geometric.utils import negative_sampling
from torch_geometric.utils import train_test_split_edges

from torch_geometric.nn import GAE
import torch_geometric.data as data
from torch_geometric.utils.convert import to_networkx
import torch_geometric

# set seed for reproducibility
torch.manual_seed(1234)
np.random.seed(1234)

n_nodes = 100
tup_c1 = (0, 50)
tup_c2 = (50, 100)
n_edges_inter = 100
n_edges_intra = 1000

# have first 50 nodes of one type and other 50 nodes of the other type
node_attr = torch.hstack([torch.zeros(50), torch.ones(50)])
node_attr = torch.reshape(node_attr, (n_nodes, 1))

# edges within cluster 1
rows_11 = np.random.choice([i for i in range(tup_c1[0], tup_c1[1])], n_edges_intra)
cols_11 = np.random.choice([i for i in range(tup_c1[0], tup_c1[1])], n_edges_intra)
edges_11 = torch.tensor([rows_11, cols_11])

# edges within cluster 2
rows_22 = np.random.choice([i for i in range(tup_c2[0], tup_c2[1])], n_edges_intra)
cols_22 = np.random.choice([i for i in range(tup_c2[0], tup_c2[1])], n_edges_intra)
edges_22 = torch.tensor([rows_22, cols_22])

# edges from 2-1
rows_21 = np.random.choice([i for i in range(tup_c2[0], tup_c2[1])], n_edges_inter)
cols_21 = np.random.choice([i for i in range(tup_c1[0], tup_c1[1])], n_edges_inter)
edges_21 = torch.tensor([rows_21, cols_21])

# edges from 1-2
rows_12 = np.random.choice([i for i in range(tup_c1[0], tup_c1[1])], n_edges_inter)
cols_12 = np.random.choice([i for i in range(tup_c2[0], tup_c2[1])], n_edges_inter)
edges_12 = torch.tensor([rows_12, cols_12])

# concatenate all edges
edges = torch.hstack([edges_11, edges_22, edges_21, edges_12])

# give edge weights, with inter-cluster edges given smaller weights by a factor
factor = 1.0
edges_attr = torch.tensor(np.hstack([np.random.rand(2 * n_edges_intra),
                                     factor * np.random.rand(2 * n_edges_inter)]))

I then define a dataset. The node features are just the identity matrix.

graph = data.Data(x=torch.eye(100), edge_index=edges, edge_attr=edges_attr)
data = train_test_split_edges(graph)

We then define a GAE and train it.

class GCNEncoder(torch.nn.Module):
    def __init__(self, in_channels, out_channels):
        super(GCNEncoder, self).__init__()
        self.conv1 = GCNConv(in_channels, 2 * out_channels, cached=True)  # cached only for transductive learning
        self.conv2 = GCNConv(2 * out_channels, out_channels, cached=True)  # cached only for transductive learning
        # cached is useful when you have only one graph. When you have many it is less useful.

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        return self.conv2(x, edge_index)

# parameters
out_channels = 2    # dimension of embedding space
num_features = 100  # identity matrix
epochs = 1000

# model
model = GAE(GCNEncoder(num_features, out_channels))

# move to GPU (if available)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
x = data.x.to(device)
train_pos_edge_index = data.train_pos_edge_index.to(device)

# initialize the optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

def train():
    model.train()
    optimizer.zero_grad()
    z = model.encode(x, train_pos_edge_index)
    loss = model.recon_loss(z, train_pos_edge_index)
    # if args.variational:
    #     loss = loss + (1 / data.num_nodes) * model.kl_loss()
    loss.backward()
    optimizer.step()
    return float(loss)

def test(pos_edge_index, neg_edge_index):
    model.eval()
    with torch.no_grad():
        z = model.encode(x, train_pos_edge_index)
    return model.test(z, pos_edge_index, neg_edge_index)

for epoch in range(1, epochs + 1):
    loss = train()
    auc, ap = test(data.test_pos_edge_index, data.test_neg_edge_index)
    if epoch % 100 == 0:
        print('Epoch: {:03d}, AUC: {:.4f}, AP: {:.4f}'.format(epoch, auc, ap))

# recompute the embeddings once training is done (z inside train() is local)
with torch.no_grad():
    z = model.encode(x, train_pos_edge_index)

plt.imshow((z @ z.t()).detach().cpu())
plt.colorbar()
plt.title("edges probability: z @ z.T")
plt.savefig("example_out.png")
plt.show()

By decoding the embedded space we get a similar community structure:

[figure: reconstructed edge probabilities (z @ z.T), showing the two-block community structure]
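
If you want scores for particular candidate pairs rather than the full matrix, the model's inner-product decoder can also be queried directly. A small sketch (candidate_pairs is a made-up [2, num_pairs] index tensor, and the rescaling to the 0 to 10 range from the question is just an illustration):

# probability of an edge for each candidate pair (sigmoid of the inner product z_i . z_j)
candidate_pairs = torch.tensor([[0, 10, 55],
                                [60, 70, 56]]).to(z.device)
probs = model.decode(z, candidate_pairs)   # values in [0, 1]
scores = 10 * probs                        # e.g. map to a 0-10 "how much would they like each other" scale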

RM-