Implementing a "Friend Prediction" (also known as **Link Prediction**) system using GNNs, GraphSAGE, or GAT follows a highly structured pipeline. In this setup, your users are **nodes**, existing friendships are **edges**, and the goal is to predict the probability that an edge *should* exist between two currently unconnected nodes.
Here is a step-by-step guide on how to design and implement this application.
## 1. The Core Architecture (The Encoder-Decoder Framework)
Most GNN-based link prediction models use an **Encoder-Decoder** workflow:
1. **The Encoder (GNN / GraphSAGE / GAT):** Takes the graph structure and node features (e.g., user age, location, interests) and outputs a low-dimensional vector (embedding) for every single user.
2. **The Decoder:** Takes the embeddings of two users (User A and User B) and computes a similarity score (using Dot Product or a small Multi-Layer Perceptron). A high score means they are likely to become friends.
```
[Graph Data: Nodes & Edges]
             │
             ▼
     ┌───────────────┐
     │    ENCODER    │ ──► Generates User Embeddings ($z_u, z_v$)
     │(SAGE/GAT/GCN) │
     └───────────────┘
             │
             ▼
     ┌───────────────┐
     │    DECODER    │ ──► Computes Link Score (e.g., $Score = z_u^T z_v$)
     │ (Dot Product) │
     └───────────────┘
             │
             ▼
[Friend Prediction Probability]
```
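The diagram uses a dot-product decoder, but as noted above, the decoder can also be a small MLP that learns a nonlinear scoring function. Here is a minimal sketch of such a decoder (the class name `MLPDecoder` and the hidden size are illustrative, not from any library):

```python
import torch
import torch.nn as nn

class MLPDecoder(nn.Module):
    """Alternative decoder: concatenate the two user embeddings
    and score the pair with a small feed-forward network."""
    def __init__(self, emb_dim, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, z, edge_label_index):
        src = z[edge_label_index[0]]  # embeddings of source users
        dst = z[edge_label_index[1]]  # embeddings of target users
        return self.net(torch.cat([src, dst], dim=-1)).squeeze(-1)
```

The dot product is cheaper and easier to serve at scale (it reduces ranking to a nearest-neighbor search), while an MLP decoder can capture asymmetric or nonlinear compatibility between users.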
## 2. Choosing the Right Layer for the Job
While the pipeline remains identical, changing the model type changes how the **Encoder** aggregates information:
* **GraphSAGE (Best for Large Scale):** If your user base is massive or constantly growing, GraphSAGE is the practical choice. It samples a fixed-size subset of each user's current friends when updating their embedding, preventing memory bottlenecks, and because it learns aggregation functions rather than per-node embeddings, it can also embed users who join after training.
* **GAT (Best for Feature-Driven Matches):** If you want the model to learn *why* people are friends (e.g., "User A and User B are friends because they share a niche hobby, even though they live in different cities"), GAT’s attention mechanism dynamically weights neighbor importance based on profile features (see the encoder sketch after this list).
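For illustration, here is a minimal sketch of what swapping the encoder layers to GAT looks like in PyG (the channel sizes and `heads` value are illustrative choices, not prescribed anywhere above):

```python
from torch_geometric.nn import GATConv

# Two-layer GAT encoder; `heads` controls how many attention heads
# independently weight each neighbor's contribution.
conv1 = GATConv(in_channels=128, out_channels=64, heads=4)
# Multi-head outputs are concatenated by default, so the second
# layer's input dimension is out_channels * heads = 64 * 4.
conv2 = GATConv(in_channels=64 * 4, out_channels=32, heads=1)
```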
## 3. Step-by-Step Implementation Workflow
If you are implementing this in Python, the gold standard libraries are **PyTorch Geometric (PyG)** or **DGL (Deep Graph Library)**.
### Step A: Graph Setup & Data Splitting
Unlike standard machine learning, where you split rows of data, in link prediction you must **split the edges** (see the PyG sketch after this list).
* **Training Edges:** The friendships the GNN is allowed to "see" and message-pass through.
* **Positive Validation/Test Edges:** Real friendships held out to evaluate if the model can predict them.
* **Negative Validation/Test Edges:** Randomly sampled pairs of users who are *not* friends, used to teach the model what a "non-friendship" looks like.
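In PyG, this three-way edge split can be done with the `RandomLinkSplit` transform. A minimal sketch, assuming your full graph lives in a `Data` object called `data` (the split ratios are illustrative). Setting `split_labels=True` stores the positive and negative supervision edges under the `pos_edge_label_index` / `neg_edge_label_index` attributes that the training loop below relies on:

```python
import torch_geometric.transforms as T

transform = T.RandomLinkSplit(
    num_val=0.05,                     # 5% of edges held out for validation
    num_test=0.10,                    # 10% held out for testing
    is_undirected=True,               # friendships are symmetric
    split_labels=True,                # store pos/neg supervision edges separately
    add_negative_train_samples=True,  # sample negative edges for training too
)
train_data, val_data, test_data = transform(data)
```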
### Step B: Defining the Model (PyTorch Geometric Style)
Here is a conceptual implementation using PyG; `SAGEConv` can be swapped for `GATConv` or `GCNConv` with minimal changes.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv


class FriendPredictor(nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        # Encoder layers (using GraphSAGE as an example)
        self.conv1 = SAGEConv(in_channels, hidden_channels)
        self.conv2 = SAGEConv(hidden_channels, out_channels)

    def encode(self, x, edge_index):
        # Generates node embeddings via two rounds of message passing
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = self.conv2(x, edge_index)
        return x

    def decode(self, z, edge_label_index):
        # Decoder: dot product between source and target node embeddings
        src = z[edge_label_index[0]]
        dst = z[edge_label_index[1]]
        return (src * dst).sum(dim=-1)  # a similarity score for each pair
```
### Step C: The Training Loop
To train the network, you pass both **positive edges** (real friends) and **negative edges** (random non-friend pairs) through the decoder, training the model to assign high scores to positive edges and low scores to negative ones. Note that `BCEWithLogitsLoss` applies the sigmoid internally, so the decoder outputs raw logits.
```python
num_features = train_data.num_node_features  # node feature dimension from Step A

model = FriendPredictor(in_channels=num_features, hidden_channels=64, out_channels=32)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = torch.nn.BCEWithLogitsLoss()

def train():
    model.train()
    optimizer.zero_grad()
    # 1. Encode: message-pass over the training graph to get node embeddings
    z = model.encode(train_data.x, train_data.edge_index)
    # 2. Decode positive edges (real friendships)
    pos_out = model.decode(z, train_data.pos_edge_label_index)
    # 3. Decode negative edges (sampled on the fly or pre-sampled,
    #    e.g., by RandomLinkSplit(split_labels=True) in Step A)
    neg_out = model.decode(z, train_data.neg_edge_label_index)
    # 4. Combine predictions and compute the binary cross-entropy loss
    predictions = torch.cat([pos_out, neg_out], dim=0)
    targets = torch.cat([torch.ones(pos_out.size(0)),
                         torch.zeros(neg_out.size(0))], dim=0)
    loss = criterion(predictions, targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```
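After training, link prediction quality is typically reported as ROC-AUC: the model should rank held-out positive edges above sampled negatives. Here is a minimal evaluation sketch, assuming the `val_data` / `test_data` splits from Step A and scikit-learn for the metric:

```python
from sklearn.metrics import roc_auc_score

@torch.no_grad()
def test(data):
    model.eval()
    # Message-pass over the edges visible at this split (data.edge_index),
    # then score the held-out supervision edges.
    z = model.encode(data.x, data.edge_index)
    pos_out = model.decode(z, data.pos_edge_label_index)
    neg_out = model.decode(z, data.neg_edge_label_index)
    scores = torch.cat([pos_out, neg_out]).sigmoid()
    labels = torch.cat([torch.ones(pos_out.size(0)),
                        torch.zeros(neg_out.size(0))])
    return roc_auc_score(labels.numpy(), scores.numpy())

# Usage: val_auc = test(val_data); test_auc = test(test_data)
```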
## 4. Serving Recommendations in Production
Once trained, generating "People You May Know" recommendations for a specific user follows this deployment logic (a code sketch follows the list):
1. Run the **Encoder** pass over your graph once (e.g., nightly or in mini-batches) to generate updated embeddings for all active users.
2. To suggest friends for User A, extract their embedding (z_A).
3. Calculate the dot product of z_A against the embeddings of candidates (e.g., friends-of-friends who aren't currently connected to User A).
4. Sort the candidates by their score in descending order and serve the top 5 as friend recommendations.
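A minimal sketch of steps 2–4 under these assumptions: the `recommend` helper and `candidate_ids` are hypothetical names, and `z` is the full embedding matrix produced by the nightly encoder pass.

```python
import torch

@torch.no_grad()
def recommend(user_id, candidate_ids, z, k=5):
    """Return the top-k scoring candidates for one user.

    candidate_ids might be friends-of-friends not yet connected
    to user_id; z is the precomputed embedding matrix.
    """
    candidates = torch.tensor(candidate_ids)
    scores = z[candidates] @ z[user_id]            # dot-product decoder
    top = scores.topk(min(k, len(candidate_ids)))  # highest scores first
    return candidates[top.indices].tolist(), top.values.tolist()
```

Because the decoder is a plain dot product, this final ranking step can also be offloaded to an approximate nearest-neighbor index when the candidate pool is large.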