Soccer Graph Neural Networks
This tutorial covers how to convert soccer tracking data into graphs and train Graph Neural Networks for various prediction tasks.
The workflow consists of:
Loading tracking data with Kloppy
Converting to Polars DataFrame
Adding labels and graph IDs
Converting to graph structures
Training a GNN model
Making predictions
Interactive Notebooks
We provide comprehensive Jupyter notebooks that walk through the entire process:
Quick Start Guides
PyTorch Geometric (Recommended)
-
Basic 5-step workflow
Model training and prediction
Saving and loading models
Spektral (Deprecated - Python 3.11 only)
-
Legacy TensorFlow/Keras approach
Not recommended for new projects
In-Depth Tutorials
PyTorch Geometric
Comprehensive GNN Training (PyTorch)
Multiple match processing
Custom graph features
Label engineering
Pickle file storage for large datasets
Advanced training patterns
Spektral
Comprehensive GNN Training (Spektral)
Legacy approach for Python 3.11
Graph Configuration FAQ
-
What is a graph?
Complete list of graph settings
All available node and edge features
Adjacency matrix representations
Custom feature functions
Key Concepts
Data Preparation
Loading Data
Use Kloppy to load tracking data from various providers:
from kloppy import sportec, skillcorner, tracab
from unravel.soccer import KloppyPolarsDataset
# Sportec (DFL Open Data)
kloppy_dataset = sportec.load_open_tracking_data(
only_alive=True, # Only frames with ball in play
limit=500 # Limit to 500 frames for testing
)
# SkillCorner
kloppy_dataset = skillcorner.load(
...
include_empty_frames=False # IMPORTANT for non-Sportec data
)
# Convert to Polars
polars_dataset = KloppyPolarsDataset(kloppy_dataset=kloppy_dataset)
Adding Labels
For supervised learning, add a label column:
from unravel.utils import add_dummy_label_column
# Option 1: Dummy labels (for testing)
polars_dataset.dataset = add_dummy_label_column(polars_dataset.dataset)
# Option 2: Join real labels from your data
import polars as pl
# Your labels should have matching keys to join on
labels = pl.DataFrame({
"frame_id": [10000, 10001, 10002, ...],
"label": [0, 1, 0, ...]
})
polars_dataset.dataset = polars_dataset.dataset.join(
labels,
on="frame_id",
how="left"
)
Adding Graph IDs
Group frames into graph samples:
from unravel.utils import add_graph_id_column
# Each frame is a separate graph
polars_dataset.dataset = add_graph_id_column(
polars_dataset.dataset,
by=["frame_id"]
)
# Or group by possession
polars_dataset.dataset = add_graph_id_column(
polars_dataset.dataset,
by=["ball_owning_team_id", "period_id"] # Changes when possession changes
)
# Or by sequences (e.g., 10-frame sequences)
polars_dataset.dataset = polars_dataset.dataset.with_columns(
(pl.col("frame_id") // 10).alias("sequence_id")
)
polars_dataset.dataset = add_graph_id_column(
polars_dataset.dataset,
by=["sequence_id"]
)
Graph Conversion
Basic Configuration
from unravel.soccer import SoccerGraphConverter
converter = SoccerGraphConverter(
dataset=polars_dataset,
# Ball connections
self_loop_ball=True, # Add self-loop to ball node
adjacency_matrix_connect_type="ball", # How to connect ball
# Graph structure
adjacency_matrix_type="split_by_team", # Team-based connections
# Labels
label_type="binary", # Or "continuous", "multiclass"
# Node values for specific roles
defending_team_node_value=0.1,
non_potential_receiver_node_value=0.1,
# Training options
random_seed=False, # Permutation invariance during training
pad=False, # Padding for fixed-size graphs
verbose=False,
)
Node Features
Available node features for soccer (12 total):
x_normed: Normalized x-coordinatey_normed: Normalized y-coordinatevx_normed: Normalized x-velocityvy_normed: Normalized y-velocityspeed_normed: Normalized speed magnitudeax_normed: Normalized x-accelerationay_normed: Normalized y-accelerationacceleration_normed: Normalized acceleration magnitudesin_direction: Sine of movement directioncos_direction: Cosine of movement directiondistance_to_goal_normed: Distance to opponent’s goalangle_to_goal_normed: Angle to opponent’s goal
Edge Features
Available edge features (6-7 total):
distance_normed: Distance between nodessin_angle: Sine of angle between nodescos_angle: Cosine of angle between nodesrelative_speed_normed: Relative speedrelative_vx_normed: Relative x-velocityrelative_vy_normed: Relative y-velocity(Optionally) edge type indicators
Adjacency Matrix Types
split_by_team: Separate graphs for each team, connected via balldelaunay: Spatial proximity based on Delaunay triangulationdense: Fully connected graphdense_ap: Dense for attacking team, proximity for defendingdense_dp: Dense for defending team, proximity for attacking
Custom Features
Add custom node or edge features:
from unravel.utils.features import graph_feature
@graph_feature(
cols=["x", "y"],
returns=["distance_from_center"],
type="node"
)
def distance_from_center(x, y):
import polars as pl
return [pl.sqrt(x**2 + y**2)]
# Use in converter
converter = SoccerGraphConverter(
dataset=polars_dataset,
node_feature_cols=[distance_from_center],
...
)
Converting to Graph Format
# PyTorch Geometric (recommended)
graphs = converter.to_pytorch_graphs()
# Spektral (deprecated)
graphs = converter.to_spektral_graphs()
# Save to pickle for reuse
import pickle
import gzip
with gzip.open("graphs.pickle.gz", "wb") as f:
pickle.dump(graphs, f)
# Load from pickle
with gzip.open("graphs.pickle.gz", "rb") as f:
graphs = pickle.load(f)
Model Training
PyTorch Geometric Approach
from unravel.utils import GraphDataset
from unravel.classifiers import PyGLightningCrystalGraphClassifier
import pytorch_lightning as pyl
from torch_geometric.loader import DataLoader
# Create dataset
dataset = GraphDataset(graphs=graphs, format="pyg")
# Split data (4:1:1 ratio for train:test:val)
train_graphs, test_graphs, val_graphs = dataset.split_test_train_validation(4, 1, 1)
# Create data loaders
train_loader = DataLoader(train_graphs, batch_size=32, shuffle=True)
val_loader = DataLoader(val_graphs, batch_size=32, shuffle=False)
test_loader = DataLoader(test_graphs, batch_size=32, shuffle=False)
# Initialize model
model = PyGLightningCrystalGraphClassifier(
node_features=converter.n_node_features,
edge_features=converter.n_edge_features,
global_features=converter.n_graph_features,
output_features=1, # Binary classification
learning_rate=0.001,
)
# Train
trainer = pyl.Trainer(
max_epochs=50,
accelerator="auto", # Use GPU if available
devices=1,
log_every_n_steps=10,
)
trainer.fit(model, train_loader, val_loader)
# Test
test_results = trainer.test(model, test_loader)
# Save model
trainer.save_checkpoint("model.ckpt")
# Load model
model = PyGLightningCrystalGraphClassifier.load_from_checkpoint("model.ckpt")
Making Predictions
# Set converter to prediction mode (no labels needed)
converter_pred = SoccerGraphConverter(
dataset=new_data,
prediction=True, # Important!
**same_settings_as_training
)
# Convert new data
new_graphs = converter_pred.to_pytorch_graphs()
new_dataset = GraphDataset(graphs=new_graphs, format="pyg")
pred_loader = DataLoader(new_dataset, batch_size=32)
# Predict
predictions = trainer.predict(model, pred_loader)