unravel.soccer.SoccerGraphConverter

class unravel.soccer.SoccerGraphConverter

Convert soccer tracking data from Polars DataFrame to graph structures for GNN training.

This class transforms soccer tracking data into graph representations suitable for Graph Neural Networks. Each frame of tracking data becomes a graph with players and the ball as nodes, with edges representing spatial relationships or team affiliations.

The converter supports two GNN frameworks:

PyTorch Geometric (recommended) via to_pytorch_graphs()
Spektral (deprecated, Python 3.11 only) via to_spektral_graphs()

Graph Structure:

Nodes: Players (home team, away team) and ball
Node Features: Position, velocity, acceleration, distances, angles (12 default features)
Edges: Defined by adjacency_matrix_type (team-based, spatial, or dense)
Edge Features: Distances, angles, relative velocities (6-7 default features)
Global Features: Optional match-level features attached to ball node
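
To make the edge definition concrete, the following is a minimal sketch of what a team-based adjacency matrix could look like when the ball is connected to every player. It is plain Python for illustration only and is not the library's implementation: the helper name team_adjacency, the node labels, and the choice to give players self-loops but not the ball are all assumptions.

```python
from typing import List


def team_adjacency(teams: List[str], connect_ball: bool = True) -> List[List[int]]:
    """Build a symmetric 0/1 adjacency matrix (hypothetical helper).

    teams holds one label per node: "home", "away" or "ball".
    Players are linked to every node with the same team label (including
    themselves). The ball is optionally linked to every player and gets no
    self-loop in this sketch.
    """
    n = len(teams)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            same_team = teams[i] == teams[j] and teams[i] != "ball"
            ball_link = connect_ball and i != j and "ball" in (teams[i], teams[j])
            if same_team or ball_link:
                adj[i][j] = 1
    return adj


# Two home players, two away players and the ball as the last node.
teams = ["home", "home", "away", "away", "ball"]
adj = team_adjacency(teams)
for row in adj:
    print(row)
```

In this toy case the home and away players form two disconnected blocks, and the ball row is the only thing linking them.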

Key Features:

Configurable node and edge feature engineering
Multiple adjacency matrix types (split_by_team, delaunay, dense)
Custom feature functions via decorators
Automatic padding for fixed-size graphs
Ball connection strategies (all players, carrier only, none)
Permutation invariance via random node ordering
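
As an illustration of the random node ordering mentioned above, here is a small sketch of shuffling the nodes of one graph while keeping the adjacency matrix consistent with the new order. It is not the converter's own routine; permute_graph and its seed argument are hypothetical and are not meant to mirror the random_seed parameter.

```python
import random
from typing import List, Tuple


def permute_graph(
    x: List[List[float]], adj: List[List[int]], seed: int
) -> Tuple[List[List[float]], List[List[int]], List[int]]:
    """Shuffle node order and reorder features and adjacency to match.

    Returns the permuted node features, the permuted adjacency matrix and
    the permutation itself (new position -> original node index).
    """
    rng = random.Random(seed)
    order = list(range(len(x)))
    rng.shuffle(order)
    x_perm = [x[i] for i in order]
    # Rows and columns must be permuted with the same order.
    adj_perm = [[adj[i][j] for j in order] for i in order]
    return x_perm, adj_perm, order


# A three-node path graph: 0 - 1 - 2.
x = [[0.0], [1.0], [2.0]]
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x_perm, adj_perm, order = permute_graph(x, adj, seed=42)
print(order)
```

The graph is unchanged apart from the node labels, so a model with permutation-invariant aggregation should treat the shuffled and original versions the same.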

Parameters:

dataset (KloppyPolarsDataset) – Polars dataset with tracking data. Must have been processed with add_graph_ids() and optionally add_dummy_labels().
chunk_size (int, optional) – Number of graphs to process simultaneously. Higher values use more memory but may be faster. Defaults to 20000.
non_potential_receiver_node_value (float, optional) – Node feature value (0-1) assigned to defending team players. Used to distinguish attackers from defenders. Defaults to 0.1.
edge_feature_funcs (List[Callable], optional) – Custom edge feature functions decorated with @graph_feature(type="edge"). If None, uses defaults. Defaults to None.
node_feature_funcs (List[Callable], optional) – Custom node feature functions decorated with @graph_feature(type="node"). If None, uses defaults. Defaults to None.
global_feature_cols (List[str], optional) – Column names from the dataset to use as graph-level features (e.g., match score, team ratings). Must be constant within each graph_id group. Defaults to empty list.
global_feature_type (Literal["ball", "all"], optional) – Where to attach global features. "ball" attaches them to the ball node only; "all" attaches them to all nodes. Defaults to "ball".
additional_feature_cols (List[str], optional) – Extra columns from dataset to make available to custom feature functions (e.g., player height, position). Defaults to empty list.
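
The two global feature parameters can be pictured with the following sketch. It uses plain Python dicts instead of a Polars DataFrame, and both helpers (graph_level_values, attach_global) are hypothetical. It also assumes that non-ball nodes receive zeros when global_feature_type is "ball", which the description above does not state.

```python
from typing import Dict, List, Sequence


def graph_level_values(
    rows: Sequence[Dict[str, object]], graph_id: str, cols: Sequence[str]
) -> List[object]:
    """Return one value per column for a graph, or raise if it varies."""
    values: Dict[str, object] = {}
    for row in rows:
        if row["graph_id"] != graph_id:
            continue
        for col in cols:
            if col in values and values[col] != row[col]:
                raise ValueError(f"{col!r} is not constant within graph {graph_id!r}")
            values[col] = row[col]
    if len(values) != len(cols):
        raise ValueError(f"no rows found for graph {graph_id!r}")
    return [values[col] for col in cols]


def attach_global(
    node_features: Sequence[Sequence[float]],
    is_ball: Sequence[bool],
    global_values: Sequence[float],
    where: str = "ball",
) -> List[List[float]]:
    """Append global values to node rows; zeros elsewhere (assumption)."""
    zeros = [0.0] * len(global_values)
    out = []
    for feats, ball in zip(node_features, is_ball):
        extra = list(global_values) if (where == "all" or ball) else zeros
        out.append(list(feats) + extra)
    return out


rows = [
    {"graph_id": "g1", "id": "p1", "home_score": 1.0},
    {"graph_id": "g1", "id": "ball", "home_score": 1.0},
]
global_values = graph_level_values(rows, "g1", ["home_score"])
nodes = [[0.2, 0.5], [0.4, 0.4]]
on_ball = attach_global(nodes, [False, True], global_values, where="ball")
on_all = attach_global(nodes, [False, True], global_values, where="all")
print(on_ball)
print(on_all)
```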

settings

Configuration for graph conversion including adjacency matrix type, padding, and feature settings.

Type: GraphSettingsPolars

n_node_features

Total number of node features per node.

Type: int

n_edge_features

Total number of edge features per edge.

Type: int

n_graph_features

Total number of global/graph-level features.

Type: int

Raises:

ValueError – If dataset is not a KloppyPolarsDataset.
ValueError – If required columns (graph_id, label) are missing.
ValueError – If custom feature functions are not properly decorated.

Example

>>> from unravel.soccer import KloppyPolarsDataset, SoccerGraphConverter
>>> from kloppy import sportec
>>>
>>> # Load and prepare data
>>> kloppy_dataset = sportec.load_open_tracking_data(only_alive=True)
>>> polars_dataset = KloppyPolarsDataset(kloppy_dataset=kloppy_dataset)
>>> polars_dataset.add_dummy_labels(by=["frame_id"])
>>> polars_dataset.add_graph_ids(by=["frame_id"])
>>>
>>> # Create converter
>>> converter = SoccerGraphConverter(
...     dataset=polars_dataset,
...     self_loop_ball=True,
...     adjacency_matrix_connect_type="ball",
...     adjacency_matrix_type="split_by_team",
...     label_type="binary",
... )
>>>
>>> # Convert to PyTorch Geometric format
>>> graphs = converter.to_pytorch_graphs()
>>> print(f"Created {len(graphs)} graphs")
>>> print(f"Node features: {converter.n_node_features}")
>>> print(f"Edge features: {converter.n_edge_features}")

Note

For detailed configuration options, see GraphSettingsPolars. For custom features, see graph_feature() decorator.

Warning

If not using padding (pad=False), graphs with incomplete player data (< 22 players) will be dropped. Use pad=True for variable-sized teams.
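
The effect of padding can be sketched as follows. This is an illustration under stated assumptions, not the converter's code: the helper pad_nodes is hypothetical, the target of 23 nodes assumes 22 players plus the ball, and zero is assumed as the fill value.

```python
from typing import List, Sequence


def pad_nodes(
    node_features: Sequence[Sequence[float]], n_nodes: int = 23, fill: float = 0.0
) -> List[List[float]]:
    """Append filler rows until the node matrix has n_nodes rows."""
    if len(node_features) > n_nodes:
        raise ValueError("graph has more nodes than the padded size")
    width = len(node_features[0]) if node_features else 0
    padded = [list(row) for row in node_features]
    padded += [[fill] * width for _ in range(n_nodes - len(padded))]
    return padded


# A frame with only 20 tracked players plus the ball (21 nodes, 2 features).
frame = [[1.0, 2.0] for _ in range(21)]
padded = pad_nodes(frame)
print(len(frame), "->", len(padded))
```

Without padding, a frame like this one would fall under the warning above and be dropped.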

See also

KloppyPolarsDataset: Prepare tracking data.
GraphDataset: Wrap graphs for training.
graph_feature(): Create custom features.
../tutorials/soccer_gnn: Complete GNN training tutorial.
Graph FAQ: Detailed configuration guide.

__init__(engine='auto', prediction=False, self_loop_ball=False, adjacency_matrix_connect_type='ball', adjacency_matrix_type='split_by_team', label_type='binary', defending_team_node_value=0.1, random_seed=False, pad=False, verbose=False, label_col=None, graph_id_col=None, sample_rate=None, dataset=None, chunk_size=20000, non_potential_receiver_node_value=0.1, edge_feature_funcs=<factory>, node_feature_funcs=<factory>, global_feature_cols=<factory>, global_feature_type='ball', additional_feature_cols=<factory>)

Parameters:

engine (Literal['auto', 'gpu'])
prediction (bool)
self_loop_ball (bool)
adjacency_matrix_connect_type (Literal['ball', 'ball_carrier', 'no_connection'])
adjacency_matrix_type (Literal['delaunay', 'split_by_team', 'dense', 'dense_ap', 'dense_dp'])
label_type (Literal['binary'])
defending_team_node_value (float)
random_seed (bool | int)
pad (bool)
verbose (bool)
label_col (str)
graph_id_col (str)
sample_rate (float)
dataset (KloppyPolarsDataset)
chunk_size (int)
non_potential_receiver_node_value (float)
edge_feature_funcs (List[Callable[[Dict[str, Any]], ndarray]])
node_feature_funcs (List[Callable[[Dict[str, Any]], ndarray]])
global_feature_cols (List[str] | None)
global_feature_type (Literal['ball', 'all'])
additional_feature_cols (List[str] | None)

Return type:

None

Methods

`__init__`([engine, prediction, ...])
`get_player_by_id`(player_id)
`get_players_by_team_id`(team_id)
`plot`(file_path[, fps, timestamp, ...])	Plot tracking data as a static image or video file.
`to_custom_dataset`([include_object_ids])	Spektral requires a Spektral Dataset to load the data. For documentation, see https://graphneural.network/creating-dataset/
`to_graph_dataset`([include_object_ids])	Spektral requires a Spektral Dataset to load the data. For documentation, see https://graphneural.network/creating-dataset/
`to_graph_frames`([include_object_ids])
`to_pickle`(file_path[, verbose, ...])	Store the dict version of the graphs in a pickle file. Each graph is a dict with keys x, a, e, and y. To train with Spektral, feed the loaded pickle data to CustomDataset(data=pickled_data).
`to_pyg_graphs`([include_object_ids])
`to_pytorch_graphs`([include_object_ids])	Convert graph frames to PyTorch Geometric Data objects.
`to_spektral_graphs`([include_object_ids])

Attributes

`adjacency_matrix_connect_type`
`adjacency_matrix_type`
`chunk_size`
`dataset`
`default_edge_feature_funcs`
`default_node_feature_funcs`
`defending_team_node_value`
`engine`
`feature_opts`
`global_feature_type`
`graph_frames`
`graph_id_col`
`label_col`
`label_type`
`non_potential_receiver_node_value`
`pad`
`prediction`
`random_seed`
`return_dtypes`
`sample_rate`
`self_loop_ball`
`verbose`
`edge_feature_funcs`
`node_feature_funcs`
`global_feature_cols`
`additional_feature_cols`
`settings`

dataset: KloppyPolarsDataset = None

chunk_size: int = 20000

non_potential_receiver_node_value: float = 0.1

edge_feature_funcs: List[Callable[[Dict[str, Any]], ndarray]]

node_feature_funcs: List[Callable[[Dict[str, Any]], ndarray]]

global_feature_cols: List[str] | None

global_feature_type: Literal['ball', 'all'] = 'ball'

additional_feature_cols: List[str] | None

property default_node_feature_funcs: list

property default_edge_feature_funcs: list

get_players_by_team_id(team_id)

get_player_by_id(player_id)

plot(file_path, fps=None, timestamp=None, end_timestamp=None, period_id=None, team_color_a='#CD0E61', team_color_b='#0066CC', ball_color='black', sort=True, color_by='ball_owning', anonymous=False, plot_type='full', show_label=True, show_ball_label=False, show_timestamp=True, next_closest_timestamp=False)

Plot tracking data as a static image or video file.

This method visualizes tracking data for players and the ball. It can generate either:

A single PNG image (if fps, end_timestamp, or both are None)
An MP4 video (if both fps and end_timestamp are provided)

Parameters:

file_path (str) – The output path where the PNG or MP4 file will be saved
fps (int, optional) – Frames per second for video output. If None, a static image is generated
timestamp (pl.duration, optional) – The starting timestamp to plot. If None, starts from the beginning of available data
end_timestamp (pl.duration, optional) – The ending timestamp for video output. If None, a static image is generated
period_id (int, optional) – ID of the match period to visualize. If None, all periods are included
team_color_a (str, default "#CD0E61") – Hex color code for Team A visualization
team_color_b (str, default "#0066CC") – Hex color code for Team B visualization
ball_color (str, default "black") – Color for ball visualization
color_by (Literal["ball_owning", "static_home_away"], default "ball_owning") – Method for coloring the teams. "ball_owning" colors teams based on ball possession; "static_home_away" uses static colors for home and away teams
anonymous (bool, default False) – Whether to anonymize player labels
plot_type (Literal["pitch_only", "graph_only", "full"], default "full") – Type of plot to generate. "pitch_only" shows only the soccer pitch visualization; "graph_only" shows only the graph features (node features, adjacency matrix, edge features); "full" shows both pitch and graph visualizations
show_label (bool, default True) – Whether to show the label on the pitch visualization
show_ball_label (bool, default False)
show_timestamp (bool, default True) – Whether to show the timestamp on the pitch visualization
next_closest_timestamp (bool, default False) – When plotting a PNG, if no frame matches the given timestamp exactly, the next available timestamp is used instead
sort (bool, default True)

Returns:

The function saves the output file to the specified file_path but doesn’t return any value

Return type:

None

Notes

Output file type is determined by parameters:

PNG: Generated when fps, end_timestamp, or both are None
MP4: Generated when both fps and end_timestamp are provided

Raises:

ValueError – If file extension doesn’t match the parameters provided (e.g., .mp4 extension but missing fps or end_timestamp, or .png extension with both fps and end_timestamp)
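
The relationship between the parameters and the output type described above can be summarized in a short sketch. It restates the documented rule only and is not the method's actual validation code; expected_output_suffix and check_file_path are hypothetical helpers, and end_timestamp is treated as an opaque value here.

```python
from pathlib import Path
from typing import Optional


def expected_output_suffix(fps: Optional[int], end_timestamp: Optional[object]) -> str:
    """MP4 only when both fps and end_timestamp are given, otherwise PNG."""
    if fps is not None and end_timestamp is not None:
        return ".mp4"
    return ".png"


def check_file_path(
    file_path: str, fps: Optional[int] = None, end_timestamp: Optional[object] = None
) -> str:
    """Raise ValueError when the extension does not match the parameters."""
    expected = expected_output_suffix(fps, end_timestamp)
    actual = Path(file_path).suffix.lower()
    if actual != expected:
        raise ValueError(
            f"expected a {expected} file for these parameters, got {actual or 'no extension'}"
        )
    return expected


print(check_file_path("frame.png"))
print(check_file_path("clip.mp4", fps=25, end_timestamp="placeholder"))
```

Calling check_file_path("clip.mp4", fps=25) without an end_timestamp raises ValueError in this sketch, matching the mismatch case listed under Raises.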

