unravel.soccer.SoccerGraphConverter

class unravel.soccer.SoccerGraphConverter

Convert soccer tracking data from a Polars DataFrame to graph structures for GNN training.

This class transforms soccer tracking data into graph representations suitable for Graph Neural Networks. Each frame of tracking data becomes a graph with players and the ball as nodes, with edges representing spatial relationships or team affiliations.

The converter supports two GNN frameworks:
  • PyTorch Geometric (recommended) via to_pytorch_graphs()

  • Spektral (deprecated, Python 3.11 only) via to_spektral_graphs()

Graph Structure:
  • Nodes: Players (home team, away team) and ball

  • Node Features: Position, velocity, acceleration, distances, angles (12 default features)

  • Edges: Defined by adjacency_matrix_type (team-based, spatial, or dense)

  • Edge Features: Distances, angles, relative velocities (6-7 default features)

  • Global Features: Optional match-level features attached to ball node
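As a rough illustration of the node and edge layout described above, the sketch below builds a team-based adjacency matrix for one frame in plain Python. The function name `split_by_team_adjacency` and the rule that the ball links to every player are assumptions made for this example; the library's actual construction is controlled by adjacency_matrix_type and adjacency_matrix_connect_type and may differ.

```python
from typing import List


def split_by_team_adjacency(teams: List[str], connect_ball: bool = True) -> List[List[int]]:
    """Hypothetical helper: connect players to teammates only.

    `teams` holds one label per node, e.g. "home", "away" or "ball".
    When `connect_ball` is True the ball node is linked to every player
    (an assumption for this sketch, not the library's documented rule).
    """
    n = len(teams)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self loops in this sketch
            same_team = teams[i] == teams[j] and teams[i] != "ball"
            ball_link = connect_ball and "ball" in (teams[i], teams[j])
            if same_team or ball_link:
                adj[i][j] = 1
    return adj


# Two home players, two away players and the ball as the last node.
teams = ["home", "home", "away", "away", "ball"]
adjacency = split_by_team_adjacency(teams)
for row in adjacency:
    print(row)
```

Each row corresponds to one node; a 1 in column j means an edge to node j, so home and away players are only reachable from each other through the ball node.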

Key Features:
  • Configurable node and edge feature engineering

  • Multiple adjacency matrix types (split_by_team, delaunay, dense)

  • Custom feature functions via decorators

  • Automatic padding for fixed-size graphs

  • Ball connection strategies (all players, carrier only, none)

  • Permutation invariance via random node ordering
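Random node ordering only keeps a graph valid if node features and the adjacency matrix are reordered together. The sketch below shows that idea with the standard library; `permute_graph` is a hypothetical helper written for this example, not part of the converter's API, and the library's own shuffling may be implemented differently.

```python
import random
from typing import List, Tuple


def permute_graph(
    x: List[List[float]], adj: List[List[int]], seed: int
) -> Tuple[List[List[float]], List[List[int]], List[int]]:
    """Hypothetical helper: shuffle node order and reindex the adjacency matrix.

    Returns the permuted features, the permuted adjacency matrix and the
    permutation itself (new position -> original node index).
    """
    order = list(range(len(x)))
    random.Random(seed).shuffle(order)
    x_perm = [x[i] for i in order]
    # Reorder rows and columns with the same permutation so edges still
    # connect the same pairs of nodes.
    adj_perm = [[adj[i][j] for j in order] for i in order]
    return x_perm, adj_perm, order


x = [[0.0], [1.0], [2.0]]
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x_perm, adj_perm, order = permute_graph(x, adj, seed=42)
```

Because rows and columns use the same permutation, an edge between two original nodes is still an edge between their new positions.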

Parameters:
  • dataset (KloppyPolarsDataset) – Polars dataset with tracking data. Must have been processed with add_graph_ids() and optionally add_dummy_labels().

  • chunk_size (int, optional) – Number of graphs to process simultaneously. Higher values use more memory but may be faster. Defaults to 20000.

  • non_potential_receiver_node_value (float, optional) – Node feature value (0-1) assigned to defending team players. Used to distinguish attackers from defenders. Defaults to 0.1.

  • edge_feature_funcs (List[Callable], optional) – Custom edge feature functions decorated with @graph_feature(type="edge"). If None, uses defaults. Defaults to None.

  • node_feature_funcs (List[Callable], optional) – Custom node feature functions decorated with @graph_feature(type="node"). If None, uses defaults. Defaults to None.

  • global_feature_cols (List[str], optional) – Column names from the dataset to use as graph-level features (e.g., match score, team ratings). Must be constant within each graph_id group. Defaults to empty list.

  • global_feature_type (Literal["ball", "all"], optional) – Where to attach global features. "ball" attaches to ball node only, "all" attaches to all nodes. Defaults to "ball".

  • additional_feature_cols (List[str], optional) – Extra columns from dataset to make available to custom feature functions (e.g., player height, position). Defaults to empty list.
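The requirement that global_feature_cols be constant within each graph_id group can be checked before building the converter. The sketch below does this over plain dictionaries; the helper name `non_constant_global_features` and the `home_score` column are hypothetical, and a real check would more likely run on the Polars DataFrame itself.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def non_constant_global_features(
    rows: Iterable[Dict[str, Any]], graph_id_col: str, global_feature_cols: List[str]
) -> List[Tuple[Any, str]]:
    """Hypothetical helper: list (graph_id, column) pairs with more than one value."""
    seen = defaultdict(set)
    for row in rows:
        for col in global_feature_cols:
            seen[(row[graph_id_col], col)].add(row[col])
    return sorted(key for key, values in seen.items() if len(values) > 1)


rows = [
    {"graph_id": 1, "home_score": 1},
    {"graph_id": 1, "home_score": 1},
    {"graph_id": 2, "home_score": 0},
    {"graph_id": 2, "home_score": 1},  # value changes inside graph 2
]
violations = non_constant_global_features(rows, "graph_id", ["home_score"])
print(violations)
```

An empty result means every listed column holds a single value per graph and can safely act as a graph-level feature.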

settings

Configuration for graph conversion including adjacency matrix type, padding, and feature settings.

Type:

GraphSettingsPolars

n_node_features

Total number of node features per node.

Type:

int

n_edge_features

Total number of edge features per edge.

Type:

int

n_graph_features

Total number of global/graph-level features.

Type:

int

Raises:
  • ValueError – If dataset is not a KloppyPolarsDataset.

  • ValueError – If required columns (graph_id, label) are missing.

  • ValueError – If custom feature functions are not properly decorated.

Example

>>> from unravel.soccer import KloppyPolarsDataset, SoccerGraphConverter
>>> from kloppy import sportec
>>>
>>> # Load and prepare data
>>> kloppy_dataset = sportec.load_open_tracking_data(only_alive=True)
>>> polars_dataset = KloppyPolarsDataset(kloppy_dataset=kloppy_dataset)
>>> polars_dataset.add_dummy_labels(by=["frame_id"])
>>> polars_dataset.add_graph_ids(by=["frame_id"])
>>>
>>> # Create converter
>>> converter = SoccerGraphConverter(
...     dataset=polars_dataset,
...     self_loop_ball=True,
...     adjacency_matrix_connect_type="ball",
...     adjacency_matrix_type="split_by_team",
...     label_type="binary",
... )
>>>
>>> # Convert to PyTorch Geometric format
>>> graphs = converter.to_pytorch_graphs()
>>> print(f"Created {len(graphs)} graphs")
>>> print(f"Node features: {converter.n_node_features}")
>>> print(f"Edge features: {converter.n_edge_features}")

Note

For detailed configuration options, see GraphSettingsPolars. For custom features, see graph_feature() decorator.

Warning

If not using padding (pad=False), graphs with incomplete player data (< 22 players) will be dropped. Use pad=True for variable-sized teams.
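The difference between the two modes can be sketched as follows. This is an illustration only: `pad_or_drop` is a hypothetical helper, and filling missing rows with zeros is an assumption about how padding might work, not a statement of the library's behavior.

```python
from typing import List, Optional


def pad_or_drop(
    node_rows: List[List[float]], n_expected: int, pad: bool
) -> Optional[List[List[float]]]:
    """Hypothetical helper: return a fixed-size node matrix, or None if dropped."""
    rows = [list(r) for r in node_rows]
    missing = n_expected - len(rows)
    if missing <= 0:
        return rows  # complete graph, nothing to do
    if not pad:
        return None  # incomplete graph is dropped when padding is off
    width = len(rows[0]) if rows else 0
    # Assumed padding scheme: append all-zero feature rows.
    return rows + [[0.0] * width for _ in range(missing)]


incomplete = [[1.0, 2.0]]
padded = pad_or_drop(incomplete, n_expected=3, pad=True)
dropped = pad_or_drop(incomplete, n_expected=3, pad=False)
```

With pad=True the incomplete frame survives as a fixed-size matrix; with pad=False it is removed from the output.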

See also

  • KloppyPolarsDataset: Prepare tracking data.

  • GraphDataset: Wrap graphs for training.

  • graph_feature(): Create custom features.

  • ../tutorials/soccer_gnn: Complete GNN training tutorial.

  • Graph FAQ: Detailed configuration guide.

__init__(engine='auto', prediction=False, self_loop_ball=False, adjacency_matrix_connect_type='ball', adjacency_matrix_type='split_by_team', label_type='binary', defending_team_node_value=0.1, random_seed=False, pad=False, verbose=False, label_col=None, graph_id_col=None, sample_rate=None, dataset=None, chunk_size=20000, non_potential_receiver_node_value=0.1, edge_feature_funcs=<factory>, node_feature_funcs=<factory>, global_feature_cols=<factory>, global_feature_type='ball', additional_feature_cols=<factory>)
Return type:

None

Methods

__init__([engine, prediction, ...])

get_player_by_id(player_id)

get_players_by_team_id(team_id)

plot(file_path[, fps, timestamp, ...])

Plot tracking data as a static image or video file.

to_custom_dataset([include_object_ids])

Spektral requires a Spektral Dataset to load the data. For documentation, see https://graphneural.network/creating-dataset/

to_graph_dataset([include_object_ids])

Spektral requires a Spektral Dataset to load the data. For documentation, see https://graphneural.network/creating-dataset/

to_graph_frames([include_object_ids])

to_pickle(file_path[, verbose, ...])

Stores the dict version of the graphs in a pickle file; each graph is a dict with keys x, a, e, and y. To train with Spektral, pass the loaded pickle data to CustomDataset(data=pickled_data).

to_pyg_graphs([include_object_ids])

to_pytorch_graphs([include_object_ids])

Convert graph frames to PyTorch Geometric Data objects.

to_spektral_graphs([include_object_ids])

Attributes

adjacency_matrix_connect_type

adjacency_matrix_type

chunk_size

dataset

default_edge_feature_funcs

default_node_feature_funcs

defending_team_node_value

engine

feature_opts

global_feature_type

graph_frames

graph_id_col

label_col

label_type

non_potential_receiver_node_value

pad

prediction

random_seed

return_dtypes

sample_rate

self_loop_ball

verbose

edge_feature_funcs

node_feature_funcs

global_feature_cols

additional_feature_cols

settings

dataset: KloppyPolarsDataset = None
chunk_size: int = 20000
non_potential_receiver_node_value: float = 0.1
edge_feature_funcs: List[Callable[[Dict[str, Any]], ndarray]]
node_feature_funcs: List[Callable[[Dict[str, Any]], ndarray]]
global_feature_cols: List[str] | None
global_feature_type: Literal['ball', 'all'] = 'ball'
additional_feature_cols: List[str] | None
property default_node_feature_funcs: list
property default_edge_feature_funcs: list
get_players_by_team_id(team_id)
get_player_by_id(player_id)
plot(file_path, fps=None, timestamp=None, end_timestamp=None, period_id=None, team_color_a='#CD0E61', team_color_b='#0066CC', ball_color='black', sort=True, color_by='ball_owning', anonymous=False, plot_type='full', show_label=True, show_ball_label=False, show_timestamp=True, next_closest_timestamp=False)

Plot tracking data as a static image or video file.

This method visualizes tracking data for players and the ball. It can generate either:
  • A single PNG image (if fps, end_timestamp, or both are None)

  • An MP4 video (if both fps and end_timestamp are provided)

Parameters:
  • file_path (str) – The output path where the PNG or MP4 file will be saved

  • fps (int, optional) – Frames per second for video output. If None, a static image is generated

  • timestamp (pl.duration, optional) – The starting timestamp to plot. If None, starts from the beginning of available data

  • end_timestamp (pl.duration, optional) – The ending timestamp for video output. If None, a static image is generated

  • period_id (int, optional) – ID of the match period to visualize. If None, all periods are included

  • team_color_a (str, default "#CD0E61") – Hex color code for Team A visualization

  • team_color_b (str, default "#0066CC") – Hex color code for Team B visualization

  • ball_color (str, default "black") – Color for ball visualization

  • color_by (Literal["ball_owning", "static_home_away"], default "ball_owning") – Method for coloring the teams:
      - "ball_owning": Colors teams based on ball possession
      - "static_home_away": Uses static colors for home and away teams

  • anonymous (bool, default False) – Whether to anonymize player labels

  • plot_type (Literal["pitch_only", "graph_only", "full"], default "full") – Type of plot to generate:
      - "pitch_only": Shows only the soccer pitch visualization
      - "graph_only": Shows only the graph features (node features, adjacency matrix, edge features)
      - "full": Shows both pitch and graph visualizations

  • show_pitch_label (bool, default True) – Whether to show the label on the pitch visualization

  • show_pitch_timestamp (bool, default True) – Whether to show the timestamp on the pitch visualization

  • next_closest_timestamp (bool, default False) – When plotting a PNG, if the requested timestamp does not exactly match one in the data, the next valid timestamp is used instead.

  • sort (bool)

  • show_label (bool)

  • show_ball_label (bool)

  • show_timestamp (bool)

Returns:

The function saves the output file to the specified file_path but doesn’t return any value

Return type:

None

Notes

Output file type is determined by parameters:
  • PNG: Generated when fps, end_timestamp, or both are None

  • MP4: Generated when both fps and end_timestamp are provided

Raises:

ValueError – If file extension doesn’t match the parameters provided (e.g., .mp4 extension but missing fps or end_timestamp, or .png extension with both fps and end_timestamp)
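The PNG/MP4 rule and the extension check described above can be summarized in a few lines. This is a minimal sketch of the documented behavior, not the library's code; `resolve_plot_output` is a hypothetical helper and the error message is illustrative.

```python
from pathlib import Path
from typing import Any, Optional


def resolve_plot_output(file_path: str, fps: Optional[int], end_timestamp: Optional[Any]) -> str:
    """Hypothetical helper: return "video" or "image", validating the extension."""
    # A video needs both fps and end_timestamp; anything else is a still image.
    wants_video = fps is not None and end_timestamp is not None
    expected_suffix = ".mp4" if wants_video else ".png"
    suffix = Path(file_path).suffix.lower()
    if suffix != expected_suffix:
        raise ValueError(
            f"Expected a {expected_suffix} file for these parameters, got '{suffix}'"
        )
    return "video" if wants_video else "image"


print(resolve_plot_output("frame.png", fps=None, end_timestamp=None))
print(resolve_plot_output("clip.mp4", fps=25, end_timestamp="placeholder"))
```

In this sketch, passing an .mp4 path without fps or end_timestamp, or a .png path with both, raises ValueError, mirroring the condition listed under Raises.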

