Core Concepts
This page explains the key concepts and terminology used in unravelsports.
Data Flow
The typical data flow in unravelsports follows these steps:
Raw Tracking Data → Loaded via Kloppy (soccer) or directly from CSV files (American Football)
Polars DataFrame → Fast, efficient data representation
Graph Structures → Convert to graphs for GNN training
Model Training → Train with PyTorch Geometric or Spektral
Predictions/Analytics → Apply models or compute metrics
Tracking Data
What is Tracking Data?
Tracking data captures the position of players and the ball at high frequency (typically 10-25 Hz). Each frame includes:
x, y coordinates: Position on the pitch/field
Velocities: Velocity components in the x and y directions
Ball state: Whether the ball is in play, out of bounds, etc.
Metadata: Team IDs, player IDs, timestamps
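When a provider does not supply velocities, they can be estimated from consecutive positions. The sketch below is a minimal illustration, not how unravelsports or Kloppy compute velocities. It assumes a constant frame rate, no dropped frames, and positions in metres; `Sample` and `finite_difference_velocities` are hypothetical names. Raw finite differences amplify positional noise, so real pipelines usually smooth positions or velocities first.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One tracked position of a single object (player or ball)."""
    frame_id: int
    x: float  # assumed metres
    y: float  # assumed metres


def finite_difference_velocities(track, frame_rate_hz):
    """Estimate (vx, vy) in units per second for each sample of one track.

    Uses a central difference for interior frames and a one-sided difference
    at both ends. Assumes `track` is sorted by frame_id with no gaps.
    """
    if len(track) < 2:
        raise ValueError("need at least two samples")
    dt = 1.0 / frame_rate_hz
    last = len(track) - 1
    velocities = []
    for i in range(len(track)):
        lo, hi = max(i - 1, 0), min(i + 1, last)
        span = (hi - lo) * dt  # elapsed time between the two samples used
        velocities.append(((track[hi].x - track[lo].x) / span,
                           (track[hi].y - track[lo].y) / span))
    return velocities


# A player moving 0.2 m per frame along x at 25 Hz, i.e. 5 m/s.
track = [Sample(frame_id=i, x=0.2 * i, y=10.0) for i in range(5)]
print(finite_difference_velocities(track, frame_rate_hz=25))
```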
Supported Data Providers
Soccer (via Kloppy):
Sportec (DFL Open Data)
SkillCorner
Tracab (ChyronHego)
Second Spectrum
StatsPerform
Metrica Sports
PFF / GradientSports
HawkEye
Signality
American Football:
NFL Big Data Bowl CSV files
Polars DataFrames
Why Polars?
Polars is a blazingly fast DataFrame library written in Rust with a Python API. Benefits:
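Performance: Multi-threaded, vectorized query engine; often much faster than pandas, with 10-100x speedups reported for some operations
Memory efficiency: Lower memory footprint
Lazy evaluation: Build a query plan that Polars optimizes before executing it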
Modern API: Clean, consistent interface
DataFrame Structure
After conversion, the DataFrame contains:
period_id: Match period (1, 2, etc.)
timestamp: Time within the period
frame_id: Unique frame identifier
ball_state: alive, dead, out, etc.
id: Player/ball ID
x, y, z: Coordinates
team_id: Team identifier
position_name: Player position (GK, CB, etc.)
vx, vy, vz: Velocity components
ax, ay, az: Acceleration components
ball_owning_team_id: Team in possession
is_ball_carrier: Whether player has the ball
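Read together, these columns suggest a long layout with one row per tracked object per frame (our reading; check it against your own data). The stdlib sketch below uses plain dicts in place of the Polars DataFrame to show how such rows regroup per frame, here to compute each player's distance to the ball. The ball's `id` value of "ball" and the function name are assumptions. With Polars you would typically express the same idea as a filter on the ball rows followed by a join on `frame_id`.

```python
import math
from collections import defaultdict

# Hypothetical rows: one row per tracked object (player or ball) per frame,
# with a subset of the columns listed above. Using "ball" as the ball's `id`
# is an assumption made for this sketch.
rows = [
    {"frame_id": 1, "id": "ball", "team_id": None, "x": 0.0, "y": 0.0},
    {"frame_id": 1, "id": "p1", "team_id": "home", "x": 3.0, "y": 4.0},
    {"frame_id": 1, "id": "p2", "team_id": "away", "x": 6.0, "y": 8.0},
    {"frame_id": 2, "id": "ball", "team_id": None, "x": 1.0, "y": 0.0},
    {"frame_id": 2, "id": "p1", "team_id": "home", "x": 1.0, "y": 2.0},
]


def distance_to_ball(rows, ball_id="ball"):
    """Return {(frame_id, player_id): Euclidean distance to the ball}."""
    by_frame = defaultdict(list)
    for row in rows:
        by_frame[row["frame_id"]].append(row)

    distances = {}
    for frame_id, frame_rows in by_frame.items():
        ball = next((r for r in frame_rows if r["id"] == ball_id), None)
        if ball is None:
            continue  # skip frames in which the ball was not tracked
        for r in frame_rows:
            if r["id"] != ball_id:
                distances[(frame_id, r["id"])] = math.hypot(
                    r["x"] - ball["x"], r["y"] - ball["y"]
                )
    return distances


print(distance_to_ball(rows))
# {(1, 'p1'): 5.0, (1, 'p2'): 10.0, (2, 'p1'): 2.0}
```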
Graph Neural Networks
What are Graphs?
In the context of sports tracking data, a graph represents:
Nodes: Players and the ball
Edges: Relationships between players (teammates, opponents, proximity to ball)
Node Features: Player attributes (position, velocity, acceleration, etc.)
Edge Features: Relationship attributes (distance, angle, relative velocity)
Global Features: Game-level information (score, time, etc.)
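To make these terms concrete, here is a stdlib-only sketch that turns one frame into node, edge, and global feature arrays. The feature choices (x, y, vx, vy per node; distance, angle, and relative speed per edge), the `Node`/`Graph`/`frame_to_graph` names, and the `connect` callback are illustrative assumptions, not the exact features or API of unravelsports' converters; `connect` stands in for whatever adjacency rule is configured.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    id: str
    team_id: Optional[str]  # None for the ball
    x: float
    y: float
    vx: float
    vy: float


@dataclass
class Graph:
    node_features: List[List[float]]   # one row per node: [x, y, vx, vy]
    edge_index: List[Tuple[int, int]]  # (sender, receiver) node indices
    edge_features: List[List[float]]   # one row per edge
    global_features: List[float]       # e.g. [score difference, seconds played]


def frame_to_graph(nodes: List[Node],
                   global_features: List[float],
                   connect: Callable[[Node, Node], bool]) -> Graph:
    """Build a directed graph for one frame; connect(a, b) decides if edge a -> b exists."""
    node_features = [[n.x, n.y, n.vx, n.vy] for n in nodes]
    edge_index, edge_features = [], []
    for i, a in enumerate(nodes):
        for j, b in enumerate(nodes):
            if i == j or not connect(a, b):
                continue
            dx, dy = b.x - a.x, b.y - a.y
            edge_index.append((i, j))
            edge_features.append([
                math.hypot(dx, dy),                    # distance
                math.atan2(dy, dx),                    # angle from a to b, radians
                math.hypot(b.vx - a.vx, b.vy - a.vy),  # relative speed
            ])
    return Graph(node_features, edge_index, edge_features, list(global_features))


nodes = [
    Node("ball", None, 0.0, 0.0, 0.0, 0.0),
    Node("p1", "home", 3.0, 4.0, 1.0, 0.0),
    Node("p2", "away", 0.0, 4.0, 0.0, -1.0),
]
graph = frame_to_graph(nodes, global_features=[0.0, 1234.0], connect=lambda a, b: True)
print(len(graph.edge_index))  # 6 directed edges among 3 fully connected nodes
```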
Why Use GNNs for Sports Data?
Graph Neural Networks are ideal for sports analytics because:
Permutation Invariance: Player order doesn’t matter
Relational Reasoning: Capture interactions between players
Variable Size: Handle different numbers of players on the field
Spatial Structure: Naturally model the spatial nature of sports
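The permutation-invariance point can be checked with a toy example: if each node aggregates its neighbours with a symmetric function such as a mean, and the graph-level output sums over nodes, relabelling the nodes cannot change the result. (The per-node states are simply reordered along with the nodes.) This is a hand-rolled illustration, not a PyTorch Geometric or Spektral layer; `message_pass_and_readout` and `relabel` are hypothetical names.

```python
def message_pass_and_readout(features, edges):
    """One toy message-passing step followed by a sum readout.

    features: list of per-node feature vectors.
    edges: set of (i, j) pairs meaning node j receives a message from node i.
    Each node's new state is its own features plus the mean of its senders'
    features; the graph-level output is the elementwise sum over all nodes.
    """
    n, dim = len(features), len(features[0])
    updated = []
    for j in range(n):
        senders = [i for i in range(n) if (i, j) in edges]
        mean = [
            sum(features[i][d] for i in senders) / len(senders) if senders else 0.0
            for d in range(dim)
        ]
        updated.append([features[j][d] + mean[d] for d in range(dim)])
    return [sum(h[d] for h in updated) for d in range(dim)]


def relabel(features, edges, order):
    """Reorder nodes so that new index k holds old node order[k]."""
    new_index = {old: new for new, old in enumerate(order)}
    return ([features[old] for old in order],
            {(new_index[i], new_index[j]) for i, j in edges})


features = [[1.0, 2.0], [3.0, 0.0], [0.0, 4.0]]  # three nodes, two features each
edges = {(0, 1), (1, 0), (1, 2), (2, 1), (0, 2), (2, 0)}
print(message_pass_and_readout(features, edges))                      # [8.0, 12.0]
print(message_pass_and_readout(*relabel(features, edges, [2, 0, 1])))  # [8.0, 12.0]
```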
Graph Conversion Settings
Key parameters when converting to graphs:
adjacency_matrix_type: How to connect nodes
  split_by_team: Separate connections for each team
  delaunay: Based on spatial proximity (Delaunay triangulation)
  dense: Fully connected graph
adjacency_matrix_connect_type: How to connect to the ball
  ball: Connect all players to the ball
  ball_carrier: Only connect ball carrier to ball
  no_connection: No ball connections
Node features: What information to include for each node
Edge features: What information to include for each edge
See the Soccer Graph Neural Networks tutorial and Graph FAQ for more details.
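A stdlib sketch of how the two adjacency settings could combine into a 0/1 adjacency matrix. It is an assumption-laden illustration, not unravelsports' implementation: it treats adjacency_matrix_type as governing player-to-player edges and adjacency_matrix_connect_type as governing player-to-ball edges, omits self-loops, and skips delaunay, which would need a triangulation routine (for example SciPy's `scipy.spatial.Delaunay`). The function name and arguments are hypothetical.

```python
def adjacency_matrix(team_ids, ball_index, carrier_index=None,
                     matrix_type="split_by_team", connect_type="ball"):
    """Return an n x n 0/1 adjacency matrix (list of lists) for one frame.

    team_ids: team of each node; the entry at ball_index is ignored.
    carrier_index: node index of the ball carrier, or None if there is none.
    Only "split_by_team" and "dense" are sketched; self-loops are omitted.
    """
    if matrix_type not in ("split_by_team", "dense"):
        raise ValueError(f"unsupported matrix_type: {matrix_type}")
    if connect_type not in ("ball", "ball_carrier", "no_connection"):
        raise ValueError(f"unsupported connect_type: {connect_type}")

    n = len(team_ids)
    players = [i for i in range(n) if i != ball_index]
    adj = [[0] * n for _ in range(n)]

    # Player-to-player edges, controlled by matrix_type.
    for i in players:
        for j in players:
            if i != j and (matrix_type == "dense" or team_ids[i] == team_ids[j]):
                adj[i][j] = 1

    # Player-to-ball edges (both directions), controlled by connect_type.
    if connect_type == "ball":
        linked = players
    elif connect_type == "ball_carrier" and carrier_index is not None:
        linked = [carrier_index]
    else:
        linked = []
    for i in linked:
        adj[i][ball_index] = adj[ball_index][i] = 1
    return adj


# Two home players, two away players, then the ball; player 0 carries the ball.
teams = ["home", "home", "away", "away", None]
for row in adjacency_matrix(teams, ball_index=4, carrier_index=0,
                            connect_type="ball_carrier"):
    print(row)
```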
Labels and Graph IDs
Labels
For supervised learning, you need labels for each graph:
```python
from unravel.utils import add_dummy_label_column

# `dataset` is an already-loaded unravelsports dataset whose `.dataset`
# attribute holds the Polars DataFrame.

# Add random binary labels (for demonstration only)
dataset.dataset = add_dummy_label_column(dataset.dataset)

# Or join real labels from your own data
# dataset.dataset = dataset.dataset.join(your_labels, on="some_key")
```
Graph IDs
Graph IDs group frames that belong to the same “sample”:
```python
from unravel.utils import add_graph_id_column

# Each frame is a separate graph
dataset.dataset = add_graph_id_column(dataset.dataset, by=["frame_id"])

# Or group by possession (requires a possession_id column, which is not
# among the columns listed under DataFrame Structure)
# dataset.dataset = add_graph_id_column(dataset.dataset, by=["possession_id"])
```
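Important: Always split data by graph_id, so that every row belonging to one graph lands in the same train, validation, or test split. Splitting row by row would leak parts of the same sample across splits.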
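Splitting by graph_id is a group-aware split. A minimal sketch, with rows as plain dicts standing in for the Polars DataFrame and a hypothetical function name; scikit-learn's `GroupShuffleSplit` implements the same idea. Note that neighbouring frames are strongly correlated even when they carry different graph_ids, so splitting by match may be safer still.

```python
import random


def split_by_graph_id(rows, test_fraction=0.2, seed=0):
    """Split rows into train/test so each graph_id appears in exactly one split.

    rows: iterable of dicts with a "graph_id" key (a stand-in for DataFrame rows).
    """
    rows = list(rows)
    graph_ids = sorted({r["graph_id"] for r in rows})  # sort for reproducibility
    random.Random(seed).shuffle(graph_ids)
    n_test = max(1, round(len(graph_ids) * test_fraction))
    test_ids = set(graph_ids[:n_test])
    train = [r for r in rows if r["graph_id"] not in test_ids]
    test = [r for r in rows if r["graph_id"] in test_ids]
    return train, test


# Ten graphs with three rows (frames) each.
rows = [{"graph_id": g, "frame_id": 3 * g + k} for g in range(10) for k in range(3)]
train, test = split_by_graph_id(rows)
print(len(train), len(test))  # 24 6
```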
Soccer Analytics Models
Pressing Intensity
A metric quantifying defensive pressure on ball carriers. Based on:
Defender positions relative to ball carrier
Defender velocities
Spatial coverage
See Bekkers (2024) for the mathematical formulation.
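To make the first two inputs concrete, here is a deliberately simplified pressure score. It is not the formulation in Bekkers (2024): the straight-line time-to-reach model, the logistic conversion, the summation over defenders, and every parameter value are assumptions chosen for illustration, spatial coverage is not modelled, and the function names are hypothetical. Units are assumed to be metres and seconds.

```python
import math


def time_to_reach(defender_xy, defender_v, target_xy,
                  reaction_time=0.7, max_speed=7.0):
    """Rough time (s) for a defender to reach a target point.

    The defender keeps their current velocity for `reaction_time` seconds,
    then runs in a straight line to the target at `max_speed` (m/s).
    Parameter values are arbitrary placeholders.
    """
    rx = defender_xy[0] + defender_v[0] * reaction_time
    ry = defender_xy[1] + defender_v[1] * reaction_time
    return reaction_time + math.hypot(target_xy[0] - rx, target_xy[1] - ry) / max_speed


def pressure_on_carrier(carrier_xy, defenders, time_threshold=1.5, steepness=3.0):
    """Sum of per-defender logistic scores, each in (0, 1).

    A defender scores close to 1 if they can reach the carrier well within
    `time_threshold` seconds and close to 0 if they need much longer.
    defenders: list of ((x, y), (vx, vy)) tuples in metres and m/s.
    """
    total = 0.0
    for xy, v in defenders:
        t = time_to_reach(xy, v, carrier_xy)
        total += 1.0 / (1.0 + math.exp(-steepness * (time_threshold - t)))
    return total


carrier = (0.0, 0.0)
closing_in = pressure_on_carrier(carrier, [((5.0, 0.0), (-3.0, 0.0))])
backing_off = pressure_on_carrier(carrier, [((5.0, 0.0), (3.0, 0.0))])
print(closing_in > backing_off)  # True
```

The velocity term is what separates the two cases: both defenders start 5 m away, but the one running towards the carrier is scored as applying more pressure.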
EFPI (Elastic Formation and Position Identification)
A template matching algorithm to:
Detect team formations (4-4-2, 4-3-3, etc.)
Assign tactical positions to players
Handle substitutions and formation changes
Uses linear assignment to match player positions to formation templates.
See Bekkers (2025) for details.
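The assignment step can be illustrated on a toy example. EFPI's own cost function, templates, and elastic scaling are described in Bekkers (2025). This sketch instead uses Euclidean distance as the cost, two made-up four-slot templates, and brute force over all permutations, which solves the same linear assignment problem exactly but only scales to tiny inputs. For a full team, `scipy.optimize.linear_sum_assignment` solves it in polynomial time. The function names and templates are hypothetical, and substitutions and formation changes are not handled.

```python
import math
from itertools import permutations


def assign_to_template(players, template):
    """Minimum-total-distance assignment of players to template slots.

    players: list of (x, y); template: dict slot_name -> (x, y) of equal size.
    Returns (total_cost, {player_index: slot_name}). Brute force, O(n!).
    """
    slots = list(template)
    if len(slots) != len(players):
        raise ValueError("need exactly one template slot per player")
    best_cost, best_perm = math.inf, None
    for perm in permutations(range(len(slots))):
        cost = sum(math.dist(p, template[slots[s]]) for p, s in zip(players, perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, {i: slots[s] for i, s in enumerate(best_perm)}


def detect_formation(players, templates):
    """Return (template_name, assignment) for the cheapest-matching template."""
    results = {name: assign_to_template(players, t) for name, t in templates.items()}
    best = min(results, key=lambda name: results[name][0])
    return best, results[best][1]


# Hypothetical four-slot templates in normalised pitch coordinates.
templates = {
    "box": {"LB": (0.0, 0.0), "RB": (0.0, 1.0), "LF": (1.0, 0.0), "RF": (1.0, 1.0)},
    "diamond": {"B": (0.0, 0.5), "L": (0.5, 0.0), "R": (0.5, 1.0), "F": (1.0, 0.5)},
}
players = [(1.0, 0.1), (0.0, 0.9), (0.0, 0.1), (1.0, 0.9)]
print(detect_formation(players, templates))
# ('box', {0: 'LF', 1: 'RB', 2: 'LB', 3: 'RF'})
```

Picking the template with the lowest total assignment cost covers formation detection, and the winning assignment doubles as the tactical position of each player.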