American Football (NFL)
This tutorial covers how to work with NFL tracking data from the Big Data Bowl using the unravelsports package.
The package reads the CSV files released for the Big Data Bowl competitions, allowing you to:
Load and process NFL tracking data
Convert plays to graph structures
Train Graph Neural Networks for play prediction
Analyze player movements and formations
Interactive Notebook
A comprehensive Jupyter notebook walks through the entire process:
Loading Big Data Bowl CSV files
Converting to graphs
Training GNN models
Making predictions
Data Format
Big Data Bowl Data
The Big Data Bowl provides three main CSV files:
tracking_week*.csv: Player and ball tracking data
gameId: Unique game identifier
playId: Unique play identifier
nflId: Player identifier
frameId: Frame number
x, y: Position coordinates
s: Speed
a: Acceleration
dis: Distance traveled
o: Orientation angle
dir: Direction of travel
players.csv: Player information
nflId: Player identifier
height: Player height
weight: Player weight
position: Player position (QB, RB, WR, etc.)
plays.csv: Play-level information
gameId, playId: Identifiers
quarter: Quarter number
down, yardsToGo: Down and distance
possessionTeam: Team with possession
offenseFormation: Formation name
defendersInTheBox: Number of box defenders
(and many more columns)
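The s and dir columns together describe a velocity vector. As a minimal sketch, assuming dir is given in degrees measured clockwise from the positive y-axis (a convention commonly described for Big Data Bowl data, but worth verifying against the data dictionary of the release you use), the components can be recovered with the standard library alone. The helper name velocity_components is hypothetical and not part of unravelsports.

```python
import math


def velocity_components(speed, direction_deg):
    """Split a speed into (vx, vy).

    Assumption: 0 degrees points along +y and angles grow clockwise.
    """
    theta = math.radians(direction_deg)
    return speed * math.sin(theta), speed * math.cos(theta)


# A player moving at 5.0 units per second with dir = 90 degrees
# travels along +x under the assumed convention.
vx, vy = velocity_components(5.0, 90.0)
```

If your release uses a different angular convention, only the two trigonometric terms need to change.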
Basic Usage
Step 1: Load Data
Load the Big Data Bowl CSV files:
from unravel.american_football import BigDataBowlDataset
# Load data
bdb_dataset = BigDataBowlDataset(
    tracking_file_path="tracking_week_1.csv",
    players_file_path="players.csv",
    plays_file_path="plays.csv",
)
# View the data
print(bdb_dataset.dataset.head())
The resulting Polars DataFrame includes all tracking data merged with player and play information.
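To illustrate what "merged" means here, the sketch below joins three tiny, made-up CSV samples on the keys listed under Data Format: nflId for players, and the pair gameId, playId for plays. It uses only the standard library and is not how BigDataBowlDataset is implemented. The sample values and the helper read_rows are hypothetical.

```python
import csv
import io

# Made-up samples that only mimic the column layout described above.
TRACKING = """gameId,playId,nflId,frameId,x,y,s
1,10,100,1,25.0,30.0,4.2
1,10,101,1,27.5,22.0,3.1
"""
PLAYERS = """nflId,height,weight,position
100,6-2,215,QB
101,5-11,200,WR
"""
PLAYS = """gameId,playId,quarter,down,yardsToGo,possessionTeam
1,10,1,2,7,KC
"""


def read_rows(text):
    """Parse CSV text into a list of dicts (all values stay strings)."""
    return list(csv.DictReader(io.StringIO(text)))


# Index the lookup tables by their join keys.
players_by_id = {row["nflId"]: row for row in read_rows(PLAYERS)}
plays_by_key = {(row["gameId"], row["playId"]): row for row in read_rows(PLAYS)}

merged = []
for row in read_rows(TRACKING):
    combined = dict(row)
    # Rows with no matching player keep only tracking and play columns.
    combined.update(players_by_id.get(row["nflId"], {}))
    combined.update(plays_by_key.get((row["gameId"], row["playId"]), {}))
    merged.append(combined)
```

Each merged row now carries the tracking values together with the player's position and the play's down and distance.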
Step 2: Add Labels and Graph IDs
For supervised learning, add labels and graph IDs:
from unravel.utils import add_dummy_label_column, add_graph_id_column
# Add labels (use your own labels for real tasks)
bdb_dataset.dataset = add_dummy_label_column(bdb_dataset.dataset)
# Create graph ID for each play
bdb_dataset.dataset = add_graph_id_column(
    bdb_dataset.dataset,
    by=["gameId", "playId"],
)
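Grouping by gameId and playId means every frame of a play shares one graph ID, so the frames can be kept together later, for example when splitting data. The sketch below shows the idea on plain dictionaries. The function with_graph_id and the "-" separator are hypothetical choices for illustration and may differ from what add_graph_id_column produces.

```python
from collections import defaultdict

# Made-up rows: two frames of one play and one frame of another.
rows = [
    {"gameId": 1, "playId": 10, "frameId": 1, "nflId": 100},
    {"gameId": 1, "playId": 10, "frameId": 2, "nflId": 100},
    {"gameId": 1, "playId": 11, "frameId": 1, "nflId": 100},
]


def with_graph_id(rows, by):
    """Return copies of rows with a 'graph_id' built from the `by` columns."""
    return [
        {**row, "graph_id": "-".join(str(row[col]) for col in by)}
        for row in rows
    ]


labelled = with_graph_id(rows, by=["gameId", "playId"])

# Count how many rows ended up under each graph ID.
rows_per_graph = defaultdict(int)
for row in labelled:
    rows_per_graph[row["graph_id"]] += 1
```

Adding "frameId" to the by list would instead give every frame its own ID.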
Step 3: Convert to Graphs
Convert tracking data to graph structures:
from unravel.american_football import AmericanFootballGraphConverter
converter = AmericanFootballGraphConverter(
    dataset=bdb_dataset,
    self_loop_ball=True,
    adjacency_matrix_connect_type="ball",
    adjacency_matrix_type="split_by_team",
    label_type="binary",
)
# Convert to PyTorch Geometric graphs
graphs = converter.to_pytorch_graphs()
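To make the converter settings more concrete, the sketch below builds an adjacency matrix under one plausible reading of them: "split_by_team" links each player only to teammates, the "ball" connect type links the ball to every player, and self_loop_ball=True adds a self-connection on the ball node. This is an illustration in plain Python, not the converter's actual implementation. The function build_adjacency and the team labels are hypothetical, and leaving player self-loops at 0 is a simplification.

```python
def build_adjacency(teams, self_loop_ball=True):
    """Build a dense 0/1 adjacency matrix.

    teams: one label per node, e.g. "offense", "defense" or "ball".
    """
    n = len(teams)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Only the ball gets a self-loop in this sketch.
                adj[i][j] = int(teams[i] == "ball" and self_loop_ball)
            elif "ball" in (teams[i], teams[j]):
                adj[i][j] = 1  # the ball connects to every player
            elif teams[i] == teams[j]:
                adj[i][j] = 1  # teammates are connected
    return adj


nodes = ["offense", "offense", "defense", "defense", "ball"]
adjacency = build_adjacency(nodes)
```

In this small example the two offensive players are linked to each other and to the ball, but not to any defender.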
Step 4: Train a Model
Train a Graph Neural Network:
from unravel.utils import GraphDataset
from unravel.classifiers import PyGLightningCrystalGraphClassifier
import pytorch_lightning as pyl
from torch_geometric.loader import DataLoader
# Create dataset and split
dataset = GraphDataset(graphs=graphs, format="pyg")
train, test, val = dataset.split_test_train_validation(4, 1, 1)
# Create data loaders
train_loader = DataLoader(train, batch_size=32, shuffle=True)
val_loader = DataLoader(val, batch_size=32)
test_loader = DataLoader(test, batch_size=32)
# Initialize and train model
model = PyGLightningCrystalGraphClassifier(
    node_features=converter.n_node_features,
    edge_features=converter.n_edge_features,
    global_features=converter.n_graph_features,
)
trainer = pyl.Trainer(max_epochs=10)
trainer.fit(model, train_loader, val_loader)
trainer.test(model, test_loader)
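The arguments 4, 1, 1 read most naturally as relative train, test, and validation proportions, which would put 4/6, 1/6, and 1/6 of the graphs in each part. The sketch below shows that arithmetic on a list of hypothetical graph IDs. It assumes a ratio interpretation and is not the GraphDataset implementation. The function ratio_split and its seed parameter are made up for illustration.

```python
import random


def ratio_split(items, train, test, validation, seed=42):
    """Shuffle items and cut them into three parts by relative weight."""
    items = list(items)
    random.Random(seed).shuffle(items)
    total = train + test + validation
    n_train = len(items) * train // total
    n_test = len(items) * test // total
    return (
        items[:n_train],
        items[n_train:n_train + n_test],
        items[n_train + n_test:],  # validation takes the remainder
    )


graph_ids = [f"play-{i}" for i in range(12)]
train_ids, test_ids, val_ids = ratio_split(graph_ids, 4, 1, 1)
```

Splitting on graph IDs rather than on individual frames keeps all frames of one play in the same part, which avoids near-duplicate frames appearing in both training and test data.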
Data Availability
Big Data Bowl data is released annually for Kaggle competitions:
Data from previous years remains available for download
Each release includes selected weeks from an NFL season
Downloading requires a free Kaggle account