Project Methodology
About GridironIQ
GridironIQ is an NFL analytics dashboard built around an automated Python data pipeline, MySQL storage layer, PHP API endpoints, and interactive Plotly.js visualisations.
Dataset
The project uses NFL data accessed through nflreadpy, the newer nflverse Python package used to retrieve structured NFL datasets.
GridironIQ currently covers the 2015–2024 NFL seasons, providing a decade-long dataset for team, player, comparison, and game-level analysis. The final hosted database contains 219,105 loaded dashboard rows after cleaning, enrichment, validation, and loading.
- Schedules and game results
- Team game statistics
- Derived team season summaries
- Player weekly statistics
- Derived player season summaries
- Game-level performance metrics
- Validation results stored as data quality evidence
Technology Stack
Processing Pipeline
The dataset is processed through a repeatable ETL-style workflow before being loaded into MySQL.
Ingest
Pull schedule, team statistics, and player statistics data from nflreadpy into Pandas DataFrames.
Clean
Standardise column names, data types, season types, missing values, and historical team abbreviations.
Enrich
Join schedules with team statistics and derive dashboard-ready metrics such as point differential, win percentage, yardage totals, player role groups, and season summaries.
Validate
Run automated checks for required columns, missing keys, duplicate records, invalid results, impossible percentage values, and unexpected negative values.
Load
Load validated outputs into MySQL in memory-safe chunks for the PHP API layer and dashboard to consume.
The full ingestion, cleaning, enrichment, validation, and loading process is repeatable from Python. Due to shared-hosting memory limits, large database inserts are loaded in chunks, particularly for the player weekly statistics table.
Data Quality Decisions
A key cleaning issue was the use of different team abbreviations across historical NFL data. Franchise relocations and rebrands meant that older records could use identifiers such as OAK, SD, or STL, while modern datasets use LV, LAC, and LA. These abbreviations were standardised to support consistent franchise-level comparison across seasons.
Validation was also made domain-aware. Some american football metrics can be negative, such as point differential, rushing yards, receiving yards, EPA, and fantasy points. These values are preserved rather than incorrectly cleaned away.
- Required identifiers checked before loading.
- Duplicate game, team-game, and player keys checked.
- Numeric fields converted and validated.
- Result values restricted to win, loss, or tie.
- Percentage fields checked for valid ranges.
- Historical team abbreviations standardised for long-term comparison.
- Validation outputs stored as evidence in
data_quality_checks.
Visualisation Design
The dashboard is designed around exploratory analysis rather than static reporting. The Overview page provides league-wide data, the Teams page provides drill-down detail, the Players page supports player-level analysis, the Compare page supports same-season and cross-season team comparisons, and the Games page allows game-level filtering.
- Overview: league-wide rankings, KPIs, filters, and offence/defence comparison.
- Teams: details-on-demand for individual teams across regular season and postseason.
- Players: leaderboards, radar profiles, weekly trends, and heatmap exploration.
- Compare: side-by-side and cross-season team analysis.
- Games: game-level exploration by week, team, scoreline, and venue.
The dashboard adapts some visualisations based on the selected data. Bye weeks are shown as gaps in weekly line charts rather than connected values, avoiding the implication that a team played during a non-game week.
Postseason selections often contain very small samples, so selected charts switch to compact comparison views where a trend chart would be misleading.