2026-05-17 17:17:04 +02:00

# Dataset
## Source
The data comes from a Kaggle dataset titled **"Esports World Cup 2025 — Complete Dataset"**, released under the CC BY 4.0 license. It was compiled from publicly available tournament results, official brackets, and club partnership announcements.
The dataset covers all 27 title tournaments of EWC 2025, held at Boulevard City in Riyadh, Saudi Arabia, from **July 8 to August 24, 2025**. The event featured a **$100M+ total prize pool** and competitors from **36+ countries**.
---
## Files
| File | Rows | Description |
|---|---|---|
| `01_EWC2025_Event_Tournament_Summary.csv` | 27 | One row per tournament — dates, prize pool, winner, game type |
| `02_EWC2025_Medalists.csv` | 257 | Every gold/silver/bronze medalist with country and role |
| `03_EWC2025_Club_Championship_Standings.csv` | 24 | Final Club Championship rankings — points and prize money |
| `04_EWC2025_Club_Partner_Program.csv` | 40 | Official partner organizations — region, founding year, social following |
| `05_EWC2025_Player_Roster.csv` | 272 | Full player roster — age, experience, prize earned, social followers |
| `06_EWC2025_Prize_Pool_Distribution.csv` | 5 | How the $100M+ prize pool was split across categories |
| `07_EWC2025_Calendar_Schedule.csv` | 27 | Weekly tournament schedule with venue and timezone |
| `08_EWC2025_Country_Results.csv` | 36 | Medal tally by country with player counts |
| `09_EWC2025_Point_System.csv` | 16 | Club Championship point system by placement |
| `10_EWC2025_Game_by_Game_Results.csv` | 50 | Match results — scores, map, duration, MVP |
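
The row counts in the table lend themselves to a quick sanity check after downloading the dataset. The following is a minimal sketch, assuming the ten CSVs sit together in one directory, are UTF-8 encoded, and each has a single header line. The function names `count_rows` and `check_row_counts` are illustrative and not part of the project.

```python
import csv
from pathlib import Path

# Expected data-row counts (header excluded), copied from the table above.
EXPECTED_ROWS = {
    "01_EWC2025_Event_Tournament_Summary.csv": 27,
    "02_EWC2025_Medalists.csv": 257,
    "03_EWC2025_Club_Championship_Standings.csv": 24,
    "04_EWC2025_Club_Partner_Program.csv": 40,
    "05_EWC2025_Player_Roster.csv": 272,
    "06_EWC2025_Prize_Pool_Distribution.csv": 5,
    "07_EWC2025_Calendar_Schedule.csv": 27,
    "08_EWC2025_Country_Results.csv": 36,
    "09_EWC2025_Point_System.csv": 16,
    "10_EWC2025_Game_by_Game_Results.csv": 50,
}


def count_rows(path):
    """Count non-empty data rows in a CSV file, excluding the header line."""
    with open(path, newline="", encoding="utf-8") as f:  # encoding is an assumption
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        return sum(1 for row in reader if row)


def check_row_counts(data_dir):
    """Return {filename: (expected, actual)} for every file that does not match.

    `actual` is None when the file is missing from `data_dir`.
    """
    mismatches = {}
    for name, expected in EXPECTED_ROWS.items():
        path = Path(data_dir) / name
        actual = count_rows(path) if path.exists() else None
        if actual != expected:
            mismatches[name] = (expected, actual)
    return mismatches
```

Calling `check_row_counts("data/")` (the path is a placeholder) returns an empty dict when every file matches the table.
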
---
## Key Entities in the Data
**Games** — 25 unique titles spanning 8 genres (MOBA, FPS, Battle Royale, Fighting, RTS, Sports, Auto Battler, Strategy) across PC, mobile, and console platforms.
**Organizations** — 60+ esports clubs. 40 are official EWC Club Partners with full metadata (founding year, HQ, social following). The rest appear through match results and medalist records.
**Players** — 272 players in the roster with demographics (age, country, region), performance (tournament placement, prize earned), and social media reach.
**Tournaments** — 27 events, including two with gender divisions (Mobile Legends: Bang Bang ran separate Men and Women brackets) and two with format variants (Naraka: Bladepoint ran Solo and Trios simultaneously).
**Club Championship** — A meta-competition running across all 27 tournaments. Clubs accumulate points based on their placements in each event. The top 24 clubs share a $27M prize pool.
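
The accumulation described for the Club Championship can be sketched as follows. This is a rough illustration only: the point values and club names below are placeholders (the real values live in `09_EWC2025_Point_System.csv`), the alphabetical tie-break is an assumption, and the sketch ignores any eligibility rules the actual competition may apply.

```python
from collections import defaultdict


def club_standings(placements, points_by_placement, top_n=24):
    """Sum points per club over all tournaments and return the top `top_n`.

    placements: iterable of (club, tournament, placement) tuples.
    points_by_placement: dict mapping a placement to its point value.
    Placements missing from the point table score zero.
    """
    totals = defaultdict(int)
    for club, _tournament, placement in placements:
        totals[club] += points_by_placement.get(placement, 0)
    # Highest total first; ties broken alphabetically (an assumption).
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Placeholder point table and results, NOT the real EWC 2025 values.
example_points = {1: 1000, 2: 750, 3: 500}
example_placements = [
    ("Club A", "Game X", 1),
    ("Club B", "Game X", 2),
    ("Club A", "Game Y", 3),
    ("Club B", "Game Y", 1),
    ("Club C", "Game Y", 9),
]

standings = club_standings(example_placements, example_points)
# standings == [("Club B", 1750), ("Club A", 1500), ("Club C", 0)]
```
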
---
## Data Quality Notes
A few things to be aware of when working with this data:
- **Game name inconsistencies across files.** Files 02, 07, and 10 use expanded names like `"Mobile Legends: Bang Bang - Men"` and `"Naraka: Bladepoint - Solo"`, while file 01 uses the base game name with a separate `Gender` column. The seed script handles this normalization automatically.
- **Abbreviated game names in file 04.** The `Games_Competing` column uses shorthand like `"Mobile Legends"` instead of `"Mobile Legends: Bang Bang"`, as well as `"PUBG"`, which is ambiguous on its own. These are resolved via an alias map in the seed script.
- **Mixed winner types in file 01.** For individual-format games (Chess, StarCraft II, Street Fighter 6, Tekken 8, EA Sports FC 25), the `Winner` column contains a player name rather than a team name. This is why `winner` and `runner_up` are stored as plain text in the OLTP rather than as foreign keys.
- **`Battlegrounds Mobile India`** appears in file 04 as a game one organization competes in, but it is not present as an EWC 2025 tournament. This entry is skipped during seeding.
- **Prize earnings for players** (file 05) are all zero in the dataset, likely because individual prize splits were not publicly available at the time of compilation.
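
The name normalization described in the notes above could look roughly like this. It is a sketch under assumptions, not the project's seed script: the names `GAME_ALIASES`, `SKIPPED_GAMES`, `VARIANT_SUFFIXES`, and `normalize_game` are hypothetical, and the alias map contains only the one mapping stated in this document. How `"PUBG"` is resolved is not specified here, so it is left out.

```python
# Hypothetical alias map; only the mapping mentioned in this document is included.
# The resolution for "PUBG" is not stated in the source, so it is omitted.
GAME_ALIASES = {
    "Mobile Legends": "Mobile Legends: Bang Bang",
}

# Games listed in file 04 that are not EWC 2025 tournaments.
SKIPPED_GAMES = {"Battlegrounds Mobile India"}

# Suffixes used by the expanded names in files 02, 07, and 10.
VARIANT_SUFFIXES = ("Men", "Women", "Solo", "Trios")


def split_variant(name):
    """Split 'Base - Variant' into (base, variant).

    Returns (name, None) when the name has no recognized variant suffix.
    """
    base, sep, suffix = name.rpartition(" - ")
    if sep and suffix.strip() in VARIANT_SUFFIXES:
        return base.strip(), suffix.strip()
    return name.strip(), None


def normalize_game(name):
    """Return (canonical_name, variant), or None if the game should be skipped."""
    base, variant = split_variant(name)
    if base in SKIPPED_GAMES:
        return None
    return GAME_ALIASES.get(base, base), variant
```

With these definitions, `normalize_game("Mobile Legends: Bang Bang - Men")` yields `("Mobile Legends: Bang Bang", "Men")`, and `normalize_game("Battlegrounds Mobile India")` yields `None`, so the caller can skip that entry.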