IPZ_1/docs/03_oltp_schema.md
2026-05-17 17:17:04 +02:00


OLTP Database

Overview

The OLTP (Online Transaction Processing) database is a normalized relational schema implemented in MySQL 8.4. It is the system of record for all EWC 2025 data and the source from which the ETL pipeline populates the Data Mart.

The schema is in Third Normal Form (3NF): there are no repeating groups, every non-key attribute depends on the whole key, and no non-key attribute depends on another non-key attribute (no transitive dependencies).

The DDL is in sql/schema.sql. The database runs locally in Docker on port 13306.


Tables

Lookup Tables

These have no foreign keys and are loaded first.

game — The 25 unique game titles that appear at EWC 2025.

| Column | Type | Notes |
|---|---|---|
| game_id | INT UNSIGNED | Auto-increment PK |
| name | VARCHAR(100) | Unique |
| game_type | VARCHAR(50) | MOBA, FPS, Battle Royale, Fighting, etc. |
| platform | VARCHAR(50) | PC, Mobile, Console/PC, etc. |

country — All countries represented at the event, enriched with medal tallies from file 08.

| Column | Type | Notes |
|---|---|---|
| country_id | INT UNSIGNED | Auto-increment PK |
| name | VARCHAR(100) | Unique |
| region | VARCHAR(50) | Asia, Europe, North America, etc. |
| gold_medals | TINYINT UNSIGNED | From file 08 (0 if not in file 08) |
| silver_medals | TINYINT UNSIGNED | |
| bronze_medals | TINYINT UNSIGNED | |
| total_medals | TINYINT UNSIGNED | |
| total_players | SMALLINT UNSIGNED | |
| top_game | VARCHAR(100) | Game the country performed best in |

point_system — The Club Championship scoring table: 16 rows, one for each placement from 1 to 8 under each of the two rule sets (Standard and Co-Placement), i.e. 8 × 2 = 16.
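The table is a complete grid of placements × rule sets, so a load step could check coverage before inserting. The sketch below makes several assumptions. The field names `placement` and `rule` are hypothetical, and so are the rule labels `"Standard"` and `"Co-Placement"` as stored values. The sketch checks only which keys are present, not the point values, which this document does not list.

```python
from itertools import product

# Hypothetical stored labels for the two rule sets; the real values may differ.
RULES = ("Standard", "Co-Placement")
PLACEMENTS = range(1, 9)  # placements 1 to 8


def missing_point_rows(rows):
    """Return the sorted (placement, rule) keys absent from rows.

    rows: iterable of dicts with hypothetical keys 'placement' and 'rule'.
    """
    expected = set(product(PLACEMENTS, RULES))
    present = {(r["placement"], r["rule"]) for r in rows}
    return sorted(expected - present)


def check_point_system(rows):
    """Raise ValueError unless rows are exactly the 8 x 2 = 16 key grid."""
    rows = list(rows)
    missing = missing_point_rows(rows)
    if missing:
        raise ValueError(f"point_system is missing rows: {missing}")
    # Catches duplicates or out-of-range placements such as 9.
    if len(rows) != len(PLACEMENTS) * len(RULES):
        raise ValueError(f"expected 16 rows, got {len(rows)}")
```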

prize_pool_category — The 5 high-level prize pool categories (Game Championships, Club Championship, Qualifiers, MVP Awards, Club Partner Support).


Core Entities

organization — All esports clubs and teams that appear anywhere in the data. 40 rows come from the Club Partner Program file with full metadata; the remaining organizations are inserted with NULL for partner-specific fields.

| Column | Notes |
|---|---|
| club_partner_status | ENUM: Current / New / None |
| top_8_2024 | Whether the club finished in the top 8 at EWC 2024 |
| social_media_followers_m | Total social media following, in millions |

tournament — One row per tournament event (27 total). The winner and runner_up columns are stored as plain VARCHAR rather than foreign keys to organization because individual-format games (Chess, StarCraft II, etc.) list a player name as the winner, not a team.

| Column | Notes |
|---|---|
| gender | ENUM: Open / Men / Women |
| club_championship_points | Whether this tournament awarded Club Championship points |

schedule — A 1:1 extension of tournament holding the schedule metadata from file 07 (week number, venue, timezone, duration). Kept separate to avoid widening the tournament row.
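A common way to enforce a 1:1 extension table is to make its primary key also a foreign key to the parent. This is a minimal sketch of that pattern, with some caveats:

- It uses SQLite (Python's `sqlite3`) as a stand-in for MySQL 8.4.
- The trimmed-down columns `week_number` and `venue` are hypothetical.
- The real DDL in sql/schema.sql may enforce the relationship differently.

```python
import sqlite3


def make_db():
    """Create an in-memory tournament/schedule pair (hypothetical columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
    conn.executescript("""
        CREATE TABLE tournament (
            tournament_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        -- tournament_id is both PK and FK: at most one schedule row per tournament
        CREATE TABLE schedule (
            tournament_id INTEGER PRIMARY KEY REFERENCES tournament(tournament_id),
            week_number INTEGER,
            venue TEXT
        );
    """)
    return conn


def add_schedule(conn, tournament_id, week_number, venue):
    """Insert a schedule row; return False if it breaks the 1:1 or FK rule."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO schedule (tournament_id, week_number, venue) VALUES (?, ?, ?)",
                (tournament_id, week_number, venue),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

A second schedule row for the same tournament fails on the primary key. A row for a tournament that does not exist fails on the foreign key.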


player — 272 players from the official roster. Uses the natural key from the dataset (EWC2025_001, etc.) as the primary key rather than an auto-increment, since the source data provides stable identifiers.


medalist — One row per player-medal. A player who wins Gold contributes one row. A five-player team winning Gold contributes five rows. This is the most granular performance record in the OLTP.
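A team medal fans out into one row per player, so counting rows measures player-medals, not medals. The sketch below shows the fan-out and a count that collapses each podium spot back to one medal. It makes two assumptions:

- The field names `tournament_id`, `player_id` and `medal` are hypothetical.
- There is at most one podium spot per medal type per tournament. A tied placement would need an extra key.

```python
from collections import Counter


def expand_podium(tournament_id, medal, player_ids):
    """Fan one podium finish out into medalist rows, one per player.

    A solo winner passes a one-element list; a five-player team passes five IDs.
    """
    return [
        {"tournament_id": tournament_id, "player_id": pid, "medal": medal}
        for pid in player_ids
    ]


def player_medal_count(rows):
    """Rows per medal type, i.e. what a naive COUNT(*) ... GROUP BY medal gives."""
    return Counter(row["medal"] for row in rows)


def distinct_medal_count(rows):
    """Medals per type, counting each (tournament, medal) spot once.

    Assumes no ties: two separate bronzes in one tournament would merge here.
    """
    spots = {(row["tournament_id"], row["medal"]) for row in rows}
    return Counter(medal for _, medal in spots)
```

For example, a five-player team gold plus a solo silver (hypothetical IDs `P1` to `P6`) yields 6 medalist rows but 2 medals.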


match_result — 50 match records from file 10. The team_1, team_2, and winner columns are VARCHAR for the same reason as tournament.winner — individual-format games list player names here.


club_championship_standing — Final standings for the 24 clubs that earned Club Championship points. 1:1 with organization.


Junction Tables

Two multi-valued columns from the source data are normalized into junction tables:

organization_game_competing — Resolves the comma-separated Games_Competing column from the Club Partner Program (e.g. "Dota 2, Chess, EA Sports FC 25, Counter-Strike 2").

organization_game_won — Resolves the comma-separated Games_Won column from the Club Championship standings.
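Normalizing either column amounts to splitting the cell on commas and mapping each title to a `game_id`. This is a minimal sketch, not the project's ETL code. It assumes no game title itself contains a comma, and that titles in the cell match `game.name` exactly after trimming. The function names and the `game_ids` mapping are hypothetical.

```python
def split_game_list(cell):
    """Split a comma-separated Games_Competing / Games_Won cell into titles."""
    if not cell:
        return []
    names = (part.strip() for part in cell.split(","))
    # dict.fromkeys drops duplicates while keeping first-seen order;
    # empty fragments (e.g. from a trailing comma) are skipped.
    return list(dict.fromkeys(n for n in names if n))


def to_junction_rows(org_id, cell, game_ids):
    """Map titles to (organization_id, game_id) pairs via a name -> id dict."""
    rows = []
    for name in split_game_list(cell):
        if name not in game_ids:
            raise KeyError(f"unknown game title: {name!r}")
        rows.append((org_id, game_ids[name]))
    return rows
```

Raising on an unknown title surfaces spelling mismatches between source files at load time. Silently dropping the title would lose a junction row instead.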


Entity Relationships

```
game ──────────────────────────────────────────────┐
  │                                                 │
  ├──► tournament ──► schedule                      │
  │         │                                       │
  │         ├──► medalist ◄── organization ◄────────┤
  │         │         └──► country                  │
  │         └──► match_result                       │
  │                                                 │
  └──► player ◄── organization                      │
             └──► country                           │
                                                    │
organization ──► club_championship_standing         │
           │                                        │
           ├──► organization_game_competing ────────┘
           └──► organization_game_won ──────────────┘

point_system          (standalone lookup)
prize_pool_category   (standalone lookup)
```
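The relationships also fix a load order: a table can be loaded only after every table it references. A minimal sketch with Python's standard-library `graphlib` derives such an order. The foreign-key edges are an assumption read off the diagram and the design notes. For instance, organization → country comes from "Country appears in players, medalists, and organizations", and `medalist` may also reference `player`. sql/schema.sql remains authoritative.

```python
from graphlib import TopologicalSorter

# table -> tables it references (assumed from the diagram and design notes)
FK_PARENTS = {
    "game": [],
    "country": [],
    "point_system": [],
    "prize_pool_category": [],
    "organization": ["country"],
    "tournament": ["game"],
    "schedule": ["tournament"],
    "player": ["game", "organization", "country"],
    "medalist": ["tournament", "organization", "country"],
    "match_result": ["tournament"],
    "club_championship_standing": ["organization"],
    "organization_game_competing": ["organization", "game"],
    "organization_game_won": ["organization", "game"],
}


def load_order(fk_parents):
    """Return table names so that each table follows every table it references.

    TopologicalSorter takes node -> predecessors and raises CycleError on a cycle.
    """
    return list(TopologicalSorter(fk_parents).static_order())
```

Reversing the resulting list gives a safe order for clearing tables before a full reload.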

Design Decisions

Why is country a table rather than a VARCHAR column? Country appears in players, medalists, and organizations. Storing it as a table avoids duplicating the region attribute and allows medal stats (from file 08) to be joined in without repeating them on every player row.

Why does tournament.winner stay as VARCHAR? Enforcing a foreign key to organization would require creating dummy organization rows for individual players like "Magnus Carlsen" or "Serral". That would pollute the organization table with data that isn't an organization. The clean solution is to keep it as text and resolve it at query time when needed.
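Resolving the winner at query time can be done with two outer joins on the name text. Below is a minimal sketch. It uses SQLite as a stand-in for MySQL 8.4 and trimmed, hypothetical columns (including a `player.name` column). The winner is classified as an organization if its text matches an organization name, otherwise as a player if it matches a player name, otherwise as unresolved. It assumes player names are unique; duplicate names would produce duplicate result rows.

```python
import sqlite3

# Hypothetical, trimmed-down tables; the real ones are in sql/schema.sql.
SCHEMA = """
CREATE TABLE organization (organization_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE player (player_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE tournament (tournament_id INTEGER PRIMARY KEY, winner TEXT);
"""

# An organization match takes precedence over a player match.
RESOLVE_WINNER = """
SELECT t.tournament_id,
       t.winner,
       CASE
           WHEN o.organization_id IS NOT NULL THEN 'organization'
           WHEN p.player_id IS NOT NULL THEN 'player'
           ELSE 'unresolved'
       END AS winner_kind,
       o.organization_id,
       p.player_id
FROM tournament AS t
LEFT JOIN organization AS o ON o.name = t.winner
LEFT JOIN player AS p ON p.name = t.winner
ORDER BY t.tournament_id
"""


def resolve_winners(conn):
    """Return (tournament_id, winner, kind, organization_id, player_id) rows."""
    return conn.execute(RESOLVE_WINNER).fetchall()
```

An `unresolved` row flags winner text that matches neither table. A check like this partly compensates for the missing foreign key.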

Why is schedule a separate table from tournament? A 1:1 split is justified here because the schedule data comes from a completely different source file (file 07) and is conceptually distinct: it describes the logistics of the event, not the competitive outcome. Keeping it separate makes the ETL cleaner and the tournament table narrower.

Why use player_id VARCHAR(20) instead of AUTO_INCREMENT? The source dataset provides stable IDs (EWC2025_001 through EWC2025_272). Using the natural key preserves traceability back to the source without adding a meaningless surrogate.
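The natural key has a fixed shape, so it can be generated and validated cheaply during the ETL. This is a minimal sketch, and the helper names are hypothetical. The three-digit zero padding is an assumption inferred from the documented endpoints EWC2025_001 and EWC2025_272. The resulting 11-character IDs fit comfortably in VARCHAR(20).

```python
import re

# Assumed pattern: "EWC2025_" followed by exactly three digits.
PLAYER_ID_RE = re.compile(r"EWC2025_(\d{3})")


def make_player_id(n):
    """Build the key for roster position n (1-based), e.g. 1 -> EWC2025_001."""
    if not 1 <= n <= 999:
        raise ValueError("three-digit IDs only cover 1..999")
    return f"EWC2025_{n:03d}"


def is_valid_player_id(value, max_n=272):
    """Check format and range; max_n defaults to the documented 272 players."""
    m = PLAYER_ID_RE.fullmatch(value)
    return bool(m) and 1 <= int(m.group(1)) <= max_n
```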