IPZ_1/docs/03_oltp_schema.md
2026-05-17 17:17:04 +02:00

# OLTP Database
## Overview
The OLTP (Online Transaction Processing) database is a normalized relational schema implemented in **MySQL 8.4**. It is the system of record for all Esports World Cup (EWC) 2025 data and the source from which the ETL pipeline populates the Data Mart.
The schema is in **Third Normal Form (3NF)**: there are no repeating groups, every non-key attribute depends on the whole key, and no non-key attribute depends on another non-key attribute (no transitive dependencies).
The DDL is in `sql/schema.sql`. The database runs locally in Docker on port **13306**.
---
## Tables
### Lookup Tables
These have no foreign keys and are loaded first.
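
Loading lookups first generalizes to a rule: every table must be loaded after the tables it references, so the load order is a topological sort of the foreign-key graph. The sketch below uses Python's standard-library `graphlib`. The dependency map is an assumption inferred from the entity-relationship diagram in this document, not read from `sql/schema.sql`, so check it against the real DDL. Tables with no dependencies (the four lookups) come out first; the order within that first batch is unspecified, which is harmless for loading.

```python
from graphlib import TopologicalSorter

# ASSUMED foreign-key dependencies (table -> tables it references), inferred
# from this document's ER diagram; verify against sql/schema.sql.
DEPENDENCIES: dict[str, set[str]] = {
    "game": set(),
    "country": set(),
    "point_system": set(),
    "prize_pool_category": set(),
    "organization": {"country"},
    "tournament": {"game"},
    "schedule": {"tournament"},
    "player": {"organization", "country"},
    "medalist": {"tournament", "player", "organization", "country"},
    "match_result": {"tournament"},
    "club_championship_standing": {"organization"},
    "organization_game_competing": {"organization", "game"},
    "organization_game_won": {"organization", "game"},
}


def load_order(deps: dict[str, set[str]]) -> list[str]:
    """Return the tables in an order where every parent precedes its children."""
    # TopologicalSorter expects node -> predecessors, which is exactly this map.
    # static_order() raises graphlib.CycleError if the graph contains a cycle.
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for table in load_order(DEPENDENCIES):
        print(table)
```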
**`game`** — The 25 unique game titles that appear at EWC 2025.
| Column | Type | Notes |
|---|---|---|
| game_id | INT UNSIGNED | Auto-increment PK |
| name | VARCHAR(100) | Unique |
| game_type | VARCHAR(50) | MOBA, FPS, Battle Royale, Fighting, etc. |
| platform | VARCHAR(50) | PC, Mobile, Console/PC, etc. |
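
A minimal sketch of the `game` table, using Python's built-in `sqlite3` module as a stand-in for MySQL so the example runs without the Docker container. In MySQL the key would be `INT UNSIGNED AUTO_INCREMENT`; in SQLite an `INTEGER PRIMARY KEY` column is assigned ids automatically. The `NOT NULL` on `name` and the helper names are assumptions, not taken from `sql/schema.sql`. Inserting the same title twice raises `sqlite3.IntegrityError`, which is how the `UNIQUE` constraint keeps one row per title.

```python
import sqlite3

# SQLite approximation of the `game` table; the real DDL targets MySQL 8.4.
GAME_DDL = """
CREATE TABLE game (
    game_id   INTEGER PRIMARY KEY,          -- surrogate key, auto-assigned
    name      VARCHAR(100) NOT NULL UNIQUE, -- one row per title
    game_type VARCHAR(50),                  -- e.g. MOBA, FPS, Fighting
    platform  VARCHAR(50)                   -- e.g. PC, Mobile, Console/PC
)
"""


def create_game_table(conn: sqlite3.Connection) -> None:
    conn.execute(GAME_DDL)


def insert_game(conn: sqlite3.Connection, name: str,
                game_type: str, platform: str) -> int:
    """Insert one title and return its generated game_id."""
    cur = conn.execute(
        "INSERT INTO game (name, game_type, platform) VALUES (?, ?, ?)",
        (name, game_type, platform),
    )
    return cur.lastrowid
```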
---
**`country`** — All countries represented at the event, enriched with medal tallies from file 08.
| Column | Type | Notes |
|---|---|---|
| country_id | INT UNSIGNED | Auto-increment PK |
| name | VARCHAR(100) | Unique |
| region | VARCHAR(50) | Asia, Europe, North America, etc. |
| gold_medals | TINYINT UNSIGNED | From file 08 (0 if not in file 08) |
| silver_medals | TINYINT UNSIGNED | |
| bronze_medals | TINYINT UNSIGNED | |
| total_medals | TINYINT UNSIGNED | |
| total_players | SMALLINT UNSIGNED | |
| top_game | VARCHAR(100) | Game the country performed best in |
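
Because `total_medals` can be derived from the three per-medal columns, a cheap consistency check after loading file 08 catches transcription errors. This sketch assumes the total is simply gold + silver + bronze, again uses `sqlite3` in place of MySQL, and shows only the medal columns. `CHECK (... >= 0)` approximates `TINYINT UNSIGNED`, and `DEFAULT 0` covers countries absent from file 08.

```python
import sqlite3

# Sketch of the medal columns on `country` (region, total_players and
# top_game omitted). Types approximate the MySQL definitions.
COUNTRY_DDL = """
CREATE TABLE country (
    country_id    INTEGER PRIMARY KEY,
    name          VARCHAR(100) NOT NULL UNIQUE,
    gold_medals   INTEGER NOT NULL DEFAULT 0 CHECK (gold_medals >= 0),
    silver_medals INTEGER NOT NULL DEFAULT 0 CHECK (silver_medals >= 0),
    bronze_medals INTEGER NOT NULL DEFAULT 0 CHECK (bronze_medals >= 0),
    total_medals  INTEGER NOT NULL DEFAULT 0 CHECK (total_medals >= 0)
)
"""

# ASSUMPTION: total_medals should equal gold + silver + bronze.
MISMATCH_SQL = """
SELECT name
FROM country
WHERE total_medals <> gold_medals + silver_medals + bronze_medals
ORDER BY name
"""


def medal_mismatches(conn: sqlite3.Connection) -> list[str]:
    """Return names of countries whose stored total disagrees with the sum."""
    return [row[0] for row in conn.execute(MISMATCH_SQL)]
```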
---
**`point_system`** — The Club Championship scoring table (16 rows covering placements 1–8 under the Standard and Co-Placement rules).
**`prize_pool_category`** — The 5 high-level prize pool categories (Game Championships, Club Championship, Qualifiers, MVP Awards, Club Partner Support).
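
Both lookups are small enough to seed directly. This sketch (again `sqlite3` standing in for MySQL) seeds the five prize pool categories by name and builds the 16 `point_system` keys as every combination of placements 1 to 8 and the two rules. Treating `(placement, scoring_rule)` as the primary key is an assumption about the real DDL, `scoring_rule` and `category_id` are hypothetical column names, and point values are deliberately left NULL rather than invented.

```python
import sqlite3

# The five prize pool categories listed in this document.
PRIZE_POOL_CATEGORIES = [
    "Game Championships",
    "Club Championship",
    "Qualifiers",
    "MVP Awards",
    "Club Partner Support",
]

# ASSUMED shape of point_system: one row per (placement, scoring rule) pair,
# so placements 1-8 under two rules yield 16 rows.
SCORING_RULES = ("Standard", "Co-Placement")
PLACEMENTS = range(1, 9)

LOOKUP_DDL = """
CREATE TABLE prize_pool_category (
    category_id INTEGER PRIMARY KEY,
    name        VARCHAR(50) NOT NULL UNIQUE
);
CREATE TABLE point_system (
    placement    INTEGER NOT NULL CHECK (placement BETWEEN 1 AND 8),
    scoring_rule VARCHAR(20) NOT NULL
        CHECK (scoring_rule IN ('Standard', 'Co-Placement')),
    points       INTEGER,  -- left NULL here; real values come from the source file
    PRIMARY KEY (placement, scoring_rule)
);
"""


def point_system_keys() -> list[tuple[int, str]]:
    """Every (placement, rule) combination the table should contain."""
    return [(p, rule) for rule in SCORING_RULES for p in PLACEMENTS]


def create_lookups(conn: sqlite3.Connection) -> None:
    conn.executescript(LOOKUP_DDL)
    conn.executemany(
        "INSERT INTO prize_pool_category (name) VALUES (?)",
        [(name,) for name in PRIZE_POOL_CATEGORIES],
    )
    conn.executemany(
        "INSERT INTO point_system (placement, scoring_rule) VALUES (?, ?)",
        point_system_keys(),
    )
```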
---
### Core Entities
**`organization`** — All esports clubs and teams that appear anywhere in the data. 40 rows come from the Club Partner Program file with full metadata; the remaining organizations are inserted with NULL for partner-specific fields.
| Column | Notes |
|---|---|
| club_partner_status | ENUM: Current / New / None |
| top_8_2024 | Whether the club finished top-8 at EWC 2024 |
| social_media_followers_m | Total following in millions |
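
SQLite has no `ENUM`, so in this `sqlite3` sketch a `CHECK` constraint emulates MySQL's `ENUM('Current', 'New', 'None')`, and the partner-only columns are nullable so organizations outside the Club Partner Program can be stored with NULLs. `organization_id`, `name`, the column types, and the helper functions are assumptions. Whether the real ETL sets non-partners' status to `'None'` or leaves it NULL is not stated in this document; the sketch simply leaves it unset.

```python
import sqlite3

# CHECK emulates MySQL's ENUM; a NULL status also passes the CHECK.
ORG_DDL = """
CREATE TABLE organization (
    organization_id          INTEGER PRIMARY KEY,
    name                     VARCHAR(100) NOT NULL UNIQUE,
    club_partner_status      VARCHAR(7)
        CHECK (club_partner_status IN ('Current', 'New', 'None')),
    top_8_2024               BOOLEAN,        -- NULL for non-partner clubs
    social_media_followers_m DECIMAL(6, 2)   -- millions; NULL for non-partners
)
"""


def insert_partner(conn: sqlite3.Connection, name: str, status: str,
                   top_8_2024: bool, followers_m: float) -> None:
    """Row from the Club Partner Program file, with full metadata."""
    conn.execute(
        "INSERT INTO organization (name, club_partner_status, top_8_2024,"
        " social_media_followers_m) VALUES (?, ?, ?, ?)",
        (name, status, top_8_2024, followers_m),
    )


def insert_other(conn: sqlite3.Connection, name: str) -> None:
    """Organization seen elsewhere in the data: partner fields stay NULL."""
    conn.execute("INSERT INTO organization (name) VALUES (?)", (name,))
```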
---
**`tournament`** — One row per tournament event (27 total). The `winner` and `runner_up` columns are stored as plain VARCHAR rather than foreign keys to `organization` because individual-format games (Chess, StarCraft II, etc.) list a player name as the winner, not a team.
| Column | Notes |
|---|---|
| gender | ENUM: Open / Men / Women |
| club_championship_points | Whether this tournament awarded Club Championship points |
---
**`schedule`** — A 1:1 extension of `tournament` holding the schedule metadata from file 07 (week number, venue, timezone, duration). Kept separate to avoid widening the tournament row.
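
A common way to enforce a 1:1 extension is to make the child's primary key also a foreign key to the parent, so each tournament can own at most one schedule row; the same pattern fits `club_championship_standing` against `organization`. This `sqlite3` sketch assumes that pattern and invents the column names (the document lists week number, venue, timezone, and duration but not their exact names or types). Note that SQLite only enforces foreign keys when `PRAGMA foreign_keys` is on, whereas MySQL's InnoDB enforces them by default.

```python
import sqlite3

# Shared primary key: schedule.tournament_id is both PK and FK.
SCHEMA = """
CREATE TABLE tournament (
    tournament_id INTEGER PRIMARY KEY,
    name          VARCHAR(100) NOT NULL
);
CREATE TABLE schedule (
    tournament_id INTEGER PRIMARY KEY
        REFERENCES tournament (tournament_id) ON DELETE CASCADE,
    week_number   INTEGER,
    venue         VARCHAR(100),
    timezone      VARCHAR(50),
    duration      VARCHAR(50)
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Must be set outside a transaction, before any data is written.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```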
---
**`player`** — 272 players from the official roster. Uses the natural key from the dataset (`EWC2025_001`, etc.) as the primary key rather than an auto-increment, since the source data provides stable identifiers.
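
Relying on the source's natural key means the ETL should check that incoming IDs have the expected shape and that none are missing. A small validation sketch, assuming the format is always the literal prefix `EWC2025_` followed by three zero-padded digits (inferred from `EWC2025_001` and the 272-player count, not confirmed by the source); the function names are hypothetical.

```python
import re

# ASSUMED ID format: fixed prefix plus a zero-padded three-digit number.
PLAYER_ID_PATTERN = re.compile(r"EWC2025_\d{3}")
PLAYER_ID_MAX_LEN = 20  # matches player_id VARCHAR(20)


def is_valid_player_id(player_id: str) -> bool:
    return (len(player_id) <= PLAYER_ID_MAX_LEN
            and PLAYER_ID_PATTERN.fullmatch(player_id) is not None)


def expected_player_ids(count: int = 272) -> list[str]:
    """IDs EWC2025_001 through EWC2025_<count>, zero-padded to three digits."""
    return [f"EWC2025_{n:03d}" for n in range(1, count + 1)]


def missing_player_ids(loaded: list[str], count: int = 272) -> list[str]:
    """IDs the roster should contain but the loaded rows lack, in order."""
    present = set(loaded)
    return [pid for pid in expected_player_ids(count) if pid not in present]
```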
---
**`medalist`** — One row per player-medal. A player who wins Gold contributes one row. A five-player team winning Gold contributes five rows. This is the most granular performance record in the OLTP.
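
Because a team medal is spread across several rows, any per-country medal count has to decide whether it counts rows or medals. This `sqlite3` sketch contrasts the two; the column names and the `(tournament_id, player_id)` key are assumptions. Under the distinct-medal version, a team whose players come from different countries counts once for each of those countries.

```python
import sqlite3

# Minimal medalist table with ASSUMED columns: one row per player per medal.
MEDALIST_DDL = """
CREATE TABLE medalist (
    tournament_id INTEGER NOT NULL,
    player_id     VARCHAR(20) NOT NULL,
    country_id    INTEGER NOT NULL,
    medal         VARCHAR(6) NOT NULL
        CHECK (medal IN ('Gold', 'Silver', 'Bronze')),
    PRIMARY KEY (tournament_id, player_id)
)
"""


def medal_rows_by_country(conn: sqlite3.Connection) -> dict[int, int]:
    """Counts player-medal rows: a five-player team gold counts five times."""
    return dict(conn.execute(
        "SELECT country_id, COUNT(*) FROM medalist GROUP BY country_id"))


def medals_by_country(conn: sqlite3.Connection) -> dict[int, int]:
    """Counts distinct medals: each (tournament, medal) counts once per country."""
    # SQLite's COUNT(DISTINCT ...) accepts a single expression, so the
    # de-duplication happens in a subquery first.
    return dict(conn.execute("""
        SELECT country_id, COUNT(*)
        FROM (SELECT DISTINCT country_id, tournament_id, medal FROM medalist)
        GROUP BY country_id"""))
```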
---
**`match_result`** — 50 match records from file 10. The `team_1`, `team_2`, and `winner` columns are VARCHAR for the same reason as `tournament.winner` — individual-format games list player names here.
---
**`club_championship_standing`** — Final standings for the 24 clubs that earned Club Championship points. 1:1 with `organization`.
---
### Junction Tables
Two multi-valued columns from the source data are normalized into junction tables:
**`organization_game_competing`** — Resolves the comma-separated `Games_Competing` column from the Club Partner Program (e.g. `"Dota 2, Chess, EA Sports FC 25, Counter-Strike 2"`).
**`organization_game_won`** — Resolves the comma-separated `Games_Won` column from the Club Championship standings.
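
Normalizing a comma-separated cell amounts to splitting it, trimming each title, resolving the title to a `game_id`, and inserting one junction row per pair. A `sqlite3` sketch for `organization_game_competing` (the same approach would serve `organization_game_won`), assuming titles in the cell match `game.name` exactly; the column names and helpers are hypothetical, and the FK to `organization` is omitted to keep it short.

```python
import sqlite3
from typing import Optional

DDL = """
CREATE TABLE game (
    game_id INTEGER PRIMARY KEY,
    name    VARCHAR(100) NOT NULL UNIQUE
);
CREATE TABLE organization_game_competing (
    organization_id INTEGER NOT NULL,
    game_id         INTEGER NOT NULL REFERENCES game (game_id),
    PRIMARY KEY (organization_id, game_id)
);
"""


def split_games(cell: Optional[str]) -> list[str]:
    """Split a cell like 'Dota 2, Chess' into trimmed, de-duplicated titles."""
    if not cell:
        return []
    titles = [part.strip() for part in cell.split(",")]
    # dict.fromkeys keeps first-seen order while dropping repeats; blanks are filtered.
    return list(dict.fromkeys(t for t in titles if t))


def load_games_competing(conn: sqlite3.Connection, organization_id: int,
                         cell: Optional[str]) -> int:
    """Insert one junction row per listed game; return the number inserted."""
    rows = []
    for title in split_games(cell):
        found = conn.execute(
            "SELECT game_id FROM game WHERE name = ?", (title,)).fetchone()
        if found is None:
            # Fail before inserting anything so a bad cell leaves no partial rows.
            raise KeyError(f"unknown game title: {title!r}")
        rows.append((organization_id, found[0]))
    conn.executemany(
        "INSERT INTO organization_game_competing (organization_id, game_id)"
        " VALUES (?, ?)", rows)
    return len(rows)
```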
---
## Entity Relationships
```
game ──────────────────────────────────────────────┐
  │                                                │
  ├──► tournament ──► schedule                     │
  │      │                                         │
  │      ├──► medalist ◄── organization ◄──────────┤
  │      │                    └──► country         │
  │      └──► match_result                         │
  │                                                │
  └──► player ◄── organization                     │
                   └──► country                    │
organization ──► club_championship_standing        │
    │                                              │
    ├──► organization_game_competing ──────────────┤
    └──► organization_game_won ────────────────────┘
point_system          (standalone lookup)
prize_pool_category   (standalone lookup)
```
---
## Design Decisions
**Why is `country` a table rather than a VARCHAR column?**
Country appears in players, medalists, and organizations. Storing it as a table avoids duplicating the region attribute and allows medal stats (from file 08) to be joined in without repeating them on every player row.
**Why does `tournament.winner` stay as VARCHAR?**
Enforcing a foreign key to `organization` would require creating dummy organization rows for individual players like "Magnus Carlsen" or "Serral". That would pollute the organization table with data that isn't an organization. The clean solution is to keep it as text and resolve it at query time when needed.
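
Resolving the text column at query time can be as simple as a `LEFT JOIN` on the organization name: every tournament is kept, and `organization_id` comes back NULL when the winner is an individual player. A `sqlite3` sketch, assuming team winners are stored with exactly the same spelling as `organization.name` (any spelling mismatch would also surface as NULL); the table shapes are trimmed-down assumptions.

```python
import sqlite3

# Minimal tables with ASSUMED column names.
DDL = """
CREATE TABLE organization (
    organization_id INTEGER PRIMARY KEY,
    name            VARCHAR(100) NOT NULL UNIQUE
);
CREATE TABLE tournament (
    tournament_id INTEGER PRIMARY KEY,
    name          VARCHAR(100) NOT NULL,
    winner        VARCHAR(100)  -- team or player name; deliberately not a FK
);
"""

# LEFT JOIN keeps tournaments whose winner matches no organization.
RESOLVE_WINNERS_SQL = """
SELECT t.name, t.winner, o.organization_id
FROM tournament AS t
LEFT JOIN organization AS o ON o.name = t.winner
ORDER BY t.tournament_id
"""


def resolve_winners(conn: sqlite3.Connection) -> list[tuple]:
    """(tournament, winner, organization_id or None) for every tournament."""
    return conn.execute(RESOLVE_WINNERS_SQL).fetchall()
```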
**Why is `schedule` a separate table from `tournament`?**
A 1:1 split is justified here because the schedule data comes from a completely different source file (file 07) and is conceptually distinct — it describes the logistics of the event, not the competitive outcome. Keeping it separate makes the ETL cleaner and the tournament table less wide.
**Why use `player_id VARCHAR(20)` instead of AUTO_INCREMENT?**
The source dataset provides stable IDs (`EWC2025_001` through `EWC2025_272`). Using the natural key preserves traceability back to the source without adding a meaningless surrogate.