Docs init

2026-05-17 17:17:04 +02:00
parent 293b6f0266
commit 22b2289933
7 changed files with 812 additions and 0 deletions

docs/04_datamart.md Normal file

@@ -0,0 +1,198 @@
# Data Mart
## What a Data Mart Is
A Data Mart is a database optimized for reading and analysis rather than for recording transactions. While the OLTP schema is normalized to avoid redundancy, the Data Mart is deliberately denormalized into a **star schema** — a central fact table surrounded by dimension tables — so that analytical queries are fast and simple to write.

In a star schema:

- **Fact tables** hold measurable events with numeric metrics (prize money, medal count, points)
- **Dimension tables** hold descriptive context that you slice and filter by (game type, country, organization region)

The Data Mart is stored in the **Oracle university lab schema** and populated by Apache NiFi reading from the MySQL OLTP.

The DDL is in `sql/datamart_schema.sql`.

---
## Dimensions
### DIM_DATE
A standard calendar dimension covering every date in the EWC 2025 event window (July 8 to August 24, 2025). A dedicated date dimension lets Power BI filter by week, group by month, or compare by quarter with no extra calculation.
| Column | Example |
|---|---|
| date_key | 20250708 (YYYYMMDD integer) |
| full_date | 2025-07-08 |
| year | 2025 |
| quarter | 3 |
| month / month_name | 7 / July |
| week_number | 28 |
| day_of_month / day_name | 8 / Tuesday |
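
As a rough sketch of how these rows could be generated, the Python snippet below builds one dictionary per calendar day in the event window. It is not the project's actual load logic (the Data Mart is populated through NiFi). The function name `build_dim_date` is hypothetical, and the ISO 8601 week numbering is an assumption that happens to match the `week_number` example in the table.

```python
from datetime import date, timedelta

# Hard-coded English names keep the output independent of the system locale.
MONTH_NAMES = ("January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December")
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")  # date.weekday(): Monday is 0


def build_dim_date(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            # YYYYMMDD integer, e.g. 20250708
            "date_key": current.year * 10000 + current.month * 100 + current.day,
            "full_date": current.isoformat(),
            "year": current.year,
            "quarter": (current.month - 1) // 3 + 1,
            "month": current.month,
            "month_name": MONTH_NAMES[current.month - 1],
            # ISO 8601 week number (assumed convention)
            "week_number": current.isocalendar()[1],
            "day_of_month": current.day,
            "day_name": DAY_NAMES[current.weekday()],
        })
        current += timedelta(days=1)
    return rows


dim_date = build_dim_date(date(2025, 7, 8), date(2025, 8, 24))
first_row = dim_date[0]
```
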
---
### DIM_GAME
Describes each of the 25 game titles. Enables slicing facts by genre (MOBA vs FPS vs Battle Royale) and by platform (PC vs Mobile vs Console).
| Column | Example |
|---|---|
| name | Counter-Strike 2 |
| game_type | FPS |
| platform | PC |
---
### DIM_COUNTRY
Countries with their geographic region. Intentionally kept lean — the medal counts that live in the OLTP `country` table are not carried into this dimension because they are derived facts, not descriptive attributes.
| Column | Example |
|---|---|
| name | South Korea |
| region | Asia |
---
### DIM_ORGANIZATION
All esports clubs and teams. Includes partner metadata to enable analysis by partner tier (Current partner vs non-partner) and social reach.
| Column | Example |
|---|---|
| name | Team Falcons |
| region | Middle East |
| country | Saudi Arabia |
| club_partner_status | Current |
| founded_year | 2017 |
| social_media_followers_m | 4.0 |
---
### DIM_MEDAL
A simple three-row table representing the medal types. Includes `medal_rank` (1/2/3) so reports can sort Gold → Silver → Bronze correctly without relying on alphabetical ordering.
| medal_type | medal_rank |
|---|---|
| Gold | 1 |
| Silver | 2 |
| Bronze | 3 |
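
To illustrate why `medal_rank` is useful, the short Python sketch below compares alphabetical ordering with ordering by rank. The in-memory list is only a stand-in for the three rows of `DIM_MEDAL`; the variable names are illustrative.

```python
# Stand-in for the three DIM_MEDAL rows, deliberately listed out of order.
dim_medal = [
    {"medal_type": "Silver", "medal_rank": 2},
    {"medal_type": "Bronze", "medal_rank": 3},
    {"medal_type": "Gold", "medal_rank": 1},
]

# Sorting on the label alone puts Bronze first.
alphabetical = sorted(row["medal_type"] for row in dim_medal)

# Sorting on medal_rank gives the order a report needs.
by_rank = [row["medal_type"]
           for row in sorted(dim_medal, key=lambda row: row["medal_rank"])]

print(alphabetical)  # ['Bronze', 'Gold', 'Silver']
print(by_rank)       # ['Gold', 'Silver', 'Bronze']
```
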
---
## Fact Tables
### FACT_TOURNAMENT
**Grain:** one row per tournament (27 rows).
This is the primary financial fact table. It answers questions about prize money distribution across games, genres, platforms, and time.
| Column | Type | Description |
|---|---|---|
| game_key | FK → DIM_GAME | What game |
| start_date_key | FK → DIM_DATE | When it started |
| end_date_key | FK → DIM_DATE | When it ended |
| winner_org_key | FK → DIM_ORGANIZATION | Winning organization (NULL for individual-winner events) |
| event_name | text | Degenerate dimension |
| gender | text | Open / Men / Women |
| **prize_pool_usd** | measure | Total prize pool in USD |
| **num_participants** | measure | Number of competing teams/players |
| **duration_days** | measure | Tournament length in days |
| **has_club_points** | measure (flag) | 1 if the tournament awarded Club Championship points |
**Example questions this enables:**
- What was the total prize money awarded to MOBA tournaments vs FPS tournaments?
- Which platform (PC or Mobile) had higher average prize pools?
- How did prize pools vary across the seven-week event?
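
The first question above could be answered with a single join from the fact table to `DIM_GAME`. The sketch below uses Python's built-in `sqlite3` as a stand-in for the Oracle Data Mart. The table definitions are trimmed, `tournament_key` is a hypothetical surrogate key, and every row is made up for illustration rather than taken from EWC 2025 results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for the Oracle schema
conn.executescript("""
CREATE TABLE dim_game (
    game_key  INTEGER PRIMARY KEY,
    name      TEXT,
    game_type TEXT,
    platform  TEXT
);
CREATE TABLE fact_tournament (
    tournament_key INTEGER PRIMARY KEY,  -- hypothetical surrogate key
    game_key       INTEGER REFERENCES dim_game (game_key),
    event_name     TEXT,
    prize_pool_usd INTEGER
);
-- Made-up rows for illustration only; not real EWC 2025 figures.
INSERT INTO dim_game VALUES
    (1, 'Game A', 'MOBA', 'PC'),
    (2, 'Game B', 'MOBA', 'Mobile'),
    (3, 'Game C', 'FPS',  'PC');
INSERT INTO fact_tournament VALUES
    (1, 1, 'Event A', 1000),
    (2, 2, 'Event B', 500),
    (3, 3, 'Event C', 800);
""")

# Slice the prize_pool_usd measure by a dimension attribute (game_type).
prize_by_genre = conn.execute("""
    SELECT g.game_type,
           SUM(f.prize_pool_usd) AS total_prize_usd,
           AVG(f.prize_pool_usd) AS avg_prize_usd
    FROM fact_tournament f
    JOIN dim_game g ON g.game_key = f.game_key
    GROUP BY g.game_type
    ORDER BY g.game_type
""").fetchall()

print(prize_by_genre)  # [('FPS', 800, 800.0), ('MOBA', 1500, 750.0)]
```
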
---
### FACT_MEDAL_AWARD
**Grain:** one row per player-medal (257 rows).
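This fact table captures individual competitive performance. Each medal a player wins contributes one row with a `medal_count` of 1 and `medal_points` of 3, 2, or 1 (Gold, Silver, Bronze). Both columns are additive: you can SUM them freely to get medal totals per team, per country, and so on. Because the grain is player-medal, these totals count one medal per medalist player, so a team result appears once for each of its medalist players.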
| Column | Type | Description |
|---|---|---|
| game_key | FK → DIM_GAME | Game the medal was won in |
| medal_key | FK → DIM_MEDAL | Gold / Silver / Bronze |
| country_key | FK → DIM_COUNTRY | Player's nationality |
| org_key | FK → DIM_ORGANIZATION | Player's team |
| date_key | FK → DIM_DATE | Tournament start date |
| player_name | text | Degenerate dimension |
| **medal_count** | measure | Always 1 — additive for totals |
| **medal_points** | measure | Gold=3, Silver=2, Bronze=1 |
**Example questions this enables:**
- Which country won the most medals overall? By region?
- Which game genre produced the most medals for Asian countries?
- Which organization accumulated the most medal points across all events?
- Did South Korea dominate PC games while Southeast Asia dominated mobile games?
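
A minimal sketch of the additive behaviour described above, again using Python's `sqlite3` as a stand-in for Oracle. The tables are reduced to the columns the query needs, and the countries, players, and medals are invented for illustration. The final check relies on an observation rather than a documented rule: with ranks 1/2/3 and points 3/2/1, `medal_points` equals `4 - medal_rank`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for the Oracle schema
conn.executescript("""
CREATE TABLE dim_country (
    country_key INTEGER PRIMARY KEY, name TEXT, region TEXT
);
CREATE TABLE dim_medal (
    medal_key INTEGER PRIMARY KEY, medal_type TEXT, medal_rank INTEGER
);
CREATE TABLE fact_medal_award (
    country_key  INTEGER REFERENCES dim_country (country_key),
    medal_key    INTEGER REFERENCES dim_medal (medal_key),
    player_name  TEXT,
    medal_count  INTEGER,  -- always 1
    medal_points INTEGER   -- Gold=3, Silver=2, Bronze=1
);
-- Made-up rows for illustration only.
INSERT INTO dim_country VALUES (1, 'Country A', 'Asia'), (2, 'Country B', 'Europe');
INSERT INTO dim_medal VALUES (1, 'Gold', 1), (2, 'Silver', 2), (3, 'Bronze', 3);
INSERT INTO fact_medal_award VALUES
    (1, 1, 'Player 1', 1, 3),
    (1, 3, 'Player 2', 1, 1),
    (2, 2, 'Player 3', 1, 2);
""")

# Both measures can be summed along any dimension, here by country.
medals_by_country = conn.execute("""
    SELECT c.name,
           SUM(f.medal_count)  AS medals,
           SUM(f.medal_points) AS points
    FROM fact_medal_award f
    JOIN dim_country c ON c.country_key = f.country_key
    GROUP BY c.name
    ORDER BY points DESC
""").fetchall()

# Consistency check: count rows where points disagree with 4 - medal_rank.
(mismatches,) = conn.execute("""
    SELECT COUNT(*)
    FROM fact_medal_award f
    JOIN dim_medal m ON m.medal_key = f.medal_key
    WHERE f.medal_points <> 4 - m.medal_rank
""").fetchone()

print(medals_by_country)  # [('Country A', 2, 4), ('Country B', 1, 2)]
```
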
---
### FACT_CLUB_STANDING
**Grain:** one row per club in the Club Championship (24 rows). This is a snapshot — it represents the final standings at the end of EWC 2025.
| Column | Type | Description |
|---|---|---|
| org_key | FK → DIM_ORGANIZATION | The club |
| **final_rank** | measure (non-additive) | Final position (1 = best) |
| **total_points** | measure | Total Club Championship points earned |
| **prize_money_usd** | measure | Prize money from Club Championship |
| **tournament_wins** | measure | Number of tournaments the club won |
| **top_8_finishes** | measure | Total top-8 tournament finishes |
| **eligible_to_win** | measure (flag) | 1 if the club was eligible for the grand prize |
**Example questions this enables:**
- How does prize money correlate with tournament wins vs breadth of top-8 finishes?
- Do Middle Eastern clubs outperform European clubs in the Club Championship?
- What is the average total_points for Current club partners vs non-partners?
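
The last question could be sketched as a grouped average over the snapshot, with Python's `sqlite3` standing in for Oracle. Two points here are assumptions: that any `club_partner_status` other than `'Current'` (including NULL) means non-partner, and the club rows themselves, which are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for the Oracle schema
conn.executescript("""
CREATE TABLE dim_organization (
    org_key             INTEGER PRIMARY KEY,
    name                TEXT,
    club_partner_status TEXT
);
CREATE TABLE fact_club_standing (
    org_key      INTEGER REFERENCES dim_organization (org_key),
    final_rank   INTEGER,
    total_points INTEGER
);
-- Made-up rows for illustration only.
INSERT INTO dim_organization VALUES
    (1, 'Club A', 'Current'),
    (2, 'Club B', 'Current'),
    (3, 'Club C', NULL);
INSERT INTO fact_club_standing VALUES
    (1, 1, 1000),
    (2, 2, 600),
    (3, 3, 400);
""")

# Assumption: anything other than 'Current' is treated as non-partner.
points_by_partner_status = conn.execute("""
    SELECT CASE WHEN o.club_partner_status = 'Current'
                THEN 'Current partner' ELSE 'Non-partner' END AS partner_group,
           AVG(s.total_points) AS avg_total_points,
           COUNT(*)            AS clubs
    FROM fact_club_standing s
    JOIN dim_organization o ON o.org_key = s.org_key
    GROUP BY partner_group
    ORDER BY partner_group
""").fetchall()

print(points_by_partner_status)
# [('Current partner', 800.0, 2), ('Non-partner', 400.0, 1)]
```
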
---
## Star Schema Diagram
```
                        DIM_DATE
                      ┌──────────┐
                      │ date_key │
                      └────┬─────┘
                           │ start/end
DIM_GAME ────────── FACT_TOURNAMENT ────────── DIM_ORGANIZATION
(game_key)          (prize_pool_usd            (org_key)
                     num_participants
                     duration_days
                     has_club_points)

DIM_COUNTRY ──┐
DIM_ORGAN.  ──┼── FACT_MEDAL_AWARD ──── DIM_GAME
DIM_MEDAL   ──┤   (medal_count          (game_key)
DIM_DATE    ──┘    medal_points)

DIM_ORGANIZATION ── FACT_CLUB_STANDING
                    (total_points
                     prize_money_usd
                     tournament_wins
                     top_8_finishes)
```
---
## Why Three Fact Tables
A single fact table would require choosing one grain, which would make some analyses awkward or impossible.
- `FACT_TOURNAMENT` is at tournament grain — you cannot get per-player medal counts from it.
- `FACT_MEDAL_AWARD` is at player-medal grain — you cannot get prize pool totals from it without denormalizing tournament data into it.
- `FACT_CLUB_STANDING` captures a snapshot that has no natural place in the other two tables.
Keeping them separate means each fact table has a clean, single grain. Power BI can build relationships between them through the shared dimensions.
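
The grain problem can be shown with a small Python `sqlite3` sketch (a stand-in for Oracle, with invented rows and trimmed tables). Joining the two fact tables directly repeats the single tournament row once per medal row, so the prize pool is counted three times. Querying each fact table at its own grain gives the correct totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for the Oracle schema
conn.executescript("""
CREATE TABLE fact_tournament (
    game_key       INTEGER,
    event_name     TEXT,
    prize_pool_usd INTEGER
);
CREATE TABLE fact_medal_award (
    game_key    INTEGER,
    player_name TEXT,
    medal_count INTEGER
);
-- Made-up rows: one tournament, three medalist players in the same game.
INSERT INTO fact_tournament VALUES (1, 'Event A', 1000);
INSERT INTO fact_medal_award VALUES
    (1, 'Player 1', 1),
    (1, 'Player 2', 1),
    (1, 'Player 3', 1);
""")

# Mixing grains: the tournament row fans out to one copy per medal row.
(inflated_prize,) = conn.execute("""
    SELECT SUM(t.prize_pool_usd)
    FROM fact_tournament t
    JOIN fact_medal_award m ON m.game_key = t.game_key
""").fetchone()

# Each fact table queried at its own grain.
(true_prize,) = conn.execute(
    "SELECT SUM(prize_pool_usd) FROM fact_tournament").fetchone()
(medal_total,) = conn.execute(
    "SELECT SUM(medal_count) FROM fact_medal_award").fetchone()

print(inflated_prize, true_prize, medal_total)  # 3000 1000 3
```

The same reasoning applies in a reporting tool: aggregating each fact table separately and relating the results through a shared dimension such as `DIM_GAME` avoids the fan-out seen in the first query.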