Docs init
198
docs/04_datamart.md
Normal file
@@ -0,0 +1,198 @@
# Data Mart

## What a Data Mart Is

A Data Mart is a database optimized for reading and analysis rather than for recording transactions. While the OLTP schema is normalized to avoid redundancy, the Data Mart is deliberately denormalized into a **star schema** (a central fact table surrounded by dimension tables) so that analytical queries are fast and simple to write.

In a star schema:
- **Fact tables** hold measurable events with numeric metrics (prize money, medal count, points)
- **Dimension tables** hold descriptive context that you slice and filter by (game type, country, organization region)

The Data Mart is stored in the **Oracle university lab schema** and populated by Apache NiFi, which reads from the MySQL OLTP database.

The DDL is in `sql/datamart_schema.sql`.

---

## Dimensions

### DIM_DATE

A standard calendar dimension covering every date in the EWC 2025 event window (July 8 – August 24, 2025). Using a dedicated date dimension allows Power BI to filter by week, group by month, or compare by quarter with no extra calculation.

| Column | Example |
|---|---|
| date_key | 20250708 (YYYYMMDD integer) |
| full_date | 2025-07-08 |
| year | 2025 |
| quarter | 3 |
| month / month_name | 7 / July |
| week_number | 28 |
| day_of_month / day_name | 8 / Tuesday |

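As a rough illustration of how the rows in this dimension could be derived, here is a minimal Python sketch that generates one row per day of the event window. The function name `build_dim_date` is hypothetical, and computing `week_number` as the ISO week is an assumption; it happens to agree with the example value of 28 for 2025-07-08, but the actual load logic is not shown in this document.

```python
from datetime import date, timedelta


def build_dim_date(start: date, end: date) -> list[dict]:
    """Return one DIM_DATE-style row per calendar day in [start, end]."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_key": int(day.strftime("%Y%m%d")),  # YYYYMMDD integer
            "full_date": day.isoformat(),
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,
            "month": day.month,
            "month_name": day.strftime("%B"),
            "week_number": day.isocalendar()[1],  # assumption: ISO week number
            "day_of_month": day.day,
            "day_name": day.strftime("%A"),
        })
        day += timedelta(days=1)
    return rows


# Event window taken from the description above.
dim_date = build_dim_date(date(2025, 7, 8), date(2025, 8, 24))
first_row = dim_date[0]
```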

---

### DIM_GAME

Describes each of the 25 game titles. Enables slicing facts by genre (MOBA vs FPS vs Battle Royale) and by platform (PC vs Mobile vs Console).

| Column | Example |
|---|---|
| name | Counter-Strike 2 |
| game_type | FPS |
| platform | PC |

---

### DIM_COUNTRY

Countries with their geographic region. This dimension is intentionally kept lean: the medal counts that live in the OLTP `country` table are not carried over, because they are derived facts rather than descriptive attributes.

| Column | Example |
|---|---|
| name | South Korea |
| region | Asia |

---

### DIM_ORGANIZATION

All esports clubs and teams. Includes partner metadata to enable analysis by partner tier (Current partner vs non-partner) and social reach.

| Column | Example |
|---|---|
| name | Team Falcons |
| region | Middle East |
| country | Saudi Arabia |
| club_partner_status | Current |
| founded_year | 2017 |
| social_media_followers_m | 4.0 |

---

### DIM_MEDAL

A simple three-row table representing the medal types. Includes `medal_rank` (1/2/3) so reports can sort Gold → Silver → Bronze correctly without relying on alphabetical ordering.

| medal_type | medal_rank |
|---|---|
| Gold | 1 |
| Silver | 2 |
| Bronze | 3 |

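The need for `medal_rank` is easy to see in a small Python sketch: sorting the medal names alphabetically puts Bronze first, while sorting by rank gives the expected podium order. The dictionary `MEDAL_RANK` simply mirrors the three rows above; it is an illustration, not part of the load pipeline.

```python
# Mirrors the three DIM_MEDAL rows shown above.
MEDAL_RANK = {"Gold": 1, "Silver": 2, "Bronze": 3}

medals = ["Silver", "Bronze", "Gold"]

# Alphabetical ordering puts Bronze ahead of Gold.
alphabetical = sorted(medals)

# Sorting on the rank column restores the podium order.
by_rank = sorted(medals, key=MEDAL_RANK.__getitem__)

print(alphabetical)  # ['Bronze', 'Gold', 'Silver']
print(by_rank)       # ['Gold', 'Silver', 'Bronze']
```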

---

## Fact Tables

### FACT_TOURNAMENT

**Grain:** one row per tournament (27 rows).

This is the primary financial fact table. It answers questions about prize money distribution across games, genres, platforms, and time.

| Column | Type | Description |
|---|---|---|
| game_key | FK → DIM_GAME | What game |
| start_date_key | FK → DIM_DATE | When it started |
| end_date_key | FK → DIM_DATE | When it ended |
| winner_org_key | FK → DIM_ORGANIZATION | Winning organization (NULL for individual-winner events) |
| event_name | text | Degenerate dimension |
| gender | text | Open / Men / Women |
| **prize_pool_usd** | measure | Total prize pool in USD |
| **num_participants** | measure | Number of competing teams/players |
| **duration_days** | measure | Tournament length in days |
| **has_club_points** | measure | 1 if tournament awarded Club Championship points |

**Example questions this enables:**
- What was the total prize money awarded to MOBA tournaments vs FPS tournaments?
- Which platform (PC or Mobile) had higher average prize pools?
- How did prize pools vary across the nearly seven-week event?

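The first question above maps onto a single join and `GROUP BY`. The following is a minimal sketch, not the project's actual query: it uses an in-memory SQLite database as a stand-in for the Oracle schema, assumes that `DIM_GAME` carries a `game_key` surrogate key, and loads placeholder rows ("Game A", 1000, and so on) that are not real EWC 2025 figures.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the Oracle Data Mart.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_game (
        game_key  INTEGER PRIMARY KEY,
        name      TEXT,
        game_type TEXT,
        platform  TEXT
    );
    CREATE TABLE fact_tournament (
        game_key       INTEGER REFERENCES dim_game (game_key),
        prize_pool_usd INTEGER
    );
    -- Placeholder rows, not real EWC 2025 data.
    INSERT INTO dim_game VALUES
        (1, 'Game A', 'MOBA', 'PC'),
        (2, 'Game B', 'FPS',  'PC'),
        (3, 'Game C', 'MOBA', 'Mobile');
    INSERT INTO fact_tournament VALUES (1, 1000), (2, 500), (3, 300);
""")

# Slice the prize_pool_usd measure by the game_type attribute.
prize_by_genre = conn.execute("""
    SELECT g.game_type, SUM(f.prize_pool_usd) AS total_prize_usd
    FROM fact_tournament f
    JOIN dim_game g ON g.game_key = f.game_key
    GROUP BY g.game_type
    ORDER BY g.game_type
""").fetchall()

print(prize_by_genre)  # [('FPS', 500), ('MOBA', 1300)]
```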

---

### FACT_MEDAL_AWARD

**Grain:** one row per player-medal (257 rows).

This fact table captures individual competitive performance. Each medal-winning player contributes one row per medal, with a `medal_count` of 1 and a `medal_points` value of 3, 2, or 1. Both columns are additive, so you can SUM them freely to get team medal totals, country medal totals, etc.

| Column | Type | Description |
|---|---|---|
| game_key | FK → DIM_GAME | Game the medal was won in |
| medal_key | FK → DIM_MEDAL | Gold / Silver / Bronze |
| country_key | FK → DIM_COUNTRY | Player's nationality |
| org_key | FK → DIM_ORGANIZATION | Player's team |
| date_key | FK → DIM_DATE | Tournament start date |
| player_name | text | Degenerate dimension |
| **medal_count** | measure | Always 1; additive for totals |
| **medal_points** | measure | Gold=3, Silver=2, Bronze=1 |

**Example questions this enables:**
- Which country won the most medals overall? By region?
- Which game genre produced the most medals for Asian countries?
- Which organization accumulated the most medal points across all events?
- Did South Korea dominate PC games while Southeast Asia dominated mobile games?

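Because `medal_count` and `medal_points` are additive, the same two `SUM` expressions can be rolled up to any dimension attribute. The sketch below shows this for country and for region. It is an illustration under assumptions: SQLite replaces Oracle, the `country_key` column name in `DIM_COUNTRY` is assumed, the helper `medal_rollup` is hypothetical, and all rows are placeholders.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the Oracle Data Mart.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_country (
        country_key INTEGER PRIMARY KEY, name TEXT, region TEXT
    );
    CREATE TABLE fact_medal_award (
        country_key  INTEGER REFERENCES dim_country (country_key),
        player_name  TEXT,
        medal_count  INTEGER,
        medal_points INTEGER
    );
    -- Placeholder rows, not real EWC 2025 data.
    INSERT INTO dim_country VALUES
        (1, 'Country A', 'Region X'),
        (2, 'Country B', 'Region X');
    INSERT INTO fact_medal_award VALUES
        (1, 'Player 1', 1, 3),  -- Gold
        (1, 'Player 2', 1, 1),  -- Bronze
        (2, 'Player 3', 1, 2);  -- Silver
""")


def medal_rollup(attribute: str) -> list[tuple]:
    """Sum both additive measures, grouped by one DIM_COUNTRY attribute."""
    if attribute not in {"name", "region"}:  # whitelist: no free-form SQL
        raise ValueError(f"unsupported attribute: {attribute}")
    return conn.execute(f"""
        SELECT c.{attribute}, SUM(f.medal_count), SUM(f.medal_points)
        FROM fact_medal_award f
        JOIN dim_country c ON c.country_key = f.country_key
        GROUP BY c.{attribute}
        ORDER BY c.{attribute}
    """).fetchall()


by_country = medal_rollup("name")
by_region = medal_rollup("region")
```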

---

### FACT_CLUB_STANDING

**Grain:** one row per club in the Club Championship (24 rows). This is a snapshot: it represents the final standings at the end of EWC 2025.

| Column | Type | Description |
|---|---|---|
| org_key | FK → DIM_ORGANIZATION | The club |
| **final_rank** | measure | Final position (1 = best); not additive |
| **total_points** | measure | Total Club Championship points earned |
| **prize_money_usd** | measure | Prize money from Club Championship |
| **tournament_wins** | measure | Number of tournaments the club won |
| **top_8_finishes** | measure | Total top-8 tournament finishes |
| **eligible_to_win** | measure | 1 if the club was eligible for the grand prize |

**Example questions this enables:**
- How does prize money correlate with tournament wins vs breadth of top-8 finishes?
- Do Middle Eastern clubs outperform European clubs in the Club Championship?
- What is the average total_points for Current club partners vs non-partners?

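The last question above could be answered with an average grouped on the partner attribute. This is a hedged sketch only: SQLite replaces Oracle, the rows are placeholders, and because this document does not say how non-partners are encoded in `club_partner_status`, the query treats every value other than `'Current'` (including NULL) as a non-partner. That bucketing and the "Other" label are assumptions.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the Oracle Data Mart.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_organization (
        org_key INTEGER PRIMARY KEY, name TEXT, club_partner_status TEXT
    );
    CREATE TABLE fact_club_standing (
        org_key      INTEGER REFERENCES dim_organization (org_key),
        final_rank   INTEGER,
        total_points INTEGER
    );
    -- Placeholder rows, not real EWC 2025 data.
    INSERT INTO dim_organization VALUES
        (1, 'Org A', 'Current'),
        (2, 'Org B', 'Current'),
        (3, 'Org C', NULL);
    INSERT INTO fact_club_standing VALUES (1, 1, 1000), (2, 2, 500), (3, 3, 300);
""")

# Assumption: anything other than 'Current' counts as a non-partner.
avg_points_by_partner = conn.execute("""
    SELECT CASE WHEN o.club_partner_status = 'Current'
                THEN 'Current' ELSE 'Other' END AS partner_group,
           AVG(s.total_points) AS avg_total_points
    FROM fact_club_standing s
    JOIN dim_organization o ON o.org_key = s.org_key
    GROUP BY partner_group
    ORDER BY partner_group
""").fetchall()

print(avg_points_by_partner)  # [('Current', 750.0), ('Other', 300.0)]
```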

---

## Star Schema Diagram

```
                        DIM_DATE
                      ┌──────────┐
                      │ date_key │
                      └────┬─────┘
                           │ start/end
                           │
DIM_GAME ────────── FACT_TOURNAMENT ────────── DIM_ORGANIZATION
(game_key)          (prize_pool_usd            (org_key)
                     num_participants
                     duration_days
                     has_club_points)


DIM_COUNTRY ──┐
DIM_ORGAN.  ──┼── FACT_MEDAL_AWARD ──── DIM_GAME
DIM_MEDAL   ──┤   (medal_count          (game_key)
DIM_DATE    ──┘    medal_points)


DIM_ORGANIZATION ── FACT_CLUB_STANDING
                    (total_points
                     prize_money_usd
                     tournament_wins
                     top_8_finishes)
```

---

## Why Three Fact Tables

A single fact table would require choosing one grain, which would make some analyses awkward or impossible.

- `FACT_TOURNAMENT` is at tournament grain, so you cannot get per-player medal counts from it.
- `FACT_MEDAL_AWARD` is at player-medal grain, so you cannot get prize pool totals from it without denormalizing tournament data into it.
- `FACT_CLUB_STANDING` captures a snapshot that has no natural place in the other two tables.

Keeping them separate means each fact table has a clean, single grain. Power BI can build relationships between them through the shared dimensions.

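To make the grain argument concrete, the sketch below combines two fact tables through the shared organization dimension in plain SQL. It is an analogue of what the Power BI relationships achieve, not the project's actual model: SQLite replaces Oracle, the rows are placeholders, and the names `wins`, `points`, and `tournaments_won` are hypothetical. Each fact is aggregated to organization level first and only then joined. Joining the raw fact rows directly multiplies them against each other and inflates the totals, as the second query shows.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the Oracle Data Mart.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_organization (org_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_tournament (winner_org_key INTEGER, prize_pool_usd INTEGER);
    CREATE TABLE fact_medal_award (org_key INTEGER, medal_points INTEGER);
    -- Placeholder rows, not real EWC 2025 data.
    INSERT INTO dim_organization VALUES (1, 'Org A'), (2, 'Org B');
    INSERT INTO fact_tournament VALUES (1, 1000), (1, 500), (2, 300);
    INSERT INTO fact_medal_award VALUES (1, 3), (1, 3), (1, 2), (2, 1);
""")

# Aggregate each fact at its own grain, then join on the shared dimension.
drill_across = conn.execute("""
    WITH wins AS (
        SELECT winner_org_key AS org_key, COUNT(*) AS tournaments_won
        FROM fact_tournament
        GROUP BY winner_org_key
    ),
    points AS (
        SELECT org_key, SUM(medal_points) AS medal_points
        FROM fact_medal_award
        GROUP BY org_key
    )
    SELECT o.name,
           COALESCE(w.tournaments_won, 0),
           COALESCE(p.medal_points, 0)
    FROM dim_organization o
    LEFT JOIN wins w ON w.org_key = o.org_key
    LEFT JOIN points p ON p.org_key = o.org_key
    ORDER BY o.name
""").fetchall()

# Joining the two facts row-to-row mixes grains and double-counts points.
naive_join = conn.execute("""
    SELECT o.name, SUM(m.medal_points)
    FROM dim_organization o
    JOIN fact_tournament t ON t.winner_org_key = o.org_key
    JOIN fact_medal_award m ON m.org_key = o.org_key
    GROUP BY o.name
    ORDER BY o.name
""").fetchall()

print(drill_across)  # [('Org A', 2, 8), ('Org B', 1, 1)]
print(naive_join)    # [('Org A', 16), ('Org B', 1)]
```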