# Data Mart

## What a Data Mart Is

A Data Mart is a database optimized for reading and analysis rather than for recording transactions. While the OLTP schema is normalized to avoid redundancy, the Data Mart is deliberately denormalized into a **star schema** (a central fact table surrounded by dimension tables) so that analytical queries are fast and simple to write.

In a star schema:

- **Fact tables** hold measurable events with numeric metrics (prize money, medal count, points)
- **Dimension tables** hold descriptive context that you slice and filter by (game type, country, organization region)

The Data Mart is stored in the **Oracle university lab schema** and populated by Apache NiFi, which reads from the MySQL OLTP database.

The DDL is in `sql/datamart_schema.sql`.

---
## Dimensions

### DIM_DATE

A standard calendar dimension covering every date in the EWC 2025 event window (July 8 – August 24, 2025). Using a dedicated date dimension allows Power BI to filter by week, group by month, or compare by quarter with no extra calculation.

| Column | Example |
|---|---|
| date_key | 20250708 (YYYYMMDD integer) |
| full_date | 2025-07-08 |
| year | 2025 |
| quarter | 3 |
| month / month_name | 7 / July |
| week_number | 28 |
| day_of_month / day_name | 8 / Tuesday |

---
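For `DIM_DATE`, the columns listed above can all be derived from the calendar date itself. The following is a minimal sketch of that derivation, not the actual load logic (the real table is populated through NiFi and the DDL in `sql/datamart_schema.sql`). The function name `build_dim_date` is hypothetical, and ISO 8601 week numbering is an assumption that happens to match the example row.

```python
from datetime import date, timedelta


def build_dim_date(start: date, end: date) -> list[dict]:
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            # YYYYMMDD integer, e.g. 20250708
            "date_key": day.year * 10000 + day.month * 100 + day.day,
            "full_date": day.isoformat(),
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,
            "month": day.month,
            "month_name": day.strftime("%B"),
            # Assumption: ISO 8601 week numbering
            "week_number": day.isocalendar()[1],
            "day_of_month": day.day,
            "day_name": day.strftime("%A"),
        })
        day += timedelta(days=1)
    return rows


# Event window stated in the DIM_DATE description
dim_date = build_dim_date(date(2025, 7, 8), date(2025, 8, 24))
print(len(dim_date))            # 48
print(dim_date[0]["date_key"])  # 20250708
```

The integer `date_key` keeps fact-table joins cheap and sorts chronologically without any date parsing.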
### DIM_GAME

Describes each of the 25 game titles. Enables slicing facts by genre (MOBA vs FPS vs Battle Royale) and by platform (PC vs Mobile vs Console).

| Column | Example |
|---|---|
| name | Counter-Strike 2 |
| game_type | FPS |
| platform | PC |

---
### DIM_COUNTRY

Countries with their geographic region. This dimension is intentionally kept lean: the medal counts that live in the OLTP `country` table are not carried into it because they are derived facts, not descriptive attributes.

| Column | Example |
|---|---|
| name | South Korea |
| region | Asia |

---
### DIM_ORGANIZATION

All esports clubs and teams. Includes partner metadata to enable analysis by partner tier (Current partner vs non-partner) and by social reach.

| Column | Example |
|---|---|
| name | Team Falcons |
| region | Middle East |
| country | Saudi Arabia |
| club_partner_status | Current |
| founded_year | 2017 |
| social_media_followers_m | 4.0 (millions) |

---
### DIM_MEDAL

A simple three-row table representing the medal types. Includes `medal_rank` (1/2/3) so reports can sort Gold → Silver → Bronze correctly without relying on alphabetical ordering.

| medal_type | medal_rank |
|---|---|
| Gold | 1 |
| Silver | 2 |
| Bronze | 3 |

---
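The sorting point made for `DIM_MEDAL` is easy to see in a few lines. This is only an illustrative sketch using the three rows from the table; the list-of-dicts representation is an assumption for the example, not how the dimension is stored.

```python
# The three DIM_MEDAL rows, as listed in the table
dim_medal = [
    {"medal_type": "Gold", "medal_rank": 1},
    {"medal_type": "Silver", "medal_rank": 2},
    {"medal_type": "Bronze", "medal_rank": 3},
]

# Sorting on the label puts Bronze first, which is wrong for a podium
alphabetical = [m["medal_type"] for m in sorted(dim_medal, key=lambda m: m["medal_type"])]

# Sorting on medal_rank gives the intended order
by_rank = [m["medal_type"] for m in sorted(dim_medal, key=lambda m: m["medal_rank"])]

print(alphabetical)  # ['Bronze', 'Gold', 'Silver']
print(by_rank)       # ['Gold', 'Silver', 'Bronze']
```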
## Fact Tables

### FACT_TOURNAMENT

**Grain:** one row per tournament (27 rows).

This is the primary financial fact table. It answers questions about prize money distribution across games, genres, platforms, and time.

| Column | Type | Description |
|---|---|---|
| game_key | FK → DIM_GAME | What game |
| start_date_key | FK → DIM_DATE | When it started |
| end_date_key | FK → DIM_DATE | When it ended |
| winner_org_key | FK → DIM_ORGANIZATION | Winning organization (NULL for individual-winner events) |
| event_name | text | Degenerate dimension |
| gender | text | Open / Men / Women |
| **prize_pool_usd** | measure | Total prize pool in USD |
| **num_participants** | measure | Number of competing teams/players |
| **duration_days** | measure | Tournament length in days |
| **has_club_points** | measure | 1 if tournament awarded Club Championship points |

**Example questions this enables:**

- What was the total prize money awarded to MOBA tournaments vs FPS tournaments?
- Which platform (PC or Mobile) had higher average prize pools?
- How did prize pools vary from week to week across the event?

---
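The first `FACT_TOURNAMENT` question (total prize money by genre) comes down to one join and one `GROUP BY`. The sketch below uses an in-memory SQLite database purely as a runnable stand-in for the Oracle schema. The table definitions are trimmed to the columns the query needs, and every row is an invented placeholder, not real EWC data.

```python
import sqlite3

# SQLite stand-in; the real Data Mart lives in Oracle
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_game (
    game_key  INTEGER PRIMARY KEY,
    name      TEXT,
    game_type TEXT,
    platform  TEXT
);
CREATE TABLE fact_tournament (
    game_key       INTEGER REFERENCES dim_game (game_key),
    prize_pool_usd INTEGER
);
-- Placeholder rows with made-up amounts
INSERT INTO dim_game VALUES
    (1, 'Game A', 'FPS',  'PC'),
    (2, 'Game B', 'MOBA', 'PC'),
    (3, 'Game C', 'MOBA', 'Mobile');
INSERT INTO fact_tournament VALUES (1, 100), (2, 200), (3, 300);
""")

# Slice the prize_pool_usd measure by the game_type attribute of DIM_GAME
prize_by_genre = conn.execute("""
    SELECT g.game_type, SUM(f.prize_pool_usd) AS total_prize_usd
    FROM fact_tournament f
    JOIN dim_game g ON g.game_key = f.game_key
    GROUP BY g.game_type
    ORDER BY g.game_type
""").fetchall()

print(prize_by_genre)  # [('FPS', 100), ('MOBA', 500)]
```

Swapping `g.game_type` for `g.platform` answers the platform question with the same join.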
### FACT_MEDAL_AWARD

**Grain:** one row per player-medal (257 rows).

This fact table captures individual competitive performance. Each medal won by a player contributes one row with a `medal_count` of 1 and `medal_points` of 3, 2, or 1 (Gold, Silver, Bronze). Both columns are additive, so you can SUM them freely to get team medal totals, country medal totals, and so on.

| Column | Type | Description |
|---|---|---|
| game_key | FK → DIM_GAME | Game the medal was won in |
| medal_key | FK → DIM_MEDAL | Gold / Silver / Bronze |
| country_key | FK → DIM_COUNTRY | Player's nationality |
| org_key | FK → DIM_ORGANIZATION | Player's team |
| date_key | FK → DIM_DATE | Tournament start date |
| player_name | text | Degenerate dimension |
| **medal_count** | measure | Always 1; additive for totals |
| **medal_points** | measure | Gold=3, Silver=2, Bronze=1 |

**Example questions this enables:**

- Which country won the most medals overall? By region?
- Which game genre produced the most medals for Asian countries?
- Which organization accumulated the most medal points across all events?
- Did South Korea dominate PC games while Southeast Asia dominated mobile games?

---
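Because `medal_count` and `medal_points` in `FACT_MEDAL_AWARD` are additive, the same `SUM` works at any level of a dimension. A small sketch, again with SQLite standing in for Oracle, tables trimmed to the needed columns, and invented placeholder rows:

```python
import sqlite3

# SQLite stand-in; the real Data Mart lives in Oracle
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_country (
    country_key INTEGER PRIMARY KEY,
    name        TEXT,
    region      TEXT
);
CREATE TABLE fact_medal_award (
    country_key  INTEGER REFERENCES dim_country (country_key),
    medal_count  INTEGER,
    medal_points INTEGER
);
-- Placeholder rows: Country A has a gold and a bronze, Country B a silver
INSERT INTO dim_country VALUES (1, 'Country A', 'Asia'), (2, 'Country B', 'Europe');
INSERT INTO fact_medal_award VALUES (1, 1, 3), (1, 1, 1), (2, 1, 2);
""")

# Roll the measures up to country level
by_country = conn.execute("""
    SELECT c.name, SUM(f.medal_count), SUM(f.medal_points)
    FROM fact_medal_award f
    JOIN dim_country c ON c.country_key = f.country_key
    GROUP BY c.name
    ORDER BY SUM(f.medal_points) DESC
""").fetchall()

# The same measures roll up to region level without any change to the facts
by_region = conn.execute("""
    SELECT c.region, SUM(f.medal_count), SUM(f.medal_points)
    FROM fact_medal_award f
    JOIN dim_country c ON c.country_key = f.country_key
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()

print(by_country)  # [('Country A', 2, 4), ('Country B', 1, 2)]
print(by_region)   # [('Asia', 2, 4), ('Europe', 1, 2)]
```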
### FACT_CLUB_STANDING

**Grain:** one row per club in the Club Championship (24 rows). This is a snapshot: it represents the final standings at the end of EWC 2025.

| Column | Type | Description |
|---|---|---|
| org_key | FK → DIM_ORGANIZATION | The club |
| **final_rank** | measure | Final position (1 = best) |
| **total_points** | measure | Total Club Championship points earned |
| **prize_money_usd** | measure | Prize money from Club Championship |
| **tournament_wins** | measure | Number of tournaments the club won |
| **top_8_finishes** | measure | Total top-8 tournament finishes |
| **eligible_to_win** | measure | 1 if the club was eligible for the grand prize |

**Example questions this enables:**

- How does prize money correlate with tournament wins vs breadth of top-8 finishes?
- Do Middle Eastern clubs outperform European clubs in the Club Championship?
- What is the average total_points for Current club partners vs non-partners?

---
## Star Schema Diagram

```
                        DIM_DATE
                      ┌──────────┐
                      │ date_key │
                      └────┬─────┘
                           │ start/end
                           │
DIM_GAME ────────── FACT_TOURNAMENT ────────── DIM_ORGANIZATION
(game_key)          (prize_pool_usd            (org_key)
                     num_participants
                     duration_days
                     has_club_points)


DIM_COUNTRY ──┐
DIM_ORGAN.  ──┼── FACT_MEDAL_AWARD ──── DIM_GAME
DIM_MEDAL   ──┤   (medal_count          (game_key)
DIM_DATE    ──┘    medal_points)


DIM_ORGANIZATION ── FACT_CLUB_STANDING
                    (total_points
                     prize_money_usd
                     tournament_wins
                     top_8_finishes)
```

---
## Why Three Fact Tables

A single fact table would require choosing one grain, which would make some analyses awkward or impossible.

- `FACT_TOURNAMENT` is at tournament grain: you cannot get per-player medal counts from it.
- `FACT_MEDAL_AWARD` is at player-medal grain: you cannot get prize pool totals from it without denormalizing tournament data into it.
- `FACT_CLUB_STANDING` captures a snapshot that has no natural place in the other two tables.

Keeping them separate means each fact table has a clean, single grain. Power BI can build relationships between them through the shared dimensions.
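As a rough illustration of combining two fact tables through a shared dimension, the sketch below first aggregates `FACT_MEDAL_AWARD` to organization grain and only then joins it to `FACT_CLUB_STANDING` via `DIM_ORGANIZATION`. It assumes SQLite as a stand-in for Oracle, tables trimmed to the needed columns, and invented placeholder rows; it is not the project's actual reporting query.

```python
import sqlite3

# SQLite stand-in; the real Data Mart lives in Oracle
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_organization (org_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_medal_award (org_key INTEGER, medal_points INTEGER);
CREATE TABLE fact_club_standing (org_key INTEGER, total_points INTEGER);
-- Placeholder rows with made-up values
INSERT INTO dim_organization VALUES (1, 'Club A'), (2, 'Club B');
INSERT INTO fact_medal_award VALUES (1, 3), (1, 2), (2, 1);
INSERT INTO fact_club_standing VALUES (1, 1000), (2, 500);
""")

# Bring the player-medal fact up to organization grain first, so that
# joining it to the one-row-per-club snapshot does not repeat total_points
combined = conn.execute("""
    WITH medals AS (
        SELECT org_key, SUM(medal_points) AS medal_points
        FROM fact_medal_award
        GROUP BY org_key
    )
    SELECT o.name, m.medal_points, s.total_points
    FROM dim_organization o
    LEFT JOIN medals m ON m.org_key = o.org_key
    LEFT JOIN fact_club_standing s ON s.org_key = o.org_key
    ORDER BY o.name
""").fetchall()

print(combined)  # [('Club A', 5, 1000), ('Club B', 1, 500)]
```

Joining the two fact tables directly on `org_key` would repeat each club's `total_points` once per medal row, which is the kind of double counting that separate grains are meant to avoid.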