IPZ_1/docs/04_datamart.md
2026-05-17 17:17:04 +02:00

Data Mart

What a Data Mart Is

A Data Mart is a database optimized for reading and analysis rather than for recording transactions. While the OLTP schema is normalized to avoid redundancy, the Data Mart is deliberately denormalized into a star schema — a central fact table surrounded by dimension tables — so that analytical queries are fast and simple to write.

In a star schema:

  • Fact tables hold measurable events with numeric metrics (prize money, medal count, points)
  • Dimension tables hold descriptive context that you slice and filter by (game type, country, organization region)

The Data Mart is stored in the university lab's Oracle schema and is populated by Apache NiFi, which reads from the MySQL OLTP database.

The DDL is in sql/datamart_schema.sql.


Dimensions

DIM_DATE

A standard calendar dimension covering every date in the EWC 2025 event window (July 8 to August 24, 2025). Using a dedicated date dimension allows Power BI to filter by week, group by month, or compare by quarter with no extra calculation.

| Column | Example |
| --- | --- |
| date_key | 20250708 (YYYYMMDD integer) |
| full_date | 2025-07-08 |
| year | 2025 |
| quarter | 3 |
| month / month_name | 7 / July |
| week_number | 28 |
| day_of_month / day_name | 8 / Tuesday |
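As a rough sketch of how such rows could be generated, the helper below builds one dictionary per calendar day. The function name `build_dim_date` is hypothetical, and the use of ISO week numbering for `week_number` is an assumption (it does match the example value 28 for July 8, 2025). The actual load is done by NiFi and may differ.

```python
from datetime import date, timedelta


def build_dim_date(start: date, end: date) -> list[dict]:
    """Return one DIM_DATE-style row per day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            "date_key": current.year * 10000 + current.month * 100 + current.day,
            "full_date": current.isoformat(),
            "year": current.year,
            "quarter": (current.month - 1) // 3 + 1,
            "month": current.month,
            "month_name": current.strftime("%B"),   # English under the default locale
            "week_number": current.isocalendar()[1],  # ISO week: an assumption
            "day_of_month": current.day,
            "day_name": current.strftime("%A"),
        })
        current += timedelta(days=1)
    return rows


dim_date = build_dim_date(date(2025, 7, 8), date(2025, 8, 24))
print(len(dim_date))  # 48
print(dim_date[0]["date_key"], dim_date[0]["week_number"], dim_date[0]["day_name"])
# 20250708 28 Tuesday
```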

DIM_GAME

Describes each of the 25 game titles. Enables slicing facts by genre (MOBA vs FPS vs Battle Royale) and by platform (PC vs Mobile vs Console).

| Column | Example |
| --- | --- |
| name | Counter-Strike 2 |
| game_type | FPS |
| platform | PC |

DIM_COUNTRY

Countries with their geographic region. Intentionally kept lean — the medal counts that live in the OLTP country table are not carried into this dimension because they are derived facts, not descriptive attributes.

| Column | Example |
| --- | --- |
| name | South Korea |
| region | Asia |

DIM_ORGANIZATION

All esports clubs and teams. Includes partner metadata to enable analysis by partner tier (Current partner vs non-partner) and social reach.

| Column | Example |
| --- | --- |
| name | Team Falcons |
| region | Middle East |
| country | Saudi Arabia |
| club_partner_status | Current |
| founded_year | 2017 |
| social_media_followers_m | 4.0 |

DIM_MEDAL

A simple three-row table representing the medal types. Includes medal_rank (1/2/3) so reports can sort Gold → Silver → Bronze correctly without relying on alphabetical ordering.

| medal_type | medal_rank |
| --- | --- |
| Gold | 1 |
| Silver | 2 |
| Bronze | 3 |
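A small sketch of why medal_rank is needed: sorting the three medal names alphabetically puts Bronze first, while sorting on the rank column gives the expected podium order. The list below simply mirrors the three rows of the table.

```python
# (medal_type, medal_rank) pairs, mirroring the DIM_MEDAL rows
medals = [("Bronze", 3), ("Gold", 1), ("Silver", 2)]

# Sorting on the name alone gives a misleading order for reports.
alphabetical = [name for name, _ in sorted(medals, key=lambda row: row[0])]

# Sorting on medal_rank gives Gold, Silver, Bronze.
by_rank = [name for name, _ in sorted(medals, key=lambda row: row[1])]

print(alphabetical)  # ['Bronze', 'Gold', 'Silver']
print(by_rank)       # ['Gold', 'Silver', 'Bronze']
```

In Power BI the same effect is usually achieved by setting medal_rank as the "Sort by column" for medal_type.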

Fact Tables

FACT_TOURNAMENT

Grain: one row per tournament (27 rows).

This is the primary financial fact table. It answers questions about prize money distribution across games, genres, platforms, and time.

| Column | Type | Description |
| --- | --- | --- |
| game_key | FK → DIM_GAME | What game |
| start_date_key | FK → DIM_DATE | When it started |
| end_date_key | FK → DIM_DATE | When it ended |
| winner_org_key | FK → DIM_ORGANIZATION | Winning organization (NULL for individual-winner events) |
| event_name | text | Degenerate dimension |
| gender | text | Open / Men / Women |
| prize_pool_usd | measure | Total prize pool in USD |
| num_participants | measure | Number of competing teams/players |
| duration_days | measure | Tournament length in days |
| has_club_points | measure | 1 if tournament awarded Club Championship points |

Example questions this enables:

  • What was the total prize money awarded to MOBA tournaments vs FPS tournaments?
  • Which platform (PC or Mobile) had higher average prize pools?
  • How did prize pools vary week by week across the seven weeks of the event?
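Two of these questions can be sketched as queries. The example below is an assumption-laden illustration: SQLite replaces Oracle, the tables carry only the columns the queries need, and all game names and prize values are invented rather than taken from the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for the Oracle Data Mart
conn.executescript("""
CREATE TABLE dim_game (game_key INTEGER PRIMARY KEY, name TEXT,
                       game_type TEXT, platform TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, week_number INTEGER);
CREATE TABLE fact_tournament (game_key INTEGER, start_date_key INTEGER,
                              end_date_key INTEGER, prize_pool_usd INTEGER);
-- Invented sample rows
INSERT INTO dim_game VALUES (1, 'Game A', 'FPS',  'PC'),
                            (2, 'Game B', 'MOBA', 'PC'),
                            (3, 'Game C', 'MOBA', 'Mobile');
INSERT INTO dim_date VALUES (20250708, 28), (20250715, 29), (20250716, 29);
INSERT INTO fact_tournament VALUES (1, 20250708, 20250715, 1000000),
                                   (2, 20250715, 20250716, 3000000),
                                   (3, 20250716, 20250716,  500000);
""")

# Average prize pool per platform (DIM_GAME slice).
avg_by_platform = conn.execute("""
    SELECT g.platform, AVG(f.prize_pool_usd)
    FROM fact_tournament f
    JOIN dim_game g ON g.game_key = f.game_key
    GROUP BY g.platform
    ORDER BY g.platform
""").fetchall()

# Total prize pool per start week (DIM_DATE slice via start_date_key).
total_by_week = conn.execute("""
    SELECT d.week_number, SUM(f.prize_pool_usd)
    FROM fact_tournament f
    JOIN dim_date d ON d.date_key = f.start_date_key
    GROUP BY d.week_number
    ORDER BY d.week_number
""").fetchall()

print(avg_by_platform)  # [('Mobile', 500000.0), ('PC', 2000000.0)]
print(total_by_week)    # [(28, 1000000), (29, 3500000)]
```

Because FACT_TOURNAMENT has two date keys, the weekly query has to pick one role explicitly; here it uses the start date.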

FACT_MEDAL_AWARD

Grain: one row per player-medal (257 rows).

This fact table captures individual competitive performance. Each medalist player contributes one row with a medal_count of 1 and a medal_points of 3/2/1. Both columns are additive — you can SUM them freely to get team medal totals, country medal totals, etc.

| Column | Type | Description |
| --- | --- | --- |
| game_key | FK → DIM_GAME | Game the medal was won in |
| medal_key | FK → DIM_MEDAL | Gold / Silver / Bronze |
| country_key | FK → DIM_COUNTRY | Player's nationality |
| org_key | FK → DIM_ORGANIZATION | Player's team |
| date_key | FK → DIM_DATE | Tournament start date |
| player_name | text | Degenerate dimension |
| medal_count | measure | Always 1, additive for totals |
| medal_points | measure | Gold=3, Silver=2, Bronze=1 |

Example questions this enables:

  • Which country won the most medals overall? By region?
  • Which game genre produced the most medals for Asian countries?
  • Which organization accumulated the most medal points across all events?
  • Did South Korea dominate PC games while Southeast Asia dominated mobile games?
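The additivity of medal_count and medal_points means the same two SUMs answer the question at any level of the country dimension. The sketch below shows this with SQLite as a stand-in for Oracle; the country names and medal rows are placeholders, not real results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for the Oracle Data Mart
conn.executescript("""
CREATE TABLE dim_country (country_key INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE fact_medal_award (country_key INTEGER,
                               medal_count INTEGER, medal_points INTEGER);
-- Placeholder rows, not real EWC results
INSERT INTO dim_country VALUES (1, 'Country A', 'Asia'),
                               (2, 'Country B', 'Asia'),
                               (3, 'Country C', 'Europe');
INSERT INTO fact_medal_award VALUES (1, 1, 3), (1, 1, 2), (2, 1, 1),
                                    (3, 1, 3), (3, 1, 1), (3, 1, 1);
""")

# Same measures, grouped at country level...
by_country = conn.execute("""
    SELECT c.name, SUM(f.medal_count) AS medals, SUM(f.medal_points) AS points
    FROM fact_medal_award f
    JOIN dim_country c ON c.country_key = f.country_key
    GROUP BY c.name
    ORDER BY medals DESC, c.name
""").fetchall()

# ...and rolled up to region level by changing only the GROUP BY column.
by_region = conn.execute("""
    SELECT c.region, SUM(f.medal_count) AS medals, SUM(f.medal_points) AS points
    FROM fact_medal_award f
    JOIN dim_country c ON c.country_key = f.country_key
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()

print(by_country)  # [('Country C', 3, 5), ('Country A', 2, 5), ('Country B', 1, 1)]
print(by_region)   # [('Asia', 3, 6), ('Europe', 3, 5)]
```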

FACT_CLUB_STANDING

Grain: one row per club in the Club Championship (24 rows). This is a snapshot — it represents the final standings at the end of EWC 2025.

| Column | Type | Description |
| --- | --- | --- |
| org_key | FK → DIM_ORGANIZATION | The club |
| final_rank | measure | Final position (1 = best) |
| total_points | measure | Total Club Championship points earned |
| prize_money_usd | measure | Prize money from Club Championship |
| tournament_wins | measure | Number of tournaments the club won |
| top_8_finishes | measure | Total top-8 tournament finishes |
| eligible_to_win | measure | 1 if the club was eligible for the grand prize |

Example questions this enables:

  • How does prize money correlate with tournament wins vs breadth of top-8 finishes?
  • Do Middle Eastern clubs outperform European clubs in the Club Championship?
  • What is the average total_points for Current club partners vs non-partners?
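The partner comparison could look roughly like the query below. SQLite again stands in for Oracle, the club names and point totals are invented, and the value 'None' for non-partner clubs is an assumption about how club_partner_status is encoded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for the Oracle Data Mart
conn.executescript("""
CREATE TABLE dim_organization (org_key INTEGER PRIMARY KEY, name TEXT,
                               club_partner_status TEXT);
CREATE TABLE fact_club_standing (org_key INTEGER, final_rank INTEGER,
                                 total_points INTEGER);
-- Invented rows; 'None' as the non-partner status is an assumption
INSERT INTO dim_organization VALUES (1, 'Club A', 'Current'),
                                    (2, 'Club B', 'Current'),
                                    (3, 'Club C', 'None');
INSERT INTO fact_club_standing VALUES (1, 1, 5000), (2, 2, 3000), (3, 3, 1000);
""")

avg_points = conn.execute("""
    SELECT o.club_partner_status, AVG(s.total_points)
    FROM fact_club_standing s
    JOIN dim_organization o ON o.org_key = s.org_key
    GROUP BY o.club_partner_status
    ORDER BY o.club_partner_status
""").fetchall()

print(avg_points)  # [('Current', 4000.0), ('None', 1000.0)]
```

Note that final_rank is not additive: it can be filtered or averaged, but summing ranks across clubs has no meaning.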

Star Schema Diagram

                      DIM_DATE
                     ┌──────────┐
                     │ date_key │
                     └────┬─────┘
                          │ start/end
                          │
DIM_GAME ────────── FACT_TOURNAMENT ────────── DIM_ORGANIZATION
(game_key)         (prize_pool_usd             (org_key)
                    num_participants
                    duration_days
                    has_club_points)


DIM_COUNTRY ──┐
DIM_ORGAN.  ──┼── FACT_MEDAL_AWARD ──── DIM_GAME
DIM_MEDAL   ──┤   (medal_count           (game_key)
DIM_DATE    ──┘    medal_points)


DIM_ORGANIZATION ── FACT_CLUB_STANDING
                    (total_points
                     prize_money_usd
                     tournament_wins
                     top_8_finishes)

Why Three Fact Tables

A single fact table would require choosing one grain, which would make some analyses awkward or impossible.

  • FACT_TOURNAMENT is at tournament grain — you cannot get per-player medal counts from it.
  • FACT_MEDAL_AWARD is at player-medal grain — you cannot get prize pool totals from it without denormalizing tournament data into it.
  • FACT_CLUB_STANDING captures a snapshot that has no natural place in the other two tables.

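Keeping them separate means each fact table has a clean, single grain. Power BI does not join fact tables to each other directly; it relates them through the dimensions they share (DIM_ORGANIZATION for all three, DIM_GAME and DIM_DATE for the tournament and medal facts).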
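The grain argument can be made concrete with a small sketch. Joining two fact tables row by row through a shared dimension repeats the coarser-grained rows and inflates their sums; aggregating each fact to the shared dimension first avoids that. SQLite stands in for Oracle here, and all clubs and values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for the Oracle Data Mart
conn.executescript("""
CREATE TABLE dim_organization (org_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_club_standing (org_key INTEGER, total_points INTEGER);
CREATE TABLE fact_medal_award (org_key INTEGER,
                               medal_count INTEGER, medal_points INTEGER);
-- Invented rows: Club A has three medal rows, Club B has one
INSERT INTO dim_organization VALUES (1, 'Club A'), (2, 'Club B');
INSERT INTO fact_club_standing VALUES (1, 5000), (2, 3000);
INSERT INTO fact_medal_award VALUES (1, 1, 3), (1, 1, 3), (1, 1, 2), (2, 1, 1);
""")

# Naive: the standing row is repeated once per medal row, so the SUM is inflated.
naive = conn.execute("""
    SELECT o.name, SUM(s.total_points)
    FROM dim_organization o
    JOIN fact_club_standing s ON s.org_key = o.org_key
    JOIN fact_medal_award m ON m.org_key = o.org_key
    GROUP BY o.name
    ORDER BY o.name
""").fetchall()

# Drill-across: bring the medal fact to club grain first, then join.
drill_across = conn.execute("""
    SELECT o.name, s.total_points, m.medals, m.medal_points
    FROM dim_organization o
    JOIN fact_club_standing s ON s.org_key = o.org_key
    JOIN (
        SELECT org_key,
               SUM(medal_count)  AS medals,
               SUM(medal_points) AS medal_points
        FROM fact_medal_award
        GROUP BY org_key
    ) m ON m.org_key = o.org_key
    ORDER BY o.name
""").fetchall()

print(naive)         # [('Club A', 15000), ('Club B', 3000)]
print(drill_across)  # [('Club A', 5000, 3, 8), ('Club B', 3000, 1, 1)]
```

Power BI performs the equivalent of the second query when each fact table has its own relationship to the shared dimension and measures are evaluated per table.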