IPZ_1/docs/01_overview.md
2026-05-17 17:17:04 +02:00

Project Overview

What This Project Is

This project builds a complete data warehousing pipeline for the Esports World Cup 2025 (EWC 2025), the world's largest esports event, held in Riyadh, Saudi Arabia, from July to August 2025 across 27 tournaments with a total prize pool exceeding $100 million.

The goal is to take raw event data, load it into a normalized transactional (OLTP) database, and then transform it into a Data Mart optimized for analytical reporting. The final output is an Oracle star schema that Power BI connects to in order to answer business questions about prize distribution, country performance, club rankings, and more.
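To show why a star schema suits questions like "how is prize money distributed by country?", here is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for Oracle. The table and column names (dim_country, fact_prize, country_key, prize_usd) are hypothetical and are not taken from sql/datamart_schema.sql, and the rows are placeholder values rather than EWC 2025 data.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle Data Mart (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical dimension table: one row per country.
cur.execute("CREATE TABLE dim_country (country_key INTEGER PRIMARY KEY, country_name TEXT)")

# Hypothetical fact table: one row per prize payout, keyed to the country dimension.
cur.execute(
    "CREATE TABLE fact_prize ("
    " prize_id INTEGER PRIMARY KEY,"
    " country_key INTEGER REFERENCES dim_country(country_key),"
    " prize_usd REAL)"
)

# Placeholder rows for illustration only, not real EWC 2025 figures.
cur.executemany("INSERT INTO dim_country VALUES (?, ?)", [(1, "Country A"), (2, "Country B")])
cur.executemany(
    "INSERT INTO fact_prize VALUES (?, ?, ?)",
    [(1, 1, 500.0), (2, 1, 250.0), (3, 2, 400.0)],
)


def prize_by_country(cursor):
    """Typical star-schema query: join the fact to a dimension, then aggregate a measure."""
    cursor.execute(
        "SELECT d.country_name, SUM(f.prize_usd) AS total_prize "
        "FROM fact_prize f JOIN dim_country d ON f.country_key = d.country_key "
        "GROUP BY d.country_name ORDER BY total_prize DESC"
    )
    return cursor.fetchall()


print(prize_by_country(cur))
```

The fact table holds only keys and numeric measures, so every business question reduces to the same shape of query: join to whichever dimensions describe the question, then group and aggregate. Power BI builds essentially this kind of query when a report slices a measure by a dimension attribute.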


Technology Stack

Layer           Tool
--------------  -------------------------------
Source data     Kaggle CSV dataset (10 files)
OLTP database   MySQL 8.4 (Docker container)
ETL pipeline    Apache NiFi
Data Mart       Oracle (university lab schema)
Reporting       Microsoft Power BI
Infrastructure  Docker / Podman
Seed script     .NET 10 (single-file C# script)

Architecture

┌─────────────────────┐
│   Kaggle CSV files  │  10 files, ~700 rows total
│   (./data/)         │
└────────┬────────────┘
         │  dotnet run ./scripts/seed.cs
         ▼
┌─────────────────────┐
│   MySQL 8.4 OLTP    │  Normalized relational schema
│   port 13306        │  14 tables, 3NF
│   (Docker)          │
└────────┬────────────┘
         │  Apache NiFi ETL
         │  ExecuteSQL → ConvertAvroToJSON → SplitJson
         │  → EvaluateJsonPath → PutSQL
         ▼
┌─────────────────────┐
│   Oracle Data Mart  │  Star schema
│   (university lab)  │  3 fact tables, 5 dimension tables
└────────┬────────────┘
         │  Import / DirectQuery
         ▼
┌─────────────────────┐
│   Power BI Reports  │  OLAP analytics, 2 dashboards
└─────────────────────┘
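The NiFi processor chain in the diagram can be hard to picture from processor names alone. The sketch below is a plain-Python simulation of the data shape at each step for one hypothetical pipeline. It does not use NiFi's API: each function only mimics what the same-named processor does to the data, sqlite3 stands in for both MySQL and Oracle, and the table names (countries, dim_country) and attribute names (country.id, country.name) are illustrative assumptions.

```python
import json
import sqlite3

# sqlite3 in-memory databases stand in for the MySQL source and Oracle target (assumption).
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
source.execute("CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT)")
source.executemany("INSERT INTO countries VALUES (?, ?)", [(1, "Country A"), (2, "Country B")])
target.execute("CREATE TABLE dim_country (country_key INTEGER PRIMARY KEY, country_name TEXT)")


def execute_sql(conn, query):
    """ExecuteSQL: run the extract query (NiFi emits Avro records; dicts are used here)."""
    cur = conn.execute(query)
    columns = [c[0] for c in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


def convert_to_json(records):
    """ConvertAvroToJSON: serialize all records as a single JSON array."""
    return json.dumps(records)


def split_json(json_array):
    """SplitJson: produce one JSON document per array element."""
    return [json.dumps(item) for item in json.loads(json_array)]


def evaluate_json_path(doc, paths):
    """EvaluateJsonPath: copy selected top-level fields into named attributes."""
    data = json.loads(doc)
    return {attr: data[key] for attr, key in paths.items()}


def put_sql(conn, statement, attributes, arg_order):
    """PutSQL: execute a parameterized INSERT using the extracted attributes."""
    conn.execute(statement, [attributes[name] for name in arg_order])


# Wire the stages together in the same order as the diagram.
records = execute_sql(source, "SELECT id, name FROM countries ORDER BY id")
for doc in split_json(convert_to_json(records)):
    attrs = evaluate_json_path(doc, {"country.id": "id", "country.name": "name"})
    put_sql(
        target,
        "INSERT INTO dim_country (country_key, country_name) VALUES (?, ?)",
        attrs,
        ["country.id", "country.name"],
    )
target.commit()

print(target.execute("SELECT * FROM dim_country ORDER BY country_key").fetchall())
```

Splitting into one document per record is what lets EvaluateJsonPath and PutSQL work row by row; the trade-off is one INSERT per row, which is acceptable at this dataset's size (about 700 source rows).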

Project Structure

IPZ_1/
├── data/                   Raw Kaggle CSV files (source data)
├── sql/
│   ├── schema.sql          MySQL OLTP schema DDL
│   └── datamart_schema.sql Oracle Data Mart DDL
├── scripts/
│   └── seed.cs             .NET 10 script to populate MySQL from CSVs
├── docker/
│   ├── start.sh / stop.sh  Linux (Docker or Podman)
│   └── start.ps1 / stop.ps1  Windows
├── nifi/
│   ├── sql/extract/        MySQL queries (one per ETL pipeline)
│   ├── sql/load/           Oracle INSERT statements (one per ETL pipeline)
│   └── NIFI_SETUP.md       Step-by-step NiFi configuration guide
└── docs/                   This documentation

Data Flow Summary

  1. Source data: EWC 2025 data arrives as 10 CSV files exported from Kaggle (./data/).
  2. Seeding: a single C# script (scripts/seed.cs) reads all CSVs, resolves foreign key relationships, and populates the MySQL OLTP database in dependency order, inserting referenced (parent) tables before the tables that point to them.
  3. ETL: Apache NiFi runs 8 pipelines. Each pipeline runs its extract query (nifi/sql/extract/) against MySQL and writes the results into an Oracle dimension or fact table using the matching INSERT statement (nifi/sql/load/).
  4. Reporting: Power BI connects to Oracle and queries the star schema for OLAP analysis.
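Loading tables "in the correct order" is a topological sort over foreign-key dependencies. The sketch below shows one way to derive that order with Python's standard graphlib module. It is not necessarily how seed.cs does it (the script may simply use a hard-coded order), and the table names and dependencies are a hypothetical subset, not the 14 tables in sql/schema.sql.

```python
from graphlib import TopologicalSorter

# Hypothetical subset of OLTP tables, each mapped to the tables it references
# via foreign keys. Illustrative only; the real schema lives in sql/schema.sql.
FOREIGN_KEYS = {
    "countries": set(),
    "games": set(),
    "clubs": {"countries"},
    "players": {"countries", "clubs"},
    "tournaments": {"games"},
    "placements": {"tournaments", "clubs"},
}


def load_order(dependencies):
    """Return table names so that every table comes after the tables it references."""
    # TopologicalSorter expects node -> predecessors; static_order() yields
    # predecessors first and raises graphlib.CycleError on circular foreign keys.
    return list(TopologicalSorter(dependencies).static_order())


order = load_order(FOREIGN_KEYS)
print(order)
```

Any order this produces is valid, since several tables (for example countries and games) have no dependency between them. If the Oracle Data Mart enforces foreign keys from fact to dimension tables (an assumption), the same rule applies to the ETL stage: the dimension pipelines must finish before the fact pipelines run.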