Project Overview
What This Project Is
This project builds a complete data warehousing pipeline for the Esports World Cup 2025 (EWC 2025), the world's largest esports event, held in Riyadh, Saudi Arabia, across 27 tournaments from July to August 2025 with a total prize pool exceeding $100 million.
The goal is to take raw event data, load it into a structured transactional database, and then transform it into a Data Mart optimized for analytical reporting. The final output is a star schema in Oracle that can be connected to Power BI to answer business questions about prize distribution, country performance, club rankings, and more.
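To make the target shape concrete, here is a minimal sketch of the kind of star-schema query a report on country performance might run: aggregate a fact table and label the result through a dimension. It uses Python's built-in `sqlite3` as a stand-in for Oracle. The table and column names (`dim_country`, `dim_tournament`, `fact_prize`) and the sample rows are hypothetical and are not taken from `datamart_schema.sql`.

```python
import sqlite3


def build_demo_mart() -> sqlite3.Connection:
    """Create a tiny in-memory star schema (hypothetical names, made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_country (
            country_key  INTEGER PRIMARY KEY,
            country_name TEXT
        );
        CREATE TABLE dim_tournament (
            tournament_key INTEGER PRIMARY KEY,
            game_title     TEXT
        );
        -- The fact table holds only keys into the dimensions plus a measure.
        CREATE TABLE fact_prize (
            country_key    INTEGER REFERENCES dim_country(country_key),
            tournament_key INTEGER REFERENCES dim_tournament(tournament_key),
            prize_usd      INTEGER
        );
        INSERT INTO dim_country VALUES (1, 'Country A'), (2, 'Country B');
        INSERT INTO dim_tournament VALUES (10, 'Game X'), (20, 'Game Y');
        INSERT INTO fact_prize VALUES (1, 10, 500), (1, 20, 300), (2, 10, 400);
        """
    )
    return conn


def prize_by_country(conn: sqlite3.Connection) -> list:
    """Sum the prize measure per country, highest total first."""
    return conn.execute(
        """
        SELECT c.country_name, SUM(f.prize_usd) AS total_prize_usd
        FROM fact_prize f
        JOIN dim_country c ON c.country_key = f.country_key
        GROUP BY c.country_name
        ORDER BY total_prize_usd DESC
        """
    ).fetchall()


if __name__ == "__main__":
    for name, total in prize_by_country(build_demo_mart()):
        print(name, total)
```

The same join-then-aggregate pattern applies to the other business questions: swap the dimension in the `JOIN` and `GROUP BY` to slice the same measure by tournament or club instead.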
Technology Stack
| Layer | Tool |
|---|---|
| Source data | Kaggle CSV dataset (10 files) |
| OLTP database | MySQL 8.4 (Docker container) |
| ETL pipeline | Apache NiFi |
| Data Mart | Oracle (university lab schema) |
| Reporting | Microsoft Power BI |
| Infrastructure | Docker / Podman |
| Seed script | .NET 10 (single-file C# script) |
Architecture
┌─────────────────────┐
│ Kaggle CSV files    │  10 files, ~700 rows total
│ (./data/)           │
└────────┬────────────┘
         │ dotnet run ./scripts/seed.cs
         ▼
┌─────────────────────┐
│ MySQL 8.4 OLTP      │  Normalized relational schema
│ port 13306          │  14 tables, 3NF
│ (Docker)            │
└────────┬────────────┘
         │ Apache NiFi ETL
         │ ExecuteSQL → ConvertAvroToJSON → SplitJson
         │ → EvaluateJsonPath → PutSQL
         ▼
┌─────────────────────┐
│ Oracle Data Mart    │  Star schema
│ (university lab)    │  3 fact tables, 5 dimension tables
└────────┬────────────┘
         │ Import / Live connection
         ▼
┌─────────────────────┐
│ Power BI Reports    │  OLAP analytics, 2 dashboards
└─────────────────────┘
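The NiFi processor chain in the diagram can be read as five plain steps: query, serialize, split, extract fields, insert. The sketch below is a rough Python analogue of that flow, not the project's actual NiFi configuration (NiFi is configured through its UI and flow definitions, not code like this). It assumes `sqlite3` in-memory databases as stand-ins for both MySQL and Oracle, and the `country` and `dim_country` tables are hypothetical.

```python
import json
import sqlite3


def run_pipeline(source, target, extract_sql, load_sql, fields):
    """Mirror one extract-and-load pipeline; returns the number of rows loaded."""
    # ExecuteSQL: run the extract query (NiFi emits the result set as Avro).
    cursor = source.execute(extract_sql)
    columns = [desc[0] for desc in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]

    # ConvertAvroToJSON: the whole result set becomes one JSON array.
    payload = json.dumps(rows)

    # SplitJson: break the array into one record per flow file.
    records = json.loads(payload)

    loaded = 0
    for record in records:
        # EvaluateJsonPath: pull the named fields out of each record.
        params = tuple(record[name] for name in fields)
        # PutSQL: execute a parameterized INSERT against the target.
        target.execute(load_sql, params)
        loaded += 1
    target.commit()
    return loaded


def demo():
    """Load two made-up rows from a source table into a dimension table."""
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT)")
    source.executemany(
        "INSERT INTO country VALUES (?, ?)",
        [(1, "Country A"), (2, "Country B")],
    )

    target = sqlite3.connect(":memory:")
    target.execute(
        "CREATE TABLE dim_country (country_key INTEGER, country_name TEXT)"
    )

    count = run_pipeline(
        source,
        target,
        "SELECT id, name FROM country ORDER BY id",
        "INSERT INTO dim_country (country_key, country_name) VALUES (?, ?)",
        ["id", "name"],
    )
    loaded_rows = target.execute(
        "SELECT country_key, country_name FROM dim_country ORDER BY country_key"
    ).fetchall()
    return count, loaded_rows


if __name__ == "__main__":
    print(demo())
```

Splitting into single records before the insert is what lets each row be loaded, retried, or routed to a failure path on its own.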
Project Structure
IPZ_1/
├── data/                       Raw Kaggle CSV files (source data)
├── sql/
│   ├── schema.sql              MySQL OLTP schema DDL
│   └── datamart_schema.sql     Oracle Data Mart DDL
├── scripts/
│   └── seed.cs                 .NET 10 script to populate MySQL from CSVs
├── docker/
│   ├── start.sh / stop.sh      Linux (Docker or Podman)
│   └── start.ps1 / stop.ps1    Windows
├── nifi/
│   ├── sql/extract/            MySQL queries (one per ETL pipeline)
│   ├── sql/load/               Oracle INSERT statements (one per ETL pipeline)
│   └── NIFI_SETUP.md           Step-by-step NiFi configuration guide
└── docs/                       This documentation
Data Flow Summary
- Raw data: 10 CSV files exported from Kaggle, covering EWC 2025.
- Seeding: a single C# script reads all CSVs, resolves foreign key relationships, and populates the MySQL OLTP database in dependency order, so that referenced tables are filled before the tables that point to them.
- ETL: Apache NiFi runs 8 pipelines. Each one runs an extract query against MySQL, splits the result into individual records, and inserts them as rows into the Oracle dimension and fact tables.
- Reporting: Power BI connects to Oracle and queries the star schema for OLAP analysis.
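The load order in the seeding step amounts to a topological ordering of the tables over their foreign-key graph. The sketch below shows one way to derive it with Python's standard `graphlib` module (Python 3.9+). It only illustrates the idea: `seed.cs` is a C# script and may simply hard-code its order, and the table names and dependencies here are a hypothetical subset, not the project's 14-table schema.

```python
from graphlib import TopologicalSorter

# Hypothetical subset of tables: each table maps to the tables it references
# through foreign keys (its "parents").
FK_DEPENDENCIES = {
    "country": set(),
    "game": set(),
    "club": {"country"},
    "player": {"country", "club"},
    "tournament": {"game"},
    "prize": {"tournament", "club"},
}


def load_order(dependencies):
    """Return table names so that every referenced table precedes its dependents."""
    return list(TopologicalSorter(dependencies).static_order())


def is_valid_order(order, dependencies):
    """Check that each parent table appears before every table referencing it."""
    position = {table: index for index, table in enumerate(order)}
    return all(
        position[parent] < position[child]
        for child, parents in dependencies.items()
        for parent in parents
    )


if __name__ == "__main__":
    order = load_order(FK_DEPENDENCIES)
    print(order)
    print(is_valid_order(order, FK_DEPENDENCIES))
```

Several orders can satisfy the same constraints, so the check verifies the parent-before-child property instead of comparing against one fixed list. `TopologicalSorter` raises `CycleError` if two tables reference each other, which would need a different strategy such as deferring one of the constraints.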