# Project Overview

## What This Project Is

This project builds a complete data warehousing pipeline for the **Esports World Cup 2025 (EWC 2025)**, the world's largest esports event, held in Riyadh, Saudi Arabia, across 27 tournaments from July to August 2025 with a total prize pool exceeding $70 million.

The goal is to take raw event data, load it into a structured transactional database, and then transform it into a Data Mart optimized for analytical reporting. The final output is a star schema in Oracle that can be connected to Power BI to answer business questions about prize distribution, country performance, club rankings, and more.
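To make "star schema" concrete, here is a minimal sketch of the kind of query the Data Mart is meant to serve: a fact table joined to a dimension and then aggregated. It uses Python's built-in `sqlite3` in place of Oracle so it runs anywhere. The table names (`dim_country`, `fact_prize`), columns, and rows are hypothetical placeholders. They are not the project's actual DDL (see `sql/datamart_schema.sql`) and not real EWC 2025 results.

```python
import sqlite3

# Hypothetical, minimal star schema: one dimension table and one fact table.
# The real DDL lives in sql/datamart_schema.sql; these names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_country (
    country_key  INTEGER PRIMARY KEY,
    country_name TEXT NOT NULL
);
CREATE TABLE fact_prize (
    country_key INTEGER NOT NULL REFERENCES dim_country(country_key),
    prize_usd   REAL    NOT NULL
);
""")

# Placeholder rows, not real EWC 2025 results.
conn.executemany("INSERT INTO dim_country VALUES (?, ?)",
                 [(1, "Country A"), (2, "Country B")])
conn.executemany("INSERT INTO fact_prize VALUES (?, ?)",
                 [(1, 500.0), (1, 250.0), (2, 400.0)])

# Typical star-schema query: join the fact to a dimension, then aggregate.
rows = conn.execute("""
    SELECT d.country_name, SUM(f.prize_usd) AS total_prize
    FROM fact_prize AS f
    JOIN dim_country AS d ON d.country_key = f.country_key
    GROUP BY d.country_name
    ORDER BY total_prize DESC
""").fetchall()

for name, total in rows:
    print(f"{name}: {total:.2f}")
```

Questions about prize distribution, country performance, or club rankings follow the same pattern. Only the dimension in the `JOIN` and `GROUP BY` changes.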

---

## Technology Stack

| Layer | Tool |
|---|---|
| Source data | Kaggle CSV dataset (10 files) |
| OLTP database | MySQL 8.4 (Docker container) |
| ETL pipeline | Apache NiFi |
| Data Mart | Oracle (university lab schema) |
| Reporting | Microsoft Power BI |
| Infrastructure | Docker / Podman |
| Seed script | .NET 10 (single-file C# script) |

---

## Architecture

```
┌─────────────────────┐
│   Kaggle CSV files  │  10 files, ~700 rows total
│   (./data/)         │
└────────┬────────────┘
         │  dotnet run ./scripts/seed.cs
         ▼
┌─────────────────────┐
│   MySQL 8.4 OLTP    │  Normalized relational schema
│   port 13306        │  14 tables, 3NF
│   (Docker)          │
└────────┬────────────┘
         │  Apache NiFi ETL
         │  ExecuteSQL → ConvertAvroToJSON → SplitJson
         │  → EvaluateJsonPath → PutSQL
         ▼
┌─────────────────────┐
│   Oracle Data Mart  │  Star schema
│   (university lab)  │  3 fact tables, 5 dimension tables
└────────┬────────────┘
         │  Import / DirectQuery
         ▼
┌─────────────────────┐
│   Power BI Reports  │  OLAP analytics, 2 dashboards
└─────────────────────┘
```
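The NiFi chain in the diagram can be approximated in plain Python to show how records move between processors. This is a sketch under stated assumptions and does not use NiFi's API. `split_json`, `evaluate_json_path`, and `put_sql` are hypothetical helpers that mimic SplitJson, EvaluateJsonPath, and PutSQL. `sqlite3` stands in for Oracle, and the `dim_club` table and its columns are illustrative. The actual processor configuration is described in `nifi/NIFI_SETUP.md`.

```python
import json
import sqlite3

# Stand-in for ConvertAvroToJSON output: ExecuteSQL's result set as a JSON array.
# Column and table names below are hypothetical.
extracted = json.dumps([
    {"club_id": 1, "club_name": "Club A"},
    {"club_id": 2, "club_name": "Club B"},
])

def split_json(payload: str) -> list[dict]:
    """Mimic SplitJson with path $.*: one record per array element."""
    return json.loads(payload)

def evaluate_json_path(record: dict, fields: dict[str, str]) -> dict[str, object]:
    """Mimic EvaluateJsonPath: copy selected top-level fields ($.name) into attributes."""
    return {attr: record[path.removeprefix("$.")] for attr, path in fields.items()}

def put_sql(conn: sqlite3.Connection, sql: str, params: list[object]) -> None:
    """Mimic PutSQL: run one parameterized INSERT per record (FlowFile)."""
    conn.execute(sql, params)

target = sqlite3.connect(":memory:")  # stand-in for the Oracle Data Mart
target.execute("CREATE TABLE dim_club (club_id INTEGER PRIMARY KEY, club_name TEXT)")

insert_sql = "INSERT INTO dim_club (club_id, club_name) VALUES (?, ?)"
for record in split_json(extracted):
    attrs = evaluate_json_path(record, {"club.id": "$.club_id", "club.name": "$.club_name"})
    put_sql(target, insert_sql, [attrs["club.id"], attrs["club.name"]])
target.commit()

print(target.execute("SELECT * FROM dim_club ORDER BY club_id").fetchall())
# → [(1, 'Club A'), (2, 'Club B')]
```

SplitJson emits one record per FlowFile. That is what lets EvaluateJsonPath promote individual fields to attributes, which the load statement then consumes as parameters.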

---

## Project Structure

```
IPZ_1/
├── data/                   Raw Kaggle CSV files (source data)
├── sql/
│   ├── schema.sql          MySQL OLTP schema DDL
│   └── datamart_schema.sql Oracle Data Mart DDL
├── scripts/
│   └── seed.cs             .NET 10 script to populate MySQL from CSVs
├── docker/
│   ├── start.sh / stop.sh  Linux (Docker or Podman)
│   └── start.ps1 / stop.ps1  Windows
├── nifi/
│   ├── sql/extract/        MySQL queries (one per ETL pipeline)
│   ├── sql/load/           Oracle INSERT statements (one per ETL pipeline)
│   └── NIFI_SETUP.md       Step-by-step NiFi configuration guide
└── docs/                   This documentation
```

---

## Data Flow Summary

1. **Raw data**: 10 CSV files from a Kaggle dataset covering EWC 2025.
2. **Seeding**: a single C# script (`scripts/seed.cs`) reads all CSVs, resolves foreign-key relationships, and populates the MySQL OLTP database in dependency order, so parent tables are filled before the tables that reference them.
3. **ETL**: Apache NiFi runs 8 pipelines. Each runs an extract query against MySQL, splits the result into individual records, and inserts them into the Oracle dimension and fact tables.
4. **Reporting**: Power BI connects to Oracle and queries the star schema for OLAP analysis.
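The seeding step has to insert parent tables before the tables whose foreign keys reference them. One standard way to derive that order is a topological sort over the foreign-key graph. The sketch below does this with Python's standard `graphlib` module. The table names and relationships are a hypothetical subset, not the 14 tables in `sql/schema.sql`. It also does not claim that `seed.cs` computes its order this way; the script may simply hardcode it.

```python
from graphlib import TopologicalSorter

# Hypothetical subset of OLTP tables mapped to the tables their foreign keys reference.
# The real 14-table schema is in sql/schema.sql; these names are illustrative only.
fk_parents = {
    "countries": set(),
    "games": set(),
    "clubs": {"countries"},
    "tournaments": {"games"},
    "players": {"clubs", "countries"},
    "placements": {"tournaments", "clubs"},
}

# static_order() yields every table after all of the tables it references,
# so parents are seeded before children and FK constraints are satisfied.
# It raises graphlib.CycleError if the references form a cycle.
load_order = list(TopologicalSorter(fk_parents).static_order())

for table in load_order:
    print("seed", table)
```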