# Project Overview

## What This Project Is

This project builds a complete data warehousing pipeline for the **Esports World Cup 2025 (EWC 2025)**, the world's largest esports event, held in Riyadh, Saudi Arabia, across 27 tournaments from July to August 2025 with a total prize pool exceeding $100 million.

The goal is to take raw event data, load it into a structured transactional database, and then transform it into a Data Mart optimized for analytical reporting. The final output is a star schema in Oracle that Power BI can connect to in order to answer business questions about prize distribution, country performance, club rankings, and more.

---

## Technology Stack

| Layer | Tool |
|---|---|
| Source data | Kaggle CSV dataset (10 files) |
| OLTP database | MySQL 8.4 (Docker container) |
| ETL pipeline | Apache NiFi |
| Data Mart | Oracle (university lab schema) |
| Reporting | Microsoft Power BI |
| Infrastructure | Docker / Podman |
| Seed script | .NET 10 (single-file C# script) |

---

## Architecture

```
┌─────────────────────┐
│  Kaggle CSV files   │  10 files, ~700 rows total
│  (./data/)          │
└────────┬────────────┘
         │  dotnet run ./scripts/seed.cs
         ▼
┌─────────────────────┐
│  MySQL 8.4 OLTP     │  Normalized relational schema
│  port 13306         │  14 tables, 3NF
│  (Docker)           │
└────────┬────────────┘
         │  Apache NiFi ETL
         │  ExecuteSQL → ConvertAvroToJSON → SplitJson
         │  → EvaluateJsonPath → PutSQL
         ▼
┌─────────────────────┐
│  Oracle Data Mart   │  Star schema
│  (university lab)   │  3 fact tables, 5 dimension tables
└────────┬────────────┘
         │  Import / DirectQuery
         ▼
┌─────────────────────┐
│  Power BI Reports   │  OLAP analytics, 2 dashboards
└─────────────────────┘
```

---

## Project Structure

```
IPZ_1/
├── data/                     Raw Kaggle CSV files (source data)
├── sql/
│   ├── schema.sql            MySQL OLTP schema DDL
│   └── datamart_schema.sql   Oracle Data Mart DDL
├── scripts/
│   └── seed.cs               .NET 10 script to populate MySQL from CSVs
├── docker/
│   ├── start.sh / stop.sh    Linux (Docker or Podman)
│   └── start.ps1 / stop.ps1  Windows
├── nifi/
│   ├── sql/extract/          MySQL queries (one per ETL pipeline)
│   ├── sql/load/             Oracle INSERT statements (one per ETL pipeline)
│   └── NIFI_SETUP.md         Step-by-step NiFi configuration guide
└── docs/                     This documentation
```

---

## Data Flow Summary

1. **Raw data** lives in 10 CSV files exported from Kaggle that cover EWC 2025.
2. **Seeding**: a single C# script reads all CSVs, resolves foreign key relationships, and populates the MySQL OLTP database in dependency order (referenced tables before the tables that point to them).
3. **ETL**: Apache NiFi runs 8 pipelines. Each pipeline runs an extract query against MySQL and inserts the resulting rows into the Oracle dimension and fact tables.
4. **Reporting**: Power BI connects to Oracle and queries the star schema for OLAP analysis.
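The seeding step has to insert parent rows before child rows, which amounts to a topological sort of the tables over their foreign keys. Below is a minimal Python sketch of deriving that order with the standard library's `graphlib.TopologicalSorter`. It assumes a hypothetical subset of table names (`countries`, `clubs`, `placements`, and so on) and does not reproduce `seed.cs`, which is written in C# and may simply hard-code its order:

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical subset of the OLTP tables, each mapped to the tables it
# references through foreign keys. The real schema.sql defines 14 tables;
# these names are illustrative only.
FOREIGN_KEYS: dict[str, set[str]] = {
    "countries": set(),
    "games": set(),
    "clubs": {"countries"},
    "players": {"countries", "clubs"},
    "tournaments": {"games"},
    "placements": {"tournaments", "clubs", "players"},
}


def insertion_order(fks: dict[str, set[str]]) -> list[str]:
    """Order tables so each one comes after every table it references.

    TopologicalSorter treats the mapped sets as predecessors, so a referenced
    (parent) table is always yielded before the tables that point at it.
    Raises CycleError when the foreign keys form a cycle.
    """
    return list(TopologicalSorter(fks).static_order())


if __name__ == "__main__":
    try:
        for position, table in enumerate(insertion_order(FOREIGN_KEYS), start=1):
            print(f"{position}. {table}")
    except CycleError as err:
        # A cycle would need deferred constraints or a two-pass load instead.
        print(f"cannot order tables: {err.args[1]}")
```

Deriving the order from the key graph, rather than maintaining a hand-written list, keeps the seed step correct when new tables are added to the schema.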
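Each NiFi pipeline passes data through the processor chain shown in the architecture diagram. To illustrate what each processor does to the records, the sketch below models that chain in plain Python. It is not NiFi code. `split_json`, `evaluate_json_path`, and `put_sql` are hypothetical stand-ins for SplitJson, EvaluateJsonPath, and PutSQL. The `dim_club` table and its columns are invented for illustration, and an in-memory SQLite database stands in for Oracle:

```python
import json
import sqlite3

# Hypothetical ExecuteSQL -> ConvertAvroToJSON output: one JSON array holding
# every row returned by a single extract query.
FLOWFILE_CONTENT = json.dumps([
    {"club_id": 1, "club_name": "Club A", "country_code": "AA"},
    {"club_id": 2, "club_name": "Club B", "country_code": None},
])

# Hypothetical EvaluateJsonPath properties: attribute name -> JsonPath.
# Only top-level paths of the form $.field are modelled.
JSON_PATHS = {
    "club_id": "$.club_id",
    "club_name": "$.club_name",
    "country_code": "$.country_code",
}

# Hypothetical parameterized load statement; SQLite stands in for Oracle.
LOAD_SQL = ("INSERT INTO dim_club (club_id, club_name, country_code) "
            "VALUES (?, ?, ?)")


def split_json(content: str) -> list[str]:
    """Like SplitJson with JsonPath $.*: one JSON object per array element."""
    return [json.dumps(record) for record in json.loads(content)]


def evaluate_json_path(content: str, paths: dict[str, str]) -> dict:
    """Like EvaluateJsonPath with destination flowfile-attribute.

    Values stay Python objects here; real NiFi attributes are strings.
    """
    record = json.loads(content)
    return {attr: record.get(path.removeprefix("$.")) for attr, path in paths.items()}


def put_sql(conn: sqlite3.Connection, attrs: dict) -> None:
    """Like PutSQL: run the load statement with values bound in column order."""
    conn.execute(LOAD_SQL, (attrs["club_id"], attrs["club_name"], attrs["country_code"]))


def create_target(conn: sqlite3.Connection) -> None:
    """Create the hypothetical target dimension table."""
    conn.execute("CREATE TABLE dim_club (club_id INTEGER PRIMARY KEY, "
                 "club_name TEXT NOT NULL, country_code TEXT)")


def run_pipeline(conn: sqlite3.Connection, content: str) -> int:
    """Push one extract result through split -> evaluate -> load; return row count."""
    loaded = 0
    for flowfile in split_json(content):
        put_sql(conn, evaluate_json_path(flowfile, JSON_PATHS))
        loaded += 1
    conn.commit()
    return loaded


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_target(conn)
    print(run_pipeline(conn, FLOWFILE_CONTENT), "rows loaded")
```

In NiFi itself, PutSQL can take the values for `?` placeholders from the `sql.args.N.type` (a JDBC type code) and `sql.args.N.value` flowfile attributes. This overview does not specify whether the statements in `nifi/sql/load/` bind parameters that way or reference attributes directly.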