# Project Overview

## What This Project Is

This project builds a complete data warehousing pipeline for the **Esports World Cup 2025 (EWC 2025)**, the world's largest esports event. EWC 2025 was held in Riyadh, Saudi Arabia, from July to August 2025 and spanned 27 tournaments with a total prize pool exceeding $100 million.

The goal is to take raw event data, load it into a normalized transactional (OLTP) database, and then transform it into a Data Mart optimized for analytical reporting. The final output is a star schema in Oracle that Power BI connects to in order to answer business questions about prize distribution, country performance, club rankings, and more.
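
To make the difference between the OLTP schema and the star schema concrete, here is a minimal sketch of the kind of query the Data Mart is shaped for: one fact table holding a measure (prize money) joined to a dimension and aggregated. It uses Python's built-in `sqlite3` as a stand-in for Oracle, and every table name, column name, and row (`dim_country`, `dim_club`, `fact_prize`, `prize_usd`, "Country A", and so on) is a hypothetical placeholder, not the project's actual schema or EWC data.

```python
import sqlite3


def build_toy_mart() -> sqlite3.Connection:
    """Create a tiny, hypothetical star schema in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimension tables: descriptive attributes, one row per entity.
        CREATE TABLE dim_country (country_key INTEGER PRIMARY KEY, country_name TEXT);
        CREATE TABLE dim_club    (club_key    INTEGER PRIMARY KEY, club_name    TEXT);

        -- Fact table: one row per measured event, with a foreign key to each dimension.
        CREATE TABLE fact_prize (
            country_key INTEGER REFERENCES dim_country(country_key),
            club_key    INTEGER REFERENCES dim_club(club_key),
            prize_usd   REAL
        );

        -- Placeholder rows for illustration only (not real EWC 2025 data).
        INSERT INTO dim_country VALUES (1, 'Country A'), (2, 'Country B');
        INSERT INTO dim_club    VALUES (1, 'Club X'), (2, 'Club Y');
        INSERT INTO fact_prize  VALUES (1, 1, 500.0), (1, 2, 250.0), (2, 2, 100.0);
        """
    )
    return conn


def prize_by_country(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    """Star join: aggregate the fact table's measure by a dimension attribute."""
    return conn.execute(
        """
        SELECT c.country_name, SUM(f.prize_usd) AS total_prize
        FROM fact_prize AS f
        JOIN dim_country AS c ON c.country_key = f.country_key
        GROUP BY c.country_name
        ORDER BY total_prize DESC
        """
    ).fetchall()


if __name__ == "__main__":
    print(prize_by_country(build_toy_mart()))  # [('Country A', 750.0), ('Country B', 100.0)]
```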

---

## Technology Stack

| Layer | Tool |
|---|---|
| Source data | Kaggle CSV dataset (10 files) |
| OLTP database | MySQL 8.4 (Docker container) |
| ETL pipeline | Apache NiFi |
| Data Mart | Oracle (university lab schema) |
| Reporting | Microsoft Power BI |
| Infrastructure | Docker / Podman |
| Seed script | .NET 10 (single-file C# script) |

---

## Architecture

```
┌─────────────────────┐
│ Kaggle CSV files    │  10 files, ~700 rows total
│ (./data/)           │
└────────┬────────────┘
         │ dotnet run ./scripts/seed.cs
         ▼
┌─────────────────────┐
│ MySQL 8.4 OLTP      │  Normalized relational schema
│ port 13306          │  14 tables, 3NF
│ (Docker)            │
└────────┬────────────┘
         │ Apache NiFi ETL
         │ ExecuteSQL → ConvertAvroToJSON → SplitJson
         │ → EvaluateJsonPath → PutSQL
         ▼
┌─────────────────────┐
│ Oracle Data Mart    │  Star schema
│ (university lab)    │  3 fact tables, 5 dimension tables
└────────┬────────────┘
         │ Import / DirectQuery
         ▼
┌─────────────────────┐
│ Power BI Reports    │  OLAP analytics, 2 dashboards
└─────────────────────┘
```
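
Each NiFi pipeline in the diagram is a chain of standard processors. As a rough mental model only (this is not how NiFi executes flow files, and it skips the Avro format entirely), the per-record logic can be emulated in plain Python: run the extract query, turn the result set into a JSON array, split it into one document per record, pull fields out by path, and issue an INSERT with the extracted values. Both databases are stood in for by in-memory SQLite, and the `club`/`dim_club` tables, their columns, and the helper function names are hypothetical. The INSERT is parameterized here for safety; the real flow may build its statement differently.

```python
import json
import sqlite3


def execute_sql(src: sqlite3.Connection, query: str) -> str:
    """ExecuteSQL + ConvertAvroToJSON stand-in: result set -> JSON array string."""
    cur = src.execute(query)
    columns = [d[0] for d in cur.description]
    return json.dumps([dict(zip(columns, row)) for row in cur.fetchall()])


def split_json(payload: str) -> list[str]:
    """SplitJson stand-in: one JSON document (one 'flow file') per array element."""
    return [json.dumps(item) for item in json.loads(payload)]


def evaluate_json_path(doc: str, fields: list[str]) -> dict:
    """EvaluateJsonPath stand-in: extract top-level fields ($.field) as attributes."""
    record = json.loads(doc)
    return {f: record[f] for f in fields}


def put_sql(dst: sqlite3.Connection, insert: str, attrs: dict, fields: list[str]) -> None:
    """PutSQL stand-in: run the INSERT with the extracted values, in field order."""
    dst.execute(insert, [attrs[f] for f in fields])


def run_pipeline(src: sqlite3.Connection, dst: sqlite3.Connection,
                 extract_sql: str, load_sql: str, fields: list[str]) -> int:
    """Push every extracted record through the stages; return the row count."""
    count = 0
    for doc in split_json(execute_sql(src, extract_sql)):
        put_sql(dst, load_sql, evaluate_json_path(doc, fields), fields)
        count += 1
    dst.commit()
    return count


if __name__ == "__main__":
    # Hypothetical source (OLTP) and target (Data Mart) tables.
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE club (club_id INTEGER, name TEXT)")
    src.executemany("INSERT INTO club VALUES (?, ?)", [(1, "Club X"), (2, "Club Y")])
    dst = sqlite3.connect(":memory:")
    dst.execute("CREATE TABLE dim_club (club_id INTEGER, club_name TEXT)")

    n = run_pipeline(
        src, dst,
        "SELECT club_id, name FROM club ORDER BY club_id",
        "INSERT INTO dim_club (club_id, club_name) VALUES (?, ?)",
        ["club_id", "name"],
    )
    print(n, dst.execute("SELECT * FROM dim_club").fetchall())
```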

---

## Project Structure

```
IPZ_1/
├── data/                     Raw Kaggle CSV files (source data)
├── sql/
│   ├── schema.sql            MySQL OLTP schema DDL
│   └── datamart_schema.sql   Oracle Data Mart DDL
├── scripts/
│   └── seed.cs               .NET 10 script to populate MySQL from CSVs
├── docker/
│   ├── start.sh / stop.sh    Linux (Docker or Podman)
│   └── start.ps1 / stop.ps1  Windows
├── nifi/
│   ├── sql/extract/          MySQL queries (one per ETL pipeline)
│   ├── sql/load/             Oracle INSERT statements (one per ETL pipeline)
│   └── NIFI_SETUP.md         Step-by-step NiFi configuration guide
└── docs/                     This documentation
```

---

## Data Flow Summary

1. **Raw data**: 10 CSV files from a Kaggle dataset covering EWC 2025, stored in `./data/`.
2. **Seeding**: a single-file C# script (`scripts/seed.cs`) reads all CSVs, resolves foreign key relationships, and populates the MySQL OLTP database in dependency order, so that referenced (parent) rows exist before the rows that point to them.
3. **ETL**: Apache NiFi runs 8 pipelines. Each runs an extract query against MySQL, splits the result set into individual records, and inserts them into the Oracle dimension and fact tables.
4. **Reporting**: Power BI connects to Oracle and queries the star schema for OLAP analysis.
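
Loading "in dependency order" amounts to a topological sort: a table can only be filled after every table its foreign keys reference. Below is a minimal sketch using Python's standard `graphlib` module (Python 3.9+). The table names and dependencies are hypothetical and do not reproduce the project's 14-table schema. The same ordering idea would typically apply to the ETL step as well, where dimension tables are loaded before the fact tables that reference them.

```python
from graphlib import TopologicalSorter


def load_order(fk_deps: dict[str, set[str]]) -> list[str]:
    """Return an insert order in which every parent table precedes its children.

    fk_deps maps each table to the set of tables its foreign keys reference
    (its predecessors). Raises graphlib.CycleError on circular dependencies.
    """
    return list(TopologicalSorter(fk_deps).static_order())


# Hypothetical subset of an OLTP schema (not the project's real table list).
FK_DEPS = {
    "country": set(),
    "game": set(),
    "club": {"country"},
    "player": {"country", "club"},
    "tournament": {"game"},
    "placement": {"tournament", "club"},
}

if __name__ == "__main__":
    print(load_order(FK_DEPS))
```

Several valid orders usually exist, so the exact list can vary; only the parent-before-child guarantee matters. Reversing the list gives a safe order for clearing tables before a re-seed.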