Docs init

2026-05-17 17:17:04 +02:00
parent 293b6f0266
commit 22b2289933
7 changed files with 812 additions and 0 deletions

docs/01_overview.md Normal file

@@ -0,0 +1,83 @@
# Project Overview
## What This Project Is
This project builds a complete data warehousing pipeline for the **Esports World Cup 2025 (EWC 2025)**, the world's largest esports event, held in Riyadh, Saudi Arabia, across 27 tournaments from July to August 2025 with a total prize pool exceeding $100 million.
The goal is to take raw event data, load it into a structured transactional database, and then transform it into a Data Mart optimized for analytical reporting. The final output is a star schema in Oracle that can be connected to Power BI to answer business questions about prize distribution, country performance, club rankings, and more.
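As a rough illustration of the kind of question the star schema is meant to answer, the sketch below builds a tiny in-memory star schema and totals prize money per country. The table and column names (`dim_country`, `fact_prize`, `prize_usd`) and the sample rows are hypothetical and are not taken from `datamart_schema.sql`; SQLite stands in for Oracle only so the example runs anywhere.

```python
import sqlite3


def build_demo_mart():
    """Create a two-table star schema in memory with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_country (
            country_key  INTEGER PRIMARY KEY,
            country_name TEXT NOT NULL
        );
        CREATE TABLE fact_prize (
            prize_key   INTEGER PRIMARY KEY,
            country_key INTEGER NOT NULL REFERENCES dim_country(country_key),
            prize_usd   INTEGER NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO dim_country VALUES (?, ?)",
        [(1, "Country A"), (2, "Country B")],
    )
    conn.executemany(
        "INSERT INTO fact_prize VALUES (?, ?, ?)",
        [(1, 1, 500), (2, 2, 300), (3, 1, 250)],
    )
    return conn


def prize_by_country(conn):
    """Join the fact table to its dimension and aggregate, highest total first."""
    query = """
        SELECT c.country_name, SUM(f.prize_usd) AS total_prize
        FROM fact_prize f
        JOIN dim_country c ON c.country_key = f.country_key
        GROUP BY c.country_name
        ORDER BY total_prize DESC, c.country_name
    """
    return conn.execute(query).fetchall()
```

Calling `prize_by_country(build_demo_mart())` returns one `(country_name, total_prize)` row per country. A Power BI report would issue the same fact-to-dimension join and aggregation against the Oracle schema.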
---
## Technology Stack
| Layer | Tool |
|---|---|
| Source data | Kaggle CSV dataset (10 files) |
| OLTP database | MySQL 8.4 (Docker container) |
| ETL pipeline | Apache NiFi |
| Data Mart | Oracle (university lab schema) |
| Reporting | Microsoft Power BI |
| Infrastructure | Docker / Podman |
| Seed script | .NET 10 (single-file C# script) |
---
## Architecture
```
┌─────────────────────┐
│  Kaggle CSV files   │   10 files, ~700 rows total
│  (./data/)          │
└────────┬────────────┘
         │  dotnet run ./scripts/seed.cs
┌─────────────────────┐
│  MySQL 8.4 OLTP     │   Normalized relational schema
│  port 13306         │   14 tables, 3NF
│  (Docker)           │
└────────┬────────────┘
         │  Apache NiFi ETL
         │  ExecuteSQL → ConvertAvroToJSON → SplitJson
         │  → EvaluateJsonPath → PutSQL
┌─────────────────────┐
│  Oracle Data Mart   │   Star schema
│  (university lab)   │   3 fact tables, 5 dimension tables
└────────┬────────────┘
         │  Import / Live connection
┌─────────────────────┐
│  Power BI Reports   │   OLAP analytics, 2 dashboards
└─────────────────────┘
```
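The NiFi leg of the diagram can be read as a record-by-record transformation. The sketch below mimics that chain in plain Python, assuming the query result has already been converted to a JSON array. It is a stand-in for illustration, not NiFi configuration: the function names, the `dim_game` table, and the field names are all hypothetical.

```python
import json

# Hypothetical load statement with positional placeholders.
INSERT_SQL = "INSERT INTO dim_game (game_id, game_name) VALUES (?, ?)"


def split_json(payload):
    """Roughly what SplitJson does: one record per element of the array."""
    return json.loads(payload)


def evaluate_paths(record, fields):
    """Roughly what EvaluateJsonPath does: pull named fields out, in order."""
    return tuple(record[field] for field in fields)


def to_insert_params(payload, fields):
    """Produce one parameter tuple per record, ready for a parameterized INSERT."""
    return [evaluate_paths(record, fields) for record in split_json(payload)]
```

For example, `to_insert_params('[{"game_id": 1, "game_name": "A"}]', ["game_id", "game_name"])` yields `[(1, "A")]`, which a database driver could bind to `INSERT_SQL` one row at a time.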
---
## Project Structure
```
IPZ_1/
├── data/                       Raw Kaggle CSV files (source data)
├── sql/
│   ├── schema.sql              MySQL OLTP schema DDL
│   └── datamart_schema.sql     Oracle Data Mart DDL
├── scripts/
│   └── seed.cs                 .NET 10 script to populate MySQL from CSVs
├── docker/
│   ├── start.sh / stop.sh      Linux (Docker or Podman)
│   └── start.ps1 / stop.ps1    Windows
├── nifi/
│   ├── sql/extract/            MySQL queries (one per ETL pipeline)
│   ├── sql/load/               Oracle INSERT statements (one per ETL pipeline)
│   └── NIFI_SETUP.md           Step-by-step NiFi configuration guide
└── docs/                       This documentation
```
---
## Data Flow Summary
1. **Raw data**: 10 CSV files downloaded from Kaggle, covering EWC 2025.
2. **Seeding**: a single C# script reads all CSVs, resolves foreign key relationships, and populates the MySQL OLTP database, loading parent tables before the tables that reference them.
3. **ETL**: Apache NiFi runs 8 pipelines. Each runs an extract query against MySQL, splits the result into individual records, and inserts them into the Oracle dimension and fact tables.
4. **Reporting**: Power BI connects to Oracle and queries the star schema for OLAP analysis.
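The seeding step depends on loading tables in an order that never violates a foreign key. One way to derive such an order is a topological sort over the table dependencies, sketched below in Python rather than in the C# of `seed.cs`. The table names and their relationships are hypothetical examples, not the actual 14-table schema.

```python
from graphlib import TopologicalSorter

# Hypothetical tables: each maps to the tables its foreign keys reference.
DEPENDENCIES = {
    "country": set(),
    "game": set(),
    "club": {"country"},
    "player": {"country", "club"},
    "tournament": {"game"},
    "prize": {"tournament", "player"},
}


def load_order(dependencies):
    """Return table names so that every referenced table precedes its dependents."""
    return list(TopologicalSorter(dependencies).static_order())
```

`load_order(DEPENDENCIES)` places `country` before `club` and `player`, and `prize` last. If the dependencies contained a cycle, `graphlib` would raise `CycleError`, which is a useful early signal that the schema needs a deferred or nullable key.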