Docs init

2026-05-17 17:17:04 +02:00
parent 293b6f0266
commit 22b2289933
7 changed files with 812 additions and 0 deletions
--- a/docs/01_overview.md
+++ b/docs/01_overview.md
@@ -0,0 +1,83 @@
+# Project Overview
+
+## What This Project Is
+
+This project builds a complete data warehousing pipeline for the **Esports World Cup 2025 (EWC 2025)**, the world's largest esports event, held in Riyadh, Saudi Arabia, from July to August 2025. The event spans 27 tournaments with a total prize pool exceeding $100 million.
+
+The goal is to load raw event data into a normalized transactional (OLTP) database, then transform it into a Data Mart optimized for analytical reporting. The final output is an Oracle star schema that Power BI queries to answer business questions such as prize distribution, country performance, and club rankings.
+
+---
+
+## Technology Stack
+
+| Layer | Tool |
+|---|---|
+| Source data | Kaggle CSV dataset (10 files) |
+| OLTP database | MySQL 8.4 (Docker container) |
+| ETL pipeline | Apache NiFi |
+| Data Mart | Oracle (university lab schema) |
+| Reporting | Microsoft Power BI |
+| Infrastructure | Docker / Podman |
+| Seed script | .NET 10 (single-file C# script) |
+
+---
+
+## Architecture
+
+```
+┌─────────────────────┐
+│   Kaggle CSV files  │  10 files, ~700 rows total
+│   (./data/)         │
+└────────┬────────────┘
+         │  dotnet run ./scripts/seed.cs
+         ▼
+┌─────────────────────┐
+│   MySQL 8.4 OLTP    │  Normalized relational schema
+│   port 13306        │  14 tables, 3NF
+│   (Docker)          │
+└────────┬────────────┘
+         │  Apache NiFi ETL
+         │  ExecuteSQL → ConvertAvroToJSON → SplitJson
+         │  → EvaluateJsonPath → PutSQL
+         ▼
+┌─────────────────────┐
+│   Oracle Data Mart  │  Star schema
+│   (university lab)  │  3 fact tables, 5 dimension tables
+└────────┬────────────┘
+         │  Import / DirectQuery
+         ▼
+┌─────────────────────┐
+│   Power BI Reports  │  OLAP analytics, 2 dashboards
+└─────────────────────┘
+```
+
+---
+
+## Project Structure
+
+```
+IPZ_1/
+├── data/               Raw Kaggle CSV files (source data)
+├── sql/
+│   ├── schema.sql          MySQL OLTP schema DDL
+│   └── datamart_schema.sql Oracle Data Mart DDL
+├── scripts/
+│   └── seed.cs             .NET 10 script to populate MySQL from CSVs
+├── docker/
+│   ├── start.sh / stop.sh  Linux (Docker or Podman)
+│   └── start.ps1 / stop.ps1  Windows
+├── nifi/
+│   ├── sql/extract/        MySQL queries (one per ETL pipeline)
+│   ├── sql/load/           Oracle INSERT statements (one per ETL pipeline)
+│   └── NIFI_SETUP.md       Step-by-step NiFi configuration guide
+└── docs/                   This documentation
+```
+
+---
+
+## Data Flow Summary
+
+1. **Raw data**: 10 CSV files downloaded from Kaggle, covering EWC 2025.
+2. **Seeding**: a single C# script reads all CSVs, resolves foreign key relationships, and populates the MySQL OLTP database in dependency order, so that referenced (parent) tables are loaded before the tables that point to them.
+3. **ETL**: Apache NiFi runs 8 pipelines. Each runs an extract query against MySQL, splits the result into individual records, and inserts them as rows into an Oracle dimension or fact table.
+4. **Reporting**: Power BI connects to Oracle and queries the star schema for OLAP analysis.