Docs init
docs/01_overview.md
@@ -0,0 +1,83 @@
# Project Overview

## What This Project Is

This project builds a complete data warehousing pipeline for the **Esports World Cup 2025 (EWC 2025)**, the world's largest esports event, held in Riyadh, Saudi Arabia, across 27 tournaments from July to August 2025 with a total prize pool exceeding $100 million.

The goal is to take raw event data, load it into a structured transactional database, and then transform it into a Data Mart optimized for analytical reporting. The final output is a star schema in Oracle that can be connected to Power BI to answer business questions about prize distribution, country performance, club rankings, and more.
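
To make the star-schema idea concrete, here is a minimal sketch of the kind of query such a Data Mart supports: one fact table joined to a dimension and aggregated. It runs against an in-memory SQLite database rather than Oracle, and every table name, column name, and row in it is a hypothetical placeholder, not taken from `sql/datamart_schema.sql` or the Kaggle dataset.

```python
import sqlite3

# Hypothetical miniature star schema: one fact table and two dimensions.
# The real Data Mart DDL lives in sql/datamart_schema.sql and targets Oracle.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_country (
    country_key  INTEGER PRIMARY KEY,
    country_name TEXT NOT NULL
);
CREATE TABLE dim_tournament (
    tournament_key INTEGER PRIMARY KEY,
    game_title     TEXT NOT NULL
);
CREATE TABLE fact_prize (
    country_key    INTEGER REFERENCES dim_country(country_key),
    tournament_key INTEGER REFERENCES dim_tournament(tournament_key),
    prize_usd      INTEGER NOT NULL
);
""")

# Placeholder rows for illustration only; these are not real EWC 2025 figures.
conn.executemany("INSERT INTO dim_country VALUES (?, ?)",
                 [(1, "Country A"), (2, "Country B")])
conn.executemany("INSERT INTO dim_tournament VALUES (?, ?)",
                 [(1, "Game X"), (2, "Game Y")])
conn.executemany("INSERT INTO fact_prize VALUES (?, ?, ?)",
                 [(1, 1, 1000), (1, 2, 500), (2, 1, 700)])


def prize_by_country(connection):
    """Sum the prize measure per country by joining the fact to its dimension."""
    return connection.execute("""
        SELECT c.country_name, SUM(f.prize_usd) AS total_prize
        FROM fact_prize f
        JOIN dim_country c ON c.country_key = f.country_key
        GROUP BY c.country_name
        ORDER BY total_prize DESC
    """).fetchall()


totals = prize_by_country(conn)
print(totals)
```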

---

## Technology Stack

| Layer | Tool |
|---|---|
| Source data | Kaggle CSV dataset (10 files) |
| OLTP database | MySQL 8.4 (Docker container) |
| ETL pipeline | Apache NiFi |
| Data Mart | Oracle (university lab schema) |
| Reporting | Microsoft Power BI |
| Infrastructure | Docker / Podman |
| Seed script | .NET 10 (single-file C# script) |

---

## Architecture

```
┌─────────────────────┐
│  Kaggle CSV files   │  10 files, ~700 rows total
│  (./data/)          │
└────────┬────────────┘
         │  dotnet run ./scripts/seed.cs
         ▼
┌─────────────────────┐
│  MySQL 8.4 OLTP     │  Normalized relational schema
│  port 13306         │  14 tables, 3NF
│  (Docker)           │
└────────┬────────────┘
         │  Apache NiFi ETL
         │  ExecuteSQL → ConvertAvroToJSON → SplitJson
         │    → EvaluateJsonPath → PutSQL
         ▼
┌─────────────────────┐
│  Oracle Data Mart   │  Star schema
│  (university lab)   │  3 fact tables, 5 dimension tables
└────────┬────────────┘
         │  Import / Live connection
         ▼
┌─────────────────────┐
│  Power BI Reports   │  OLAP analytics, 2 dashboards
└─────────────────────┘
```
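
The NiFi processor chain in the diagram can be hard to picture without opening the NiFi UI. The sketch below imitates, in plain Python, what the last three steps do to a result set that has already been converted to JSON: split the array into single records, pull named fields out of each record, and bind them to a parameterized INSERT. It does not call NiFi; the function names, the `dim_club` table, and the sample records are assumptions made for illustration.

```python
import json


def split_json(array_text):
    """Imitate SplitJson: one JSON array in, one JSON document per element out."""
    return [json.dumps(item) for item in json.loads(array_text)]


def evaluate_json_path(record_text, fields):
    """Imitate EvaluateJsonPath for simple top-level paths such as $.club_name."""
    record = json.loads(record_text)
    return {name: record.get(name) for name in fields}


def build_insert(table, attributes):
    """Build a parameterized INSERT plus its bind values, similar in spirit to PutSQL."""
    columns = list(attributes)
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    return sql, [attributes[column] for column in columns]


# Hypothetical extract result, as it might look after ConvertAvroToJSON.
extracted = '[{"club_id": 1, "club_name": "Club A"}, {"club_id": 2, "club_name": "Club B"}]'

statements = [
    build_insert("dim_club", evaluate_json_path(record, ["club_id", "club_name"]))
    for record in split_json(extracted)
]

for sql, params in statements:
    print(sql, params)
```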

---

## Project Structure

```
IPZ_1/
├── data/                       Raw Kaggle CSV files (source data)
├── sql/
│   ├── schema.sql              MySQL OLTP schema DDL
│   └── datamart_schema.sql     Oracle Data Mart DDL
├── scripts/
│   └── seed.cs                 .NET 10 script to populate MySQL from CSVs
├── docker/
│   ├── start.sh / stop.sh      Linux (Docker or Podman)
│   └── start.ps1 / stop.ps1    Windows
├── nifi/
│   ├── sql/extract/            MySQL queries (one per ETL pipeline)
│   ├── sql/load/               Oracle INSERT statements (one per ETL pipeline)
│   └── NIFI_SETUP.md           Step-by-step NiFi configuration guide
└── docs/                       This documentation
```

---

## Data Flow Summary

1. **Raw data** lives as 10 CSV files exported from Kaggle covering EWC 2025.
2. **Seeding**: a single C# script reads all CSVs, resolves foreign key relationships, and populates the MySQL OLTP database in the correct order.
3. **ETL**: Apache NiFi runs 8 pipelines. Each reads from MySQL, extracts records, and inserts rows into Oracle dimension and fact tables.
4. **Reporting**: Power BI connects to Oracle and queries the star schema for OLAP analysis.
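
Step 2 depends on loading tables "in the correct order", meaning that a table is populated only after every table it references by foreign key. One way to derive such an order is a topological sort of the foreign-key graph. The sketch below uses Python's standard `graphlib` for that; it is not the logic of `scripts/seed.cs`, and the table names and dependencies are hypothetical stand-ins for the 14-table OLTP schema.

```python
from graphlib import TopologicalSorter

# Hypothetical foreign-key graph: each table maps to the tables it references.
# These names are illustrative and are not taken from sql/schema.sql.
dependencies = {
    "countries": set(),
    "tournaments": set(),
    "clubs": {"countries"},
    "players": {"countries", "clubs"},
    "prize_payouts": {"players", "tournaments"},
}

# static_order() yields every table after all of the tables it depends on.
load_order = list(TopologicalSorter(dependencies).static_order())

for position, table in enumerate(load_order, start=1):
    print(position, table)
```

`TopologicalSorter` raises `graphlib.CycleError` if the graph contains a circular reference, which would be a useful early warning that two tables cannot be seeded with simple ordered inserts.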