docs
This commit is contained in:
181
docs/04-setup.md
Normal file
181
docs/04-setup.md
Normal file
@@ -0,0 +1,181 @@
|
||||
# Setup Guide
|
||||
|
||||
## Prerequisites
|
||||
|
||||
| Tool | Required for | Notes |
|
||||
|------|-------------|-------|
|
||||
| Docker or Podman | MySQL container | Use `--podman` flag on Linux |
|
||||
| .NET 10 SDK | Data generator | `dotnet run file.cs` support |
|
||||
| Apache NiFi | ETL | Running instance with Oracle + MySQL JDBC drivers |
|
||||
| Oracle JDBC driver | NiFi | `ojdbc11.jar` in NiFi's lib directory |
|
||||
| MySQL JDBC driver | NiFi | `mysql-connector-j-*.jar` in NiFi's lib directory |
|
||||
| Oracle DB access | Data mart target | University lab credentials |
|
||||
|
||||
---
|
||||
|
||||
## Step 1 — Start MySQL Container
|
||||
|
||||
**Linux / macOS (Docker):**
|
||||
```bash
|
||||
bash docker/start.sh
|
||||
```
|
||||
|
||||
**Linux / macOS (Podman):**
|
||||
```bash
|
||||
bash docker/start.sh --podman
|
||||
```
|
||||
|
||||
**Windows (PowerShell):**
|
||||
```powershell
|
||||
.\docker\start.ps1
|
||||
```
|
||||
|
||||
The script:
|
||||
- Creates a named container `hotel-mysql` with a persistent data volume
|
||||
- Mounts `sql/schema.sql` as an init script — all 13 tables are created automatically on first start
|
||||
- Waits until MySQL is ready before exiting
|
||||
|
||||
**Connection details:**
|
||||
```
|
||||
Host: 127.0.0.1
|
||||
Port: 13306
|
||||
Database: hotel_reservations
|
||||
User: root
|
||||
Password: hotel2025root
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Step 2 — Generate OLTP Data
|
||||
|
||||
```bash
|
||||
dotnet run generator/generate.cs
|
||||
```
|
||||
|
||||
**Runtime:** ~3 minutes
|
||||
**Output:** 635,000+ rows across 13 tables
|
||||
|
||||
The generator is deterministic (fixed seed `42`) — running it twice on an empty database produces the same data.
|
||||
|
||||
> **Important:** Run the generator only once on an empty database. If you need to restart, truncate all tables first (respecting FK order) or drop and recreate the container + volume.
|
||||
|
||||
### Quick table verification after generation:
|
||||
```bash
|
||||
# Docker
|
||||
docker exec hotel-mysql mysql -uroot -photel2025root hotel_reservations \
|
||||
-e "SELECT table_name, table_rows FROM information_schema.tables WHERE table_schema='hotel_reservations';"
|
||||
|
||||
# Podman
|
||||
podman exec hotel-mysql mysql -uroot -photel2025root hotel_reservations \
|
||||
-e "SELECT table_name, table_rows FROM information_schema.tables WHERE table_schema='hotel_reservations';"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Step 3 — Prepare Oracle Data Mart
|
||||
|
||||
Connect to the Oracle schema (university lab) and execute `sql/datamart_schema.sql`.
|
||||
|
||||
The script creates:
|
||||
- `ETL_WATERMARK` (with initial row for `FACT_ROOM_BOOKING`)
|
||||
- `STG_HOTEL` (staging)
|
||||
- All 7 dimension tables
|
||||
- `FACT_ROOM_BOOKING`
|
||||
|
||||
```sql
|
||||
-- Run in SQL*Plus or SQL Developer:
|
||||
@datamart_schema.sql
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Step 4 — Configure NiFi
|
||||
|
||||
### 4.1 Add JDBC drivers to NiFi
|
||||
|
||||
Copy the following JARs to `$NIFI_HOME/lib/` (or the NiFi extensions directory):
|
||||
- `mysql-connector-j-8.x.jar`
|
||||
- `ojdbc11.jar`
|
||||
|
||||
Restart NiFi after adding drivers.
|
||||
|
||||
### 4.2 Create Controller Services
|
||||
|
||||
In NiFi UI → Controller Settings → Controller Services:
|
||||
|
||||
**MySQL connection:**
|
||||
- Type: `DBCPConnectionPool`
|
||||
- Database Driver Class Name: `com.mysql.cj.jdbc.Driver`
|
||||
- Database Connection URL: `jdbc:mysql://127.0.0.1:13306/hotel_reservations`
|
||||
- Database User: `root`
|
||||
- Password: `hotel2025root`
|
||||
|
||||
**Oracle connection:**
|
||||
- Type: `DBCPConnectionPool`
|
||||
- Database Driver Class Name: `oracle.jdbc.OracleDriver`
|
||||
- Database Connection URL: `jdbc:oracle:thin:@<host>:1521:<sid>`
|
||||
- Database User: `<your_schema>`
|
||||
- Password: `<your_password>`
|
||||
|
||||
Enable both services.
|
||||
|
||||
### 4.3 Build Process Groups
|
||||
|
||||
Follow the detailed processor configuration in `docs/nifi-flow.md`.
|
||||
|
||||
**Recommended build order:**
|
||||
1. PG-1: Date Dimension (simplest, test first)
|
||||
2. PG-2: Static Dimensions (verify MERGE logic)
|
||||
3. PG-3: DIM_HOTEL SCD2 (most complex — check staging table after run)
|
||||
4. PG-4: DIM_GUEST SCD1
|
||||
5. PG-5: Fact Incremental Load
|
||||
|
||||
---
|
||||
|
||||
## Step 5 — Run ETL
|
||||
|
||||
### First full load
|
||||
|
||||
1. Run **PG-1** (Date Dimension) manually — run once
|
||||
2. Start **PG-2, PG-3, PG-4** — these are idempotent, safe to re-run
|
||||
3. Start **PG-5** — runs incrementally; first run loads all 531k room_bookings
|
||||
|
||||
### Verify load
|
||||
|
||||
```sql
|
||||
-- Oracle
|
||||
SELECT COUNT(*) FROM DIM_HOTEL; -- should be 200 (+ more after SCD2 changes)
|
||||
SELECT COUNT(*) FROM DIM_GUEST; -- 100,000
|
||||
SELECT COUNT(*) FROM FACT_ROOM_BOOKING; -- 531,382
|
||||
SELECT last_key FROM ETL_WATERMARK WHERE entity_name = 'FACT_ROOM_BOOKING'; -- 531,382
|
||||
```
|
||||
|
||||
### Verify SCD2 is working
|
||||
|
||||
```sql
|
||||
-- Should show 1 current version per hotel on initial load
|
||||
SELECT is_current, COUNT(*) FROM DIM_HOTEL GROUP BY is_current;
|
||||
-- Expected: IS_CURRENT=1, COUNT=200
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Stop / Restart
|
||||
|
||||
**Stop MySQL (preserves data):**
|
||||
```bash
|
||||
bash docker/stop.sh [--podman]
|
||||
```
|
||||
|
||||
**Restart MySQL:**
|
||||
```bash
|
||||
bash docker/start.sh [--podman]
|
||||
```
|
||||
|
||||
**Full reset (delete all data):**
|
||||
```bash
|
||||
bash docker/stop.sh --podman
|
||||
podman volume rm hotel-mysql-data
|
||||
bash docker/start.sh --podman
|
||||
dotnet run generator/generate.cs
|
||||
```
|
||||
Reference in New Issue
Block a user