# Setup Guide

## Prerequisites

| Tool | Required for | Notes |
|------|--------------|-------|
| Docker or Podman | MySQL container | Pass `--podman` to the `docker/` helper scripts to use Podman |
| .NET 10 SDK | Data generator | Needed for `dotnet run file.cs` (file-based apps) |
| Apache NiFi | ETL | Running instance with the Oracle and MySQL JDBC drivers installed |
| Oracle JDBC driver | NiFi | `ojdbc11.jar` in NiFi's `lib` directory |
| MySQL JDBC driver | NiFi | `mysql-connector-j-*.jar` in NiFi's `lib` directory |
| Oracle DB access | Data mart target | University lab credentials |

---
## Step 1 — Start MySQL Container
**Linux / macOS (Docker):**

```bash
bash docker/start.sh
```

**Linux / macOS (Podman):**

```bash
bash docker/start.sh --podman
```

**Windows (PowerShell):**

```powershell
.\docker\start.ps1
```

The script:

- Creates a named container `hotel-mysql` with a persistent data volume
- Mounts `sql/schema.sql` as an init script, so all 13 tables are created automatically on first start
- Waits until MySQL is ready before exiting

**Connection details:**

```
Host:     127.0.0.1
Port:     13306
Database: hotel_reservations
User:     root
Password: hotel2025root
```
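
The start script's readiness check is not reproduced in this guide. If you want a similar pre-check from your own tooling, a minimal Python sketch could poll the host and port above until they accept a TCP connection. `wait_for_port` is a hypothetical helper, and an open port is a weaker signal than a real MySQL handshake (for example `mysqladmin ping`), so treat it as a rough check only.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll host:port until it accepts a TCP connection.

    Returns True as soon as a connection succeeds, False once `timeout`
    seconds have passed. An open port does not prove MySQL has finished
    running its init scripts; it is only a cheap pre-check.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Raises OSError (e.g. ConnectionRefusedError) while nothing is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example (hypothetical helper, values from the connection details above):
# wait_for_port("127.0.0.1", 13306, timeout=30)
```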
---
## Step 2 — Generate OLTP Data
```bash
dotnet run generator/generate.cs
```

**Runtime:** about 3 minutes

**Output:** 635,000+ rows across 13 tables

The generator is deterministic (fixed seed `42`), so every run against an empty database produces the same data.

> **Important:** Run the generator only once on an empty database. To start over, either truncate all tables first or drop and recreate the container and its volume. InnoDB rejects `TRUNCATE TABLE` on a table that another table references through a foreign key, so run `SET FOREIGN_KEY_CHECKS = 0;` in the same session before the `TRUNCATE` statements (and set it back to `1` afterwards).

### Quick table verification after generation

```bash
# table_rows is only an estimate for InnoDB tables; use SELECT COUNT(*) per table for exact figures.

# Docker
docker exec hotel-mysql mysql -uroot -photel2025root hotel_reservations \
  -e "SELECT table_name, table_rows FROM information_schema.tables WHERE table_schema='hotel_reservations';"

# Podman
podman exec hotel-mysql mysql -uroot -photel2025root hotel_reservations \
  -e "SELECT table_name, table_rows FROM information_schema.tables WHERE table_schema='hotel_reservations';"
```
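
Tables can also be emptied with `DELETE` while foreign key checks stay on, provided child tables are cleared before the tables they reference (note that, unlike `TRUNCATE`, `DELETE` does not reset `AUTO_INCREMENT` counters). A short sketch using Python's standard-library `graphlib` derives such an order from a map of which tables reference which. Apart from `room_bookings`, the table names and relationships below are hypothetical placeholders for the 13 tables defined in `sql/schema.sql`.

```python
from graphlib import TopologicalSorter

# Hypothetical FK map: parent table -> child tables that reference it.
# Only `room_bookings` appears in this guide; the other names are placeholders.
REFERENCED_BY = {
    "hotels": {"rooms", "reservations"},
    "guests": {"reservations"},
    "reservations": {"room_bookings"},
    "rooms": {"room_bookings"},
    "room_bookings": set(),
}


def delete_order(referenced_by: dict[str, set[str]]) -> list[str]:
    """Return tables ordered so every child comes before the parents it references."""
    # TopologicalSorter expects node -> predecessors; listing a parent's
    # children as its predecessors makes the children come out first.
    return list(TopologicalSorter(referenced_by).static_order())


for table in delete_order(REFERENCED_BY):
    print(f"DELETE FROM {table};")
```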
---
## Step 3 — Prepare Oracle Data Mart
Connect to your Oracle schema on the university lab server and run `sql/datamart_schema.sql`.

The script creates:

- `ETL_WATERMARK` (with an initial row for `FACT_ROOM_BOOKING`)
- `STG_HOTEL` (staging table used by the `DIM_HOTEL` load)
- All 7 dimension tables
- `FACT_ROOM_BOOKING`

```sql
-- Run in SQL*Plus or SQL Developer from the repository root (adjust the path otherwise):
@sql/datamart_schema.sql
```

---
## Step 4 — Configure NiFi
### 4.1 Add JDBC drivers to NiFi

Copy the following JARs to `$NIFI_HOME/lib/`:

- `mysql-connector-j-8.x.jar`
- `ojdbc11.jar`

Restart NiFi after adding the drivers. (Alternatively, keep the JARs elsewhere and point each controller service's **Database Driver Location(s)** property at them.)

### 4.2 Create Controller Services
In the NiFi UI, open the configuration of the process group that will hold the ETL flow (for example the root process group) and add both services on its **Controller Services** tab. Services created under the global **Controller Settings** menu are scoped to reporting tasks and cannot be selected by processors.

**MySQL connection:**

- Type: `DBCPConnectionPool`
- Database Driver Class Name: `com.mysql.cj.jdbc.Driver`
- Database Connection URL: `jdbc:mysql://127.0.0.1:13306/hotel_reservations`
- Database User: `root`
- Password: `hotel2025root`

**Oracle connection:**

- Type: `DBCPConnectionPool`
- Database Driver Class Name: `oracle.jdbc.OracleDriver`
- Database Connection URL: `jdbc:oracle:thin:@<host>:1521:<sid>` (or `jdbc:oracle:thin:@//<host>:1521/<service_name>` if the lab gives you a service name instead of a SID)
- Database User: `<your_schema>`
- Password: `<your_password>`

Enable both services.

### 4.3 Build Process Groups
Follow the detailed processor configuration in `docs/nifi-flow.md`.

**Recommended build order:**

1. PG-1: Date Dimension (simplest; test it first)
2. PG-2: Static Dimensions (verify the MERGE logic)
3. PG-3: DIM_HOTEL SCD2 (most complex; check the `STG_HOTEL` staging table after each run)
4. PG-4: DIM_GUEST SCD1
5. PG-5: Fact Incremental Load
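
PG-1 makes a good first test because a date dimension depends on nothing but the calendar. The Python sketch below is only a rough illustration of what such a load produces, not the flow described in `docs/nifi-flow.md`; the column set and the integer `YYYYMMDD` key are common conventions assumed here, not taken from `sql/datamart_schema.sql`.

```python
from datetime import date, timedelta


def date_dimension(start: date, end: date) -> list[dict]:
    """One row per calendar day in [start, end]; the columns are hypothetical."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_key": day.year * 10000 + day.month * 100 + day.day,  # e.g. 20250131
            "full_date": day,
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,
            "month": day.month,
            "day_of_week": day.isoweekday(),  # 1 = Monday ... 7 = Sunday
            "is_weekend": day.isoweekday() >= 6,
        })
        day += timedelta(days=1)
    return rows


rows = date_dimension(date(2024, 1, 1), date(2024, 12, 31))
print(len(rows), rows[0]["date_key"], rows[-1]["date_key"])
```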
---
## Step 5 — Run ETL
### First full load

1. Run **PG-1** (Date Dimension) manually, once.
2. Start **PG-2**, **PG-3**, and **PG-4**. These are idempotent and safe to re-run.
3. Start **PG-5**. It runs incrementally; its first run loads all ~531k rows from `room_bookings`.
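
PG-2 and PG-4 can be re-run safely when each load is an upsert keyed on the source's natural key: existing rows are updated, new ones inserted, and a second run with the same input changes nothing. The sketch below shows that SCD1 pattern (overwrite, keep no history) for a guest dimension, using Python's built-in `sqlite3` and SQLite's `INSERT ... ON CONFLICT DO UPDATE` as a stand-in for an Oracle `MERGE`. The table and column names are hypothetical.

```python
import sqlite3

# In-memory stand-in for DIM_GUEST; the columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_guest (
        guest_key INTEGER PRIMARY KEY AUTOINCREMENT,
        guest_id  INTEGER NOT NULL UNIQUE,  -- natural key from the OLTP source
        full_name TEXT,
        country   TEXT
    )
""")

UPSERT = """
    INSERT INTO dim_guest (guest_id, full_name, country)
    VALUES (:guest_id, :full_name, :country)
    ON CONFLICT(guest_id) DO UPDATE SET
        full_name = excluded.full_name,  -- SCD1: overwrite in place, no history
        country   = excluded.country
"""


def load_guests(rows: list[dict]) -> None:
    with conn:  # commit the whole batch as one transaction
        conn.executemany(UPSERT, rows)


batch = [
    {"guest_id": 1, "full_name": "Ana Silva", "country": "PT"},
    {"guest_id": 2, "full_name": "Jan Novak", "country": "CZ"},
]
load_guests(batch)
load_guests(batch)  # re-run with the same input: no duplicates
load_guests([{"guest_id": 2, "full_name": "Jan Novak", "country": "SK"}])  # attribute change

print(conn.execute("SELECT guest_id, country FROM dim_guest ORDER BY guest_id").fetchall())
```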
### Verify load

```sql
-- Oracle
SELECT COUNT(*) FROM DIM_HOTEL;          -- 200 (more rows appear after SCD2 changes)
SELECT COUNT(*) FROM DIM_GUEST;          -- 100,000
SELECT COUNT(*) FROM FACT_ROOM_BOOKING;  -- 531,382
SELECT last_key FROM ETL_WATERMARK WHERE entity_name = 'FACT_ROOM_BOOKING';  -- 531,382
```
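
The watermark query suggests PG-5 follows a high-water-mark pattern: each run copies only source rows whose key is greater than `ETL_WATERMARK.last_key`, then stores the new maximum, so a re-run with no new bookings loads nothing. A minimal sketch of that pattern, assuming an in-memory SQLite database in place of MySQL and Oracle; `ETL_WATERMARK`, `entity_name` and `last_key` come from the query above, while the `booking_id` and `amount` columns are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE room_bookings (booking_id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE fact_room_booking (booking_id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE etl_watermark (entity_name TEXT PRIMARY KEY, last_key INTEGER NOT NULL);
    INSERT INTO etl_watermark VALUES ('FACT_ROOM_BOOKING', 0);  -- initial row
""")


def incremental_load() -> int:
    """Copy source rows above the watermark, then advance it. Returns rows loaded."""
    with db:  # one transaction: facts and watermark move together
        (last_key,) = db.execute(
            "SELECT last_key FROM etl_watermark WHERE entity_name = 'FACT_ROOM_BOOKING'"
        ).fetchone()
        cur = db.execute(
            "INSERT INTO fact_room_booking (booking_id, amount) "
            "SELECT booking_id, amount FROM room_bookings "
            "WHERE booking_id > ? ORDER BY booking_id",
            (last_key,),
        )
        db.execute(
            "UPDATE etl_watermark "
            "SET last_key = (SELECT COALESCE(MAX(booking_id), ?) FROM fact_room_booking) "
            "WHERE entity_name = 'FACT_ROOM_BOOKING'",
            (last_key,),
        )
        return cur.rowcount


db.executemany("INSERT INTO room_bookings VALUES (?, ?)", [(1, 90.0), (2, 120.0), (3, 75.5)])
print(incremental_load())  # first run: loads every source row
db.execute("INSERT INTO room_bookings VALUES (4, 200.0)")
print(incremental_load())  # later run: loads only rows above the watermark
```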
### Verify SCD2 is working

```sql
-- On the initial load there should be exactly one current version per hotel
SELECT is_current, COUNT(*) FROM DIM_HOTEL GROUP BY is_current;
-- Expected: IS_CURRENT = 1, COUNT = 200
```
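
Assuming PG-3 follows the standard SCD Type 2 steps, a later run after a hotel attribute changes in MySQL should add one `IS_CURRENT = 0` row per changed version while the current count stays at 200. The sketch below illustrates those steps (compare staging with current rows, expire changed versions, insert new current versions) using in-memory SQLite in place of Oracle. Only `is_current` and the staging/dimension roles come from this guide; `hotel_id`, `category`, `valid_from` and `valid_to` are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE stg_hotel (hotel_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_hotel (
        hotel_key  INTEGER PRIMARY KEY AUTOINCREMENT,
        hotel_id   INTEGER NOT NULL,  -- natural key
        category   TEXT,
        valid_from TEXT NOT NULL,
        valid_to   TEXT,              -- NULL while the version is current
        is_current INTEGER NOT NULL
    );
""")


def stage(rows: list[tuple[int, str]]) -> None:
    with db:
        db.execute("DELETE FROM stg_hotel")
        db.executemany("INSERT INTO stg_hotel VALUES (?, ?)", rows)


def scd2_load(load_date: str) -> None:
    with db:
        # 1. Expire current versions whose tracked attribute changed in staging.
        db.execute("""
            UPDATE dim_hotel SET is_current = 0, valid_to = ?
            WHERE is_current = 1 AND EXISTS (
                SELECT 1 FROM stg_hotel s
                WHERE s.hotel_id = dim_hotel.hotel_id
                  AND s.category IS NOT dim_hotel.category)
        """, (load_date,))
        # 2. Insert a current version for new hotels and for those just expired.
        db.execute("""
            INSERT INTO dim_hotel (hotel_id, category, valid_from, valid_to, is_current)
            SELECT s.hotel_id, s.category, ?, NULL, 1 FROM stg_hotel s
            WHERE NOT EXISTS (
                SELECT 1 FROM dim_hotel d
                WHERE d.hotel_id = s.hotel_id AND d.is_current = 1)
        """, (load_date,))


stage([(1, "3-star"), (2, "4-star")])
scd2_load("2025-01-01")
scd2_load("2025-01-01")  # unchanged staging: no new versions
stage([(1, "3-star"), (2, "5-star")])  # hotel 2 is upgraded
scd2_load("2025-02-01")

print(db.execute(
    "SELECT is_current, COUNT(*) FROM dim_hotel GROUP BY is_current ORDER BY is_current"
).fetchall())
```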
---
## Stop / Restart
**Stop MySQL (preserves data):**

```bash
bash docker/stop.sh            # Docker
bash docker/stop.sh --podman   # Podman
```

**Restart MySQL:**

```bash
bash docker/start.sh            # Docker
bash docker/start.sh --podman   # Podman
```

**Full reset (deletes all data):**

```bash
# Podman shown; with Docker, drop --podman and use `docker volume rm`
# (assuming the script creates the same volume name under Docker)
bash docker/stop.sh --podman
podman volume rm hotel-mysql-data
bash docker/start.sh --podman
dotnet run generator/generate.cs
```