4.6 KiB
Setup Guide
Prerequisites
| Tool | Required for | Notes |
|---|---|---|
| Docker or Podman | MySQL container | Use --podman flag on Linux |
| .NET 10 SDK | Data generator | dotnet run file.cs support |
| Apache NiFi | ETL | Running instance with Oracle + MySQL JDBC drivers |
| Oracle JDBC driver | NiFi | ojdbc11.jar in NiFi's lib directory |
| MySQL JDBC driver | NiFi | mysql-connector-j-*.jar in NiFi's lib directory |
| Oracle DB access | Data mart target | University lab credentials |
Step 1 — Start MySQL Container
Linux / macOS (Docker):
bash docker/start.sh
Linux / macOS (Podman):
bash docker/start.sh --podman
Windows (PowerShell):
.\docker\start.ps1
The script:
- Creates a named container
hotel-mysqlwith a persistent data volume - Mounts
sql/schema.sqlas an init script — all 13 tables are created automatically on first start - Waits until MySQL is ready before exiting
Connection details:
Host: 127.0.0.1
Port: 13306
Database: hotel_reservations
User: root
Password: hotel2025root
Step 2 — Generate OLTP Data
dotnet run generator/generate.cs
Runtime: ~3 minutes
Output: 635,000+ rows across 13 tables
The generator is deterministic (fixed seed 42) — running it twice on an empty database produces the same data.
Important: Run the generator only once on an empty database. If you need to restart, truncate all tables first (respecting FK order) or drop and recreate the container + volume.
Quick table verification after generation:
# Docker
docker exec hotel-mysql mysql -uroot -photel2025root hotel_reservations \
-e "SELECT table_name, table_rows FROM information_schema.tables WHERE table_schema='hotel_reservations';"
# Podman
podman exec hotel-mysql mysql -uroot -photel2025root hotel_reservations \
-e "SELECT table_name, table_rows FROM information_schema.tables WHERE table_schema='hotel_reservations';"
Step 3 — Prepare Oracle Data Mart
Connect to the Oracle schema (university lab) and execute sql/datamart_schema.sql.
The script creates:
ETL_WATERMARK(with initial row forFACT_ROOM_BOOKING)STG_HOTEL(staging)- All 7 dimension tables
FACT_ROOM_BOOKING
-- Run in SQL*Plus or SQL Developer:
@datamart_schema.sql
Step 4 — Configure NiFi
4.1 Add JDBC drivers to NiFi
Copy the following JARs to $NIFI_HOME/lib/ (or the NiFi extensions directory):
mysql-connector-j-8.x.jarojdbc11.jar
Restart NiFi after adding drivers.
4.2 Create Controller Services
In NiFi UI → Controller Settings → Controller Services:
MySQL connection:
- Type:
DBCPConnectionPool - Database Driver Class Name:
com.mysql.cj.jdbc.Driver - Database Connection URL:
jdbc:mysql://127.0.0.1:13306/hotel_reservations - Database User:
root - Password:
hotel2025root
Oracle connection:
- Type:
DBCPConnectionPool - Database Driver Class Name:
oracle.jdbc.OracleDriver - Database Connection URL:
jdbc:oracle:thin:@<host>:1521:<sid> - Database User:
<your_schema> - Password:
<your_password>
Enable both services.
4.3 Build Process Groups
Follow the detailed processor configuration in docs/nifi-flow.md.
Recommended build order:
- PG-1: Date Dimension (simplest, test first)
- PG-2: Static Dimensions (verify MERGE logic)
- PG-3: DIM_HOTEL SCD2 (most complex — check staging table after run)
- PG-4: DIM_GUEST SCD1
- PG-5: Fact Incremental Load
Step 5 — Run ETL
First full load
- Run PG-1 (Date Dimension) manually — run once
- Start PG-2, PG-3, PG-4 — these are idempotent, safe to re-run
- Start PG-5 — runs incrementally; first run loads all 531k room_bookings
Verify load
-- Oracle
SELECT COUNT(*) FROM DIM_HOTEL; -- should be 200 (+ more after SCD2 changes)
SELECT COUNT(*) FROM DIM_GUEST; -- 100,000
SELECT COUNT(*) FROM FACT_ROOM_BOOKING; -- 531,382
SELECT last_key FROM ETL_WATERMARK WHERE entity_name = 'FACT_ROOM_BOOKING'; -- 531,382
Verify SCD2 is working
-- Should show 1 current version per hotel on initial load
SELECT is_current, COUNT(*) FROM DIM_HOTEL GROUP BY is_current;
-- Expected: IS_CURRENT=1, COUNT=200
Stop / Restart
Stop MySQL (preserves data):
bash docker/stop.sh [--podman]
Restart MySQL:
bash docker/start.sh [--podman]
Full reset (delete all data):
bash docker/stop.sh --podman
podman volume rm hotel-mysql-data
bash docker/start.sh --podman
dotnet run generator/generate.cs