Files
IPZ_1/docs/04-setup.md
2026-05-17 21:27:42 +02:00

4.6 KiB

Setup Guide

Prerequisites

Tool Required for Notes
Docker or Podman MySQL container Use --podman flag on Linux
.NET 10 SDK Data generator dotnet run file.cs support
Apache NiFi ETL Running instance with Oracle + MySQL JDBC drivers
Oracle JDBC driver NiFi ojdbc11.jar in NiFi's lib directory
MySQL JDBC driver NiFi mysql-connector-j-*.jar in NiFi's lib directory
Oracle DB access Data mart target University lab credentials

Step 1 — Start MySQL Container

Linux / macOS (Docker):

bash docker/start.sh

Linux / macOS (Podman):

bash docker/start.sh --podman

Windows (PowerShell):

.\docker\start.ps1

The script:

  • Creates a named container hotel-mysql with a persistent data volume
  • Mounts sql/schema.sql as an init script — all 13 tables are created automatically on first start
  • Waits until MySQL is ready before exiting

Connection details:

Host:     127.0.0.1
Port:     13306
Database: hotel_reservations
User:     root
Password: hotel2025root

Step 2 — Generate OLTP Data

dotnet run generator/generate.cs

Runtime: ~3 minutes
Output: 635,000+ rows across 13 tables

The generator is deterministic (fixed seed 42) — running it twice on an empty database produces the same data.

Important: Run the generator only once on an empty database. If you need to restart, truncate all tables first (respecting FK order) or drop and recreate the container + volume.

Quick table verification after generation:

# Docker
docker exec hotel-mysql mysql -uroot -photel2025root hotel_reservations \
  -e "SELECT table_name, table_rows FROM information_schema.tables WHERE table_schema='hotel_reservations';"

# Podman
podman exec hotel-mysql mysql -uroot -photel2025root hotel_reservations \
  -e "SELECT table_name, table_rows FROM information_schema.tables WHERE table_schema='hotel_reservations';"

Step 3 — Prepare Oracle Data Mart

Connect to the Oracle schema (university lab) and execute sql/datamart_schema.sql.

The script creates:

  • ETL_WATERMARK (with initial row for FACT_ROOM_BOOKING)
  • STG_HOTEL (staging)
  • All 7 dimension tables
  • FACT_ROOM_BOOKING
-- Run in SQL*Plus or SQL Developer:
@datamart_schema.sql

Step 4 — Configure NiFi

4.1 Add JDBC drivers to NiFi

Copy the following JARs to $NIFI_HOME/lib/ (or the NiFi extensions directory):

  • mysql-connector-j-8.x.jar
  • ojdbc11.jar

Restart NiFi after adding drivers.

4.2 Create Controller Services

In NiFi UI → Controller Settings → Controller Services:

MySQL connection:

  • Type: DBCPConnectionPool
  • Database Driver Class Name: com.mysql.cj.jdbc.Driver
  • Database Connection URL: jdbc:mysql://127.0.0.1:13306/hotel_reservations
  • Database User: root
  • Password: hotel2025root

Oracle connection:

  • Type: DBCPConnectionPool
  • Database Driver Class Name: oracle.jdbc.OracleDriver
  • Database Connection URL: jdbc:oracle:thin:@<host>:1521:<sid>
  • Database User: <your_schema>
  • Password: <your_password>

Enable both services.

4.3 Build Process Groups

Follow the detailed processor configuration in docs/nifi-flow.md.

Recommended build order:

  1. PG-1: Date Dimension (simplest, test first)
  2. PG-2: Static Dimensions (verify MERGE logic)
  3. PG-3: DIM_HOTEL SCD2 (most complex — check staging table after run)
  4. PG-4: DIM_GUEST SCD1
  5. PG-5: Fact Incremental Load

Step 5 — Run ETL

First full load

  1. Run PG-1 (Date Dimension) manually — run once
  2. Start PG-2, PG-3, PG-4 — these are idempotent, safe to re-run
  3. Start PG-5 — runs incrementally; first run loads all 531k room_bookings

Verify load

-- Oracle
SELECT COUNT(*) FROM DIM_HOTEL;        -- should be 200 (+ more after SCD2 changes)
SELECT COUNT(*) FROM DIM_GUEST;        -- 100,000
SELECT COUNT(*) FROM FACT_ROOM_BOOKING; -- 531,382
SELECT last_key FROM ETL_WATERMARK WHERE entity_name = 'FACT_ROOM_BOOKING'; -- 531,382

Verify SCD2 is working

-- Should show 1 current version per hotel on initial load
SELECT is_current, COUNT(*) FROM DIM_HOTEL GROUP BY is_current;
-- Expected: IS_CURRENT=1, COUNT=200

Stop / Restart

Stop MySQL (preserves data):

bash docker/stop.sh [--podman]

Restart MySQL:

bash docker/start.sh [--podman]

Full reset (delete all data):

bash docker/stop.sh --podman
podman volume rm hotel-mysql-data
bash docker/start.sh --podman
dotnet run generator/generate.cs