# Setup Guide ## Prerequisites | Tool | Required for | Notes | |------|-------------|-------| | Docker or Podman | MySQL container | Use `--podman` flag on Linux | | .NET 10 SDK | Data generator | `dotnet run file.cs` support | | Apache NiFi | ETL | Running instance with Oracle + MySQL JDBC drivers | | Oracle JDBC driver | NiFi | `ojdbc11.jar` in NiFi's lib directory | | MySQL JDBC driver | NiFi | `mysql-connector-j-*.jar` in NiFi's lib directory | | Oracle DB access | Data mart target | University lab credentials | --- ## Step 1 — Start MySQL Container **Linux / macOS (Docker):** ```bash bash docker/start.sh ``` **Linux / macOS (Podman):** ```bash bash docker/start.sh --podman ``` **Windows (PowerShell):** ```powershell .\docker\start.ps1 ``` The script: - Creates a named container `hotel-mysql` with a persistent data volume - Mounts `sql/schema.sql` as an init script — all 13 tables are created automatically on first start - Waits until MySQL is ready before exiting **Connection details:** ``` Host: 127.0.0.1 Port: 13306 Database: hotel_reservations User: root Password: hotel2025root ``` --- ## Step 2 — Generate OLTP Data ```bash dotnet run generator/generate.cs ``` **Runtime:** ~3 minutes **Output:** 635,000+ rows across 13 tables The generator is deterministic (fixed seed `42`) — running it twice on an empty database produces the same data. > **Important:** Run the generator only once on an empty database. If you need to restart, truncate all tables first (respecting FK order) or drop and recreate the container + volume. ### Quick table verification after generation: ```bash # Docker docker exec hotel-mysql mysql -uroot -photel2025root hotel_reservations \ -e "SELECT table_name, table_rows FROM information_schema.tables WHERE table_schema='hotel_reservations';" # Podman podman exec hotel-mysql mysql -uroot -photel2025root hotel_reservations \ -e "SELECT table_name, table_rows FROM information_schema.tables WHERE table_schema='hotel_reservations';" ``` --- ## Step 3 — Prepare Oracle Data Mart Connect to the Oracle schema (university lab) and execute `sql/datamart_schema.sql`. The script creates: - `ETL_WATERMARK` (with initial row for `FACT_ROOM_BOOKING`) - `STG_HOTEL` (staging) - All 7 dimension tables - `FACT_ROOM_BOOKING` ```sql -- Run in SQL*Plus or SQL Developer: @datamart_schema.sql ``` --- ## Step 4 — Configure NiFi ### 4.1 Add JDBC drivers to NiFi Copy the following JARs to `$NIFI_HOME/lib/` (or the NiFi extensions directory): - `mysql-connector-j-8.x.jar` - `ojdbc11.jar` Restart NiFi after adding drivers. ### 4.2 Create Controller Services In NiFi UI → Controller Settings → Controller Services: **MySQL connection:** - Type: `DBCPConnectionPool` - Database Driver Class Name: `com.mysql.cj.jdbc.Driver` - Database Connection URL: `jdbc:mysql://127.0.0.1:13306/hotel_reservations` - Database User: `root` - Password: `hotel2025root` **Oracle connection:** - Type: `DBCPConnectionPool` - Database Driver Class Name: `oracle.jdbc.OracleDriver` - Database Connection URL: `jdbc:oracle:thin:@:1521:` - Database User: `` - Password: `` Enable both services. ### 4.3 Build Process Groups Follow the detailed processor configuration in `docs/nifi-flow.md`. **Recommended build order:** 1. PG-1: Date Dimension (simplest, test first) 2. PG-2: Static Dimensions (verify MERGE logic) 3. PG-3: DIM_HOTEL SCD2 (most complex — check staging table after run) 4. PG-4: DIM_GUEST SCD1 5. PG-5: Fact Incremental Load --- ## Step 5 — Run ETL ### First full load 1. Run **PG-1** (Date Dimension) manually — run once 2. Start **PG-2, PG-3, PG-4** — these are idempotent, safe to re-run 3. Start **PG-5** — runs incrementally; first run loads all 531k room_bookings ### Verify load ```sql -- Oracle SELECT COUNT(*) FROM DIM_HOTEL; -- should be 200 (+ more after SCD2 changes) SELECT COUNT(*) FROM DIM_GUEST; -- 100,000 SELECT COUNT(*) FROM FACT_ROOM_BOOKING; -- 531,382 SELECT last_key FROM ETL_WATERMARK WHERE entity_name = 'FACT_ROOM_BOOKING'; -- 531,382 ``` ### Verify SCD2 is working ```sql -- Should show 1 current version per hotel on initial load SELECT is_current, COUNT(*) FROM DIM_HOTEL GROUP BY is_current; -- Expected: IS_CURRENT=1, COUNT=200 ``` --- ## Stop / Restart **Stop MySQL (preserves data):** ```bash bash docker/stop.sh [--podman] ``` **Restart MySQL:** ```bash bash docker/start.sh [--podman] ``` **Full reset (delete all data):** ```bash bash docker/stop.sh --podman podman volume rm hotel-mysql-data bash docker/start.sh --podman dotnet run generator/generate.cs ```