Compare commits

..

2 Commits

Author SHA1 Message Date
946e4020d9 reworked data mart schema and generator, damn it 2026-05-19 15:46:11 +02:00
571c749e25 docs 2026-05-17 21:27:42 +02:00
7 changed files with 1057 additions and 173 deletions

98
docs/01-overview.md Normal file

@@ -0,0 +1,98 @@
# Hotel Reservations — Data Warehouse Project
## Project Summary
This project implements a complete **Data Warehousing pipeline** for a hotel reservation system, covering all standard DW layers:
```
MySQL OLTP ──► Apache NiFi ETL ──► Oracle Data Mart ──► Power BI Reports
(source) (transform) (analytical store) (OLAP queries)
```
The system is built around the **A.24 Hotel Reservations** domain from the course specification. The OLTP database was populated with **~635,000 synthetically generated rows** covering 200 hotels, 100,000 guests, 500,000 bookings, and 531,000 room bookings across a 4-year period (2022–2025).
---
## Business Context
A hotel chain needs to answer questions like:
- Which countries generate the most revenue per quarter?
- How does occupancy differ between peak and off-peak seasons?
- What is the revenue contribution of 5-star vs 3-star hotels?
- How has a hotel's revenue changed after upgrading its star rating?
These questions require **historical, multi-dimensional analysis** that a normalized OLTP database cannot serve efficiently. The data mart provides pre-modelled, denormalized data optimized for analytical queries.
---
## Architecture
```
┌─────────────────────────────────────────────────────────┐
│ SOURCE LAYER │
│ MySQL 8.4 (Docker/Podman, port 13306) │
│ Database: hotel_reservations │
│ 13 normalized tables, ~635K rows │
└───────────────────────┬─────────────────────────────────┘
│ JDBC (MySqlConnector)
┌─────────────────────────────────────────────────────────┐
│ ETL LAYER │
│ Apache NiFi │
│ 5 Process Groups: Date Dim / Static Dims / │
│ SCD2 Hotel / SCD1 Guest / Incremental Fact │
└───────────────────────┬─────────────────────────────────┘
│ JDBC (Oracle JDBC)
┌─────────────────────────────────────────────────────────┐
│ DATA MART LAYER │
│ Oracle (university lab schema) │
│ Star schema: 6 dimensions + 1 fact table │
│ SCD Type 2 on DIM_HOTEL │
└───────────────────────┬─────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ PRESENTATION LAYER │
│ Power BI Desktop │
│ OLAP reports via DirectQuery / Import │
└─────────────────────────────────────────────────────────┘
```
---
## Technology Stack
| Component | Technology | Version |
|-----------|-----------|---------|
| OLTP Database | MySQL | 8.4 |
| Container runtime | Docker / Podman | — |
| Data generator | C# (.NET) | 10 |
| ETL tool | Apache NiFi | — |
| Data Mart | Oracle RDBMS | university lab |
| Reporting | Power BI Desktop | — |
---
## Repository Structure
```
IPZ_1/
├── docker/
│   ├── start.sh              # Start MySQL container (Linux/macOS)
│   ├── stop.sh               # Stop MySQL container
│   ├── start.ps1             # Start MySQL container (Windows)
│   └── stop.ps1              # Stop MySQL container
├── sql/
│   ├── schema.sql            # MySQL OLTP DDL
│   └── datamart_schema.sql   # Oracle Data Mart DDL
├── generator/
│   └── generate.cs           # .NET 10 data generator script
└── docs/
    ├── 01-overview.md        # This file
    ├── 02-oltp.md            # OLTP database design
    ├── 03-datamart.md        # Data mart design
    ├── 04-setup.md           # Setup and run guide
    └── nifi-flow.md          # NiFi ETL flow reference
```

258
docs/02-oltp.md Normal file

@@ -0,0 +1,258 @@
# OLTP Database — Design & Details
## Overview
The OLTP (Online Transaction Processing) database models a **hotel reservation system** using a fully normalized relational schema in **MySQL 8.4**. It follows 3NF and enforces referential integrity via foreign keys.
- **Database:** `hotel_reservations`
- **Character set:** `utf8mb4` / `utf8mb4_unicode_ci`
- **Tables:** 13
- **Total rows:** ~635,000
---
## Entity-Relationship Model
The schema covers five entity groups:
```
hotel_chain ──┐
country ───────┼──► hotel ──► hotel_room ──► room_booking ──► booking ──► guest
star_rating ──┘ │
└──► country
hotel_characteristic ◄──► hotel (M:N via hotel_hotel_characteristic)
room_type ◄──── hotel_room
room_type ◄──┐
rate_period ◄─┴── period_room_rate (price per room type per season)
```
---
## Table Descriptions
### Reference / Lookup Tables
#### `hotel_chain`
International hotel chains (Hilton, Marriott, Accor, etc.).
| Column | Type | Description |
|--------|------|-------------|
| `hotel_chain_id` | INT UNSIGNED PK | Surrogate key |
| `code` | VARCHAR(10) UNIQUE | Short code (e.g. `HLT`) |
| `name` | VARCHAR(100) | Full name |
**Rows:** 10
---
#### `country`
Countries from which guests come and where hotels are located.
| Column | Type | Description |
|--------|------|-------------|
| `country_id` | INT UNSIGNED PK | Surrogate key |
| `code` | CHAR(2) UNIQUE | ISO 3166-1 alpha-2 (e.g. `GB`) |
| `name` | VARCHAR(100) | Country name |
| `currency` | VARCHAR(10) | ISO currency code (e.g. `EUR`) |
**Rows:** 40 (Europe, Americas, Asia, Africa, Oceania)
---
#### `star_rating`
Hotel classification from 1★ to 5★.
| Column | Type | Description |
|--------|------|-------------|
| `star_rating_id` | INT UNSIGNED PK | Surrogate key |
| `code` | TINYINT UNIQUE | 1–5 |
| `description` | VARCHAR(20) | e.g. `3 Star` |
**Rows:** 5
---
#### `hotel_characteristic`
Amenities and features a hotel may offer.
| Column | Type | Description |
|--------|------|-------------|
| `characteristic_id` | INT UNSIGNED PK | Surrogate key |
| `code` | VARCHAR(20) UNIQUE | e.g. `POOL`, `SPA`, `WIFI` |
| `description` | VARCHAR(100) | Human-readable label |
**Rows:** 12 (WiFi, Pool, Gym, Spa, Restaurant, Bar, Parking, Valet, Conference, Shuttle, Room Service, Pet Friendly)
---
#### `room_type`
Types of rooms a hotel can offer, with a standard (base) rate.
| Column | Type | Description |
|--------|------|-------------|
| `room_type_id` | INT UNSIGNED PK | Surrogate key |
| `code` | VARCHAR(20) UNIQUE | e.g. `SINGLE`, `SUITE` |
| `description` | VARCHAR(100) | e.g. `Junior Suite` |
| `standard_rate` | DECIMAL(10,2) | Base nightly rate (EUR) |
| `smoking_yn` | BOOLEAN | Smoking allowed flag |
**Rows:** 7 (Single €80, Double €120, Twin €115, Deluxe €180, Suite €280, Executive €450, Family €200)
---
#### `rate_period`
Seasonal pricing periods. Each period maps to a month range and applies a rate multiplier.
| Column | Type | Description |
|--------|------|-------------|
| `rate_period_id` | INT UNSIGNED PK | Surrogate key |
| `code` | VARCHAR(20) UNIQUE | e.g. `PEAK`, `WINTER` |
| `description` | VARCHAR(50) | Human-readable label |
| `month_from` | TINYINT | Start month (1–12) |
| `month_to` | TINYINT | End month (1–12) |
**Rows:** 4
| Code | Period | Months | Multiplier |
|------|--------|--------|-----------|
| PEAK | Peak Season | Jun–Aug | ×1.5 |
| HIGH | High Season | Mar–May | ×1.2 |
| AUTUMN | Autumn Season | Sep–Nov | ×1.1 |
| WINTER | Winter Season | Dec–Feb | ×0.9 |
---
### Junction Tables
#### `period_room_rate`
The effective nightly rate for each (room_type, rate_period) combination.
Rate = `standard_rate × season_multiplier`.
| Column | Type | Description |
|--------|------|-------------|
| `room_type_id` | INT UNSIGNED PK/FK | |
| `rate_period_id` | INT UNSIGNED PK/FK | |
| `rate` | DECIMAL(10,2) | Effective nightly rate |
**Rows:** 28 (7 room types × 4 seasons)
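
The rate formula above can be sketched in a few lines of Python. This is an illustration only, not the generator's code: the base rates and multipliers come from the tables above, while the room type codes other than `SINGLE` and `SUITE` and the function name `period_room_rates` are assumptions made for the example.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Base nightly rates (EUR) from the room_type table; most codes are assumed.
STANDARD_RATES = {
    "SINGLE": Decimal("80"),
    "DOUBLE": Decimal("120"),
    "TWIN": Decimal("115"),
    "DELUXE": Decimal("180"),
    "SUITE": Decimal("280"),
    "EXECUTIVE": Decimal("450"),
    "FAMILY": Decimal("200"),
}

# Season multipliers from the rate_period table.
MULTIPLIERS = {
    "PEAK": Decimal("1.5"),
    "HIGH": Decimal("1.2"),
    "AUTUMN": Decimal("1.1"),
    "WINTER": Decimal("0.9"),
}


def period_room_rates():
    """Return {(room_type, rate_period): rate}, rounded to two decimals."""
    cent = Decimal("0.01")
    return {
        (room_type, period): (base * mult).quantize(cent, rounding=ROUND_HALF_EVEN)
        for room_type, base in STANDARD_RATES.items()
        for period, mult in MULTIPLIERS.items()
    }


rates = period_room_rates()
print(len(rates), rates[("SUITE", "PEAK")])
```

Using `Decimal` rather than floats keeps the result aligned with the `DECIMAL(10,2)` column type.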
---
#### `hotel_hotel_characteristic`
M:N junction between hotels and their amenities.
| Column | Type |
|--------|------|
| `hotel_id` | INT UNSIGNED PK/FK |
| `characteristic_id` | INT UNSIGNED PK/FK |
**Rows:** ~1,415
---
### Core Entity Tables
#### `hotel`
Individual hotel properties.
| Column | Type | Description |
|--------|------|-------------|
| `hotel_id` | INT UNSIGNED PK | |
| `hotel_chain_id` | INT UNSIGNED FK | NULL for independent hotels |
| `country_id` | INT UNSIGNED FK | |
| `star_rating_id` | INT UNSIGNED FK | |
| `code` | VARCHAR(20) UNIQUE | e.g. `HTL0001` |
| `name` | VARCHAR(150) | |
| `address` | VARCHAR(200) | |
| `postcode` | VARCHAR(20) | |
| `city` | VARCHAR(100) | |
| `url` | VARCHAR(200) | |
**Rows:** 200 (50 cities, star distribution: 5% 1★, 10% 2★, 40% 3★, 30% 4★, 15% 5★)
---
#### `hotel_room`
Individual rooms within each hotel.
| Column | Type | Description |
|--------|------|-------------|
| `room_id` | INT UNSIGNED PK | |
| `hotel_id` | INT UNSIGNED FK | |
| `room_type_id` | INT UNSIGNED FK | |
| `room_number` | VARCHAR(10) | Format: `{floor}{number}`, e.g. `101` |
| `floor` | TINYINT UNSIGNED | |
**Rows:** 5,334 (5–60 rooms per hotel depending on star rating)
---
#### `guest`
Hotel guests.
| Column | Type | Description |
|--------|------|-------------|
| `guest_id` | INT UNSIGNED PK | |
| `country_id` | INT UNSIGNED FK | Guest's home country |
| `name` | VARCHAR(150) | Full name |
| `email` | VARCHAR(150) | Unique synthetic email |
| `address` | VARCHAR(200) | |
| `city` | VARCHAR(100) | |
**Rows:** 100,000
---
#### `booking`
A reservation made by a guest at a hotel. One booking can cover multiple rooms.
| Column | Type | Description |
|--------|------|-------------|
| `booking_id` | INT UNSIGNED PK | |
| `guest_id` | INT UNSIGNED FK | |
| `hotel_id` | INT UNSIGNED FK | |
| `date_from` | DATE | Check-in |
| `date_to` | DATE | Check-out |
| `status` | ENUM | `confirmed`, `cancelled`, `completed`, `no_show` |
| `created_at` | DATETIME | When booking was made |
**Rows:** 500,000
**Status distribution:** 80% completed, 10% confirmed, 7% cancelled, 3% no_show
**Date range:** 2022-01-01 to 2025-12-31
**Seasonal distribution:** June–August heaviest (peak), December–February lightest
---
#### `room_booking`
A specific room assigned within a booking. Stores the rate **as it was at booking time** (snapshot), independent of any future rate changes.
| Column | Type | Description |
|--------|------|-------------|
| `room_booking_id` | INT UNSIGNED PK | |
| `booking_id` | INT UNSIGNED FK | |
| `room_id` | INT UNSIGNED FK | |
| `date_from` | DATE | |
| `date_to` | DATE | |
| `nightly_rate` | DECIMAL(10,2) | Rate at time of booking |
| `total_amount` | DECIMAL(10,2) | `nightly_rate × nights` |
**Rows:** 531,382
**Room count per booking:** 90% single room, 8% two rooms, 2% three rooms
---
## Data Generation
The database was populated using a **single-file C# script** (`generator/generate.cs`) running on .NET 10, using `MySqlConnector` as the only dependency.
Key generation decisions:
- **Seasonal booking distribution** via rejection sampling: the months Jun–Aug are ~2.7× more likely than Jan–Feb
- **Rate snapshot** — each `room_booking.nightly_rate` is looked up from `period_room_rate` at insert time and stored, not re-computed later
- **Realistic stay lengths** — 30% one night, 25% two nights, 20% three nights, tapering off to 14-night stays
- **Cancelled/no-show bookings** partially skip room assignment (60% of cancellations have no room_booking)
```bash
# Run generator
dotnet run generator/generate.cs
```
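
As a rough illustration of the rejection sampling mentioned in the generation notes, the sketch below draws check-in dates uniformly from the booking date range and accepts each candidate with a month-dependent probability. It is not the C# generator's code: the per-month weights in `MONTH_WEIGHT` are hypothetical values chosen only so that the peak months come out about 2.7× as likely as January and February, and `sample_checkin` is a name invented for this example.

```python
import random
from datetime import date, timedelta

START, END = date(2022, 1, 1), date(2025, 12, 31)

# Hypothetical acceptance probabilities per month (1.0 / 0.37 is about 2.7).
MONTH_WEIGHT = {
    1: 0.37, 2: 0.37, 3: 0.70, 4: 0.70, 5: 0.70, 6: 1.00,
    7: 1.00, 8: 1.00, 9: 0.60, 10: 0.60, 11: 0.60, 12: 0.45,
}


def sample_checkin(rng: random.Random) -> date:
    """Draw a check-in date whose month follows MONTH_WEIGHT."""
    span_days = (END - START).days
    while True:
        # Propose a uniformly distributed date in the range...
        candidate = START + timedelta(days=rng.randint(0, span_days))
        # ...and keep it only with the probability assigned to its month.
        if rng.random() < MONTH_WEIGHT[candidate.month]:
            return candidate


rng = random.Random(42)  # fixed seed, so repeated runs give the same dates
sample = [sample_checkin(rng) for _ in range(5)]
print(sample)
```

Rejection sampling keeps the day-of-month distribution uniform within each month while shaping only the monthly totals, which is why it suits a seasonal profile like this one.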

255
docs/03-datamart.md Normal file

@@ -0,0 +1,255 @@
# Data Mart — Design & Details
## Overview
The data mart uses a **star schema** stored in an Oracle database (university lab schema). It is optimized for analytical queries against hotel reservation data — revenue analysis, occupancy trends, seasonal patterns, and guest origin breakdowns.
- **Schema type:** Star schema
- **Dimensions:** 6 (+ date dimension)
- **Fact table:** `FACT_ROOM_BOOKING`
- **Grain:** One row per room_booking (one room, one stay)
- **SCD strategy:** Type 2 on DIM_HOTEL, Type 1 on all others
---
## Star Schema Diagram
```
DIM_DATE
(date_key)
┌───────────┴───────────┐
│ checkin / checkout │
│ │
DIM_HOTEL_CHAIN ◄─ DIM_HOTEL ─► DIM_STAR_RATING
│ │
│ FACT_ROOM_BOOKING ◄──── DIM_ROOM
│ │
└───────► DIM_COUNTRY ◄───── DIM_GUEST
```
---
## Dimension Tables
### DIM_DATE
Populated once for the range 2020–2030. Used for both check-in and check-out date lookups.
| Column | Type | Description |
|--------|------|-------------|
| `date_key` | NUMBER(8) PK | YYYYMMDD integer key |
| `full_date` | DATE | Actual date value |
| `year` | NUMBER(4) | |
| `quarter` | NUMBER(1) | 1–4 |
| `month` | NUMBER(2) | 1–12 |
| `month_name` | VARCHAR2(10) | e.g. `January` |
| `week_number` | NUMBER(2) | ISO week number |
| `day_of_month` | NUMBER(2) | |
| `day_name` | VARCHAR2(10) | e.g. `Monday` |
| `is_weekend` | NUMBER(1) | 0/1 |
| `is_business_day` | NUMBER(1) | 0/1 |
| `season` | VARCHAR2(10) | Peak / High / Autumn / Winter |
Using an integer date key (YYYYMMDD) instead of a DATE FK allows efficient range predicates: `checkin_date_key BETWEEN 20240601 AND 20240831`.
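
To make the key format concrete, here is a minimal Python sketch that derives a `DIM_DATE` row from a calendar date. It is an assumption-laden illustration rather than the NiFi flow itself: the season mapping follows the month ranges of the OLTP `rate_period` table, and `is_business_day` is assumed to mean "not a weekend" because the documentation does not say whether public holidays are considered. The helper names `date_key` and `dim_date_row` are invented for the example.

```python
from datetime import date

MONTH_NAMES = ("January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December")
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
             "Saturday", "Sunday")
# Assumed to mirror the rate_period month ranges.
SEASON_BY_MONTH = {12: "Winter", 1: "Winter", 2: "Winter",
                   3: "High", 4: "High", 5: "High",
                   6: "Peak", 7: "Peak", 8: "Peak",
                   9: "Autumn", 10: "Autumn", 11: "Autumn"}


def date_key(d: date) -> int:
    """Encode a date as a YYYYMMDD integer."""
    return d.year * 10000 + d.month * 100 + d.day


def dim_date_row(d: date) -> dict:
    """Build the DIM_DATE attributes for one calendar day."""
    weekend = d.weekday() >= 5  # Saturday = 5, Sunday = 6
    return {
        "date_key": date_key(d),
        "full_date": d,
        "year": d.year,
        "quarter": (d.month - 1) // 3 + 1,
        "month": d.month,
        "month_name": MONTH_NAMES[d.month - 1],
        "week_number": d.isocalendar()[1],  # ISO week number
        "day_of_month": d.day,
        "day_name": DAY_NAMES[d.weekday()],
        "is_weekend": int(weekend),
        "is_business_day": int(not weekend),  # assumption: holidays ignored
        "season": SEASON_BY_MONTH[d.month],
    }


row = dim_date_row(date(2024, 6, 1))
print(row["date_key"], row["season"])
```

Because the keys sort in the same order as the dates they encode, a `BETWEEN` predicate on two keys selects exactly the days between the two dates.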
---
### DIM_COUNTRY (SCD Type 1)
Country attributes are stable. If a name or currency ever changes, the row is simply overwritten (no history needed).
| Column | Type | Description |
|--------|------|-------------|
| `country_key` | NUMBER(10) PK | Surrogate (IDENTITY) |
| `country_id` | NUMBER(10) UNIQUE | Natural key from MySQL |
| `code` | CHAR(2) | ISO alpha-2 |
| `name` | VARCHAR2(100) | |
| `currency` | VARCHAR2(10) | ISO currency code |
---
### DIM_STAR_RATING (SCD Type 1)
Static lookup. Star rating codes 1–5 never change.
| Column | Type | Description |
|--------|------|-------------|
| `star_rating_key` | NUMBER(10) PK | Surrogate (IDENTITY) |
| `star_rating_id` | NUMBER(10) UNIQUE | Natural key |
| `code` | NUMBER(1) | 1–5 |
| `description` | VARCHAR2(20) | e.g. `4 Star` |
---
### DIM_HOTEL_CHAIN (SCD Type 1)
Chain name/code may be updated (e.g. corporate rebranding), but we do not need a historical record of chain name changes.
| Column | Type | Description |
|--------|------|-------------|
| `hotel_chain_key` | NUMBER(10) PK | Surrogate (IDENTITY) |
| `hotel_chain_id` | NUMBER(10) UNIQUE | Natural key |
| `code` | VARCHAR2(10) | e.g. `HLT` |
| `name` | VARCHAR2(100) | |
---
### DIM_HOTEL (SCD Type 2)
This is the most analytically significant dimension and the only one implemented as **Slowly Changing Dimension Type 2**.
**Why SCD Type 2 here?**
A hotel's star rating or chain affiliation can change over time — a property gets renovated and reclassified from 3★ to 4★, or switches from one international chain to another. These changes directly affect revenue analysis: a 3★ hotel charges different rates than a 4★ hotel, and grouping all historical bookings under the current star rating would produce misleading averages.
SCD Type 2 preserves history by creating a **new row** for each version of a hotel, while expiring the old row with an `expiry_date`. The fact table's `hotel_key` always points to the version that was active **at check-in date**, never to the current version if it changed.
| Column | Type | Description |
|--------|------|-------------|
| `hotel_key` | NUMBER(10) PK | Surrogate (IDENTITY) |
| `source_hotel_id` | NUMBER(10) | Natural key from MySQL |
| `hotel_chain_key` | NUMBER(10) FK | NULL for independent hotels |
| `country_key` | NUMBER(10) FK | |
| `star_rating_key` | NUMBER(10) FK | |
| `code` | VARCHAR2(20) | |
| `name` | VARCHAR2(150) | |
| `city` | VARCHAR2(100) | |
| `effective_date` | DATE | When this version became active |
| `expiry_date` | DATE | When this version was superseded (NULL = current) |
| `is_current` | NUMBER(1) | 1 = current version |
**SCD2 example:**
| hotel_key | source_hotel_id | star_rating | effective_date | expiry_date | is_current |
|-----------|----------------|-------------|----------------|-------------|-----------|
| 1 | 42 | 3★ | 2022-01-01 | 2024-05-31 | 0 |
| 2 | 42 | 4★ | 2024-06-01 | NULL | 1 |
Bookings with a check-in date up to 2024-05-31 point to `hotel_key=1`; bookings with a check-in date on or after 2024-06-01 point to `hotel_key=2`. Revenue by star category remains historically correct.
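
The version lookup behind the example table can be sketched in Python as follows. This only illustrates the rule "pick the version active at check-in"; the real resolution happens in the ETL. The type `HotelVersion` and the function `resolve_hotel_key` are hypothetical names, and `expiry_date` is treated as inclusive because the example rows end on 2024-05-31 and restart on 2024-06-01.

```python
from datetime import date
from typing import NamedTuple, Optional


class HotelVersion(NamedTuple):
    hotel_key: int
    source_hotel_id: int
    star_rating: int
    effective_date: date
    expiry_date: Optional[date]  # None marks the current version


# The two versions of hotel 42 from the example table.
VERSIONS = [
    HotelVersion(1, 42, 3, date(2022, 1, 1), date(2024, 5, 31)),
    HotelVersion(2, 42, 4, date(2024, 6, 1), None),
]


def resolve_hotel_key(versions, source_hotel_id: int, checkin: date) -> int:
    """Return the surrogate key of the version active on the check-in date."""
    for v in versions:
        if v.source_hotel_id != source_hotel_id:
            continue
        still_open = v.expiry_date is None or checkin <= v.expiry_date
        if v.effective_date <= checkin and still_open:
            return v.hotel_key
    raise LookupError(f"no version of hotel {source_hotel_id} covers {checkin}")


print(resolve_hotel_key(VERSIONS, 42, date(2023, 3, 15)))
print(resolve_hotel_key(VERSIONS, 42, date(2024, 6, 1)))
```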
---
### DIM_ROOM (SCD Type 1)
Room type is stable for our dataset. Updated via MERGE if room details ever change.
| Column | Type | Description |
|--------|------|-------------|
| `room_key` | NUMBER(10) PK | Surrogate (IDENTITY) |
| `room_id` | NUMBER(10) UNIQUE | Natural key |
| `hotel_key` | NUMBER(10) FK | Points to current DIM_HOTEL version |
| `room_number` | VARCHAR2(10) | |
| `floor` | NUMBER(3) | |
| `room_type_code` | VARCHAR2(20) | e.g. `SUITE` |
| `room_type_desc` | VARCHAR2(100) | |
| `smoking_yn` | NUMBER(1) | |
| `standard_rate` | NUMBER(10,2) | Base rate from OLTP |
---
### DIM_GUEST (SCD Type 1)
Guest personal data (city, country) may change, but tracking historical addresses has no analytical value for this domain. MERGE (upsert) is used.
| Column | Type | Description |
|--------|------|-------------|
| `guest_key` | NUMBER(10) PK | Surrogate (IDENTITY) |
| `guest_id` | NUMBER(10) UNIQUE | Natural key |
| `country_key` | NUMBER(10) FK | Home country |
| `name` | VARCHAR2(150) | |
| `city` | VARCHAR2(100) | |
---
## Fact Table: FACT_ROOM_BOOKING
**Grain:** One row per room_booking — one specific room, for one stay.
| Column | Type | Description |
|--------|------|-------------|
| `fact_id` | NUMBER(10) PK | Surrogate (IDENTITY) |
| `source_rb_id` | NUMBER(10) UNIQUE | Natural key — used for idempotent incremental loads |
| `hotel_key` | NUMBER(10) FK | SCD2-resolved hotel version at check-in |
| `hotel_chain_key` | NUMBER(10) FK | Denormalized from DIM_HOTEL for convenience |
| `room_key` | NUMBER(10) FK | |
| `guest_key` | NUMBER(10) FK | |
| `country_key` | NUMBER(10) FK | Guest's country — denormalized |
| `star_rating_key` | NUMBER(10) FK | Denormalized from DIM_HOTEL for convenience |
| `checkin_date_key` | NUMBER(8) FK | YYYYMMDD |
| `checkout_date_key` | NUMBER(8) FK | YYYYMMDD |
| `booking_status` | VARCHAR2(20) | Degenerate dimension: confirmed/completed/cancelled/no_show |
| `nights_stayed` | NUMBER(4) | `checkout − checkin`, in days |
| `nightly_rate` | NUMBER(10,2) | Rate per night at time of booking |
| `total_amount` | NUMBER(12,2) | `nightly_rate × nights_stayed` |
### Measures
| Measure | Type | Aggregation |
|---------|------|-------------|
| `nights_stayed` | Additive | SUM, AVG |
| `nightly_rate` | Non-additive | AVG (not SUM; a per-night rate does not add meaningfully across rows) |
| `total_amount` | Additive | SUM (main revenue measure) |
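
The difference between the measures can be seen on two made-up fact rows. The sketch below is illustrative only: the rows and the `summarize` helper are invented, and the night-weighted average is one possible way to report a rate, not something the project prescribes.

```python
from decimal import Decimal

# Two hypothetical fact rows: a one-night stay and a four-night stay.
FACT_ROWS = [
    {"nights_stayed": 1, "nightly_rate": Decimal("120.00")},
    {"nights_stayed": 4, "nightly_rate": Decimal("80.00")},
]


def summarize(rows):
    """Aggregate the measures the way the table above suggests."""
    total_nights = sum(r["nights_stayed"] for r in rows)                # additive
    total_amount = sum(r["nights_stayed"] * r["nightly_rate"] for r in rows)  # additive
    avg_rate_per_row = sum(r["nightly_rate"] for r in rows) / len(rows)  # AVG of the rate
    avg_rate_per_night = total_amount / total_nights                     # weighted by nights
    return total_nights, total_amount, avg_rate_per_row, avg_rate_per_night


print(summarize(FACT_ROWS))
```

Summing `nightly_rate` would give 200, a figure with no business meaning, while the two averages answer different questions: the average rate per booking row versus the average price of a sold night.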
### Degenerate Dimensions
`booking_status` is stored directly on the fact row. Splitting it into a separate dimension table would add a table with only 4 rows and no other attributes — not worth the JOIN overhead.
---
## ETL Control Tables
### ETL_WATERMARK
Tracks the highest `room_booking_id` already loaded into the fact table, enabling incremental loads without re-reading the entire source.
| Column | Description |
|--------|-------------|
| `entity_name` | Logical entity name (e.g. `FACT_ROOM_BOOKING`) |
| `last_key` | Highest PK value loaded so far |
| `last_run_ts` | Timestamp of the last ETL run |
### STG_HOTEL
Staging table used by the SCD2 ETL process. NiFi loads raw hotel data from MySQL here, then SQL applies the expire-and-insert SCD2 logic in a single transaction. Truncated at the start of each ETL run.
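
The expire-and-insert step can be sketched as below, using Python with an in-memory SQLite database standing in for Oracle. This is a simplified assumption of how such logic might look, not the project's SQL: only `star_rating` is tracked, table and column names are reduced to a minimum, and `create_schema` and `apply_scd2` are hypothetical helper names.

```python
import sqlite3
from datetime import date, timedelta


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE stg_hotel (hotel_id INTEGER, star_rating INTEGER);
        CREATE TABLE dim_hotel (
            hotel_key       INTEGER PRIMARY KEY AUTOINCREMENT,
            source_hotel_id INTEGER NOT NULL,
            star_rating     INTEGER NOT NULL,
            effective_date  TEXT NOT NULL,
            expiry_date     TEXT,
            is_current      INTEGER NOT NULL DEFAULT 1
        );
    """)


def apply_scd2(conn: sqlite3.Connection, load_date: date) -> None:
    """Expire changed current rows, then insert new versions, in one transaction."""
    params = {
        "load_date": load_date.isoformat(),
        "expiry": (load_date - timedelta(days=1)).isoformat(),
    }
    with conn:  # commits on success, rolls back on error
        # Phase 1: close the current version of every hotel whose rating changed.
        conn.execute("""
            UPDATE dim_hotel
               SET expiry_date = :expiry, is_current = 0
             WHERE is_current = 1
               AND EXISTS (SELECT 1 FROM stg_hotel s
                            WHERE s.hotel_id = dim_hotel.source_hotel_id
                              AND s.star_rating <> dim_hotel.star_rating)
        """, params)
        # Phase 2: insert a version for every staged hotel without a current row
        # (brand-new hotels and the ones expired in phase 1).
        conn.execute("""
            INSERT INTO dim_hotel
                (source_hotel_id, star_rating, effective_date, expiry_date, is_current)
            SELECT s.hotel_id, s.star_rating, :load_date, NULL, 1
              FROM stg_hotel s
             WHERE NOT EXISTS (SELECT 1 FROM dim_hotel d
                                WHERE d.source_hotel_id = s.hotel_id
                                  AND d.is_current = 1)
             ORDER BY s.hotel_id
        """, params)
```

Running `apply_scd2` a second time against unchanged staging data leaves `dim_hotel` untouched, which is the property that makes the process group safe to re-run.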
---
## Sample Analytical Queries
### Revenue by country and quarter
```sql
SELECT
    c.name               AS country,
    d.year,
    d.quarter,
    SUM(f.total_amount)  AS revenue,
    SUM(f.nights_stayed) AS room_nights  -- nights sold; COUNT(*) would count bookings instead
FROM FACT_ROOM_BOOKING f
JOIN DIM_DATE    d ON d.date_key    = f.checkin_date_key
JOIN DIM_GUEST   g ON g.guest_key   = f.guest_key
JOIN DIM_COUNTRY c ON c.country_key = g.country_key
WHERE f.booking_status = 'completed'
GROUP BY c.name, d.year, d.quarter
ORDER BY revenue DESC;
```
### Average revenue per star category (correct because of SCD2)
```sql
SELECT
    sr.code             AS stars,
    d.season,
    AVG(f.nightly_rate) AS avg_nightly_rate,
    SUM(f.total_amount) AS total_revenue
FROM FACT_ROOM_BOOKING f
JOIN DIM_HOTEL       h  ON h.hotel_key        = f.hotel_key
JOIN DIM_STAR_RATING sr ON sr.star_rating_key = f.star_rating_key
JOIN DIM_DATE        d  ON d.date_key         = f.checkin_date_key
GROUP BY sr.code, d.season
ORDER BY sr.code, d.season;
```
### Top 10 cities by occupancy (room-nights)
```sql
SELECT
    h.city,
    SUM(f.nights_stayed) AS room_nights,
    SUM(f.total_amount)  AS revenue
FROM FACT_ROOM_BOOKING f
JOIN DIM_HOTEL h ON h.hotel_key = f.hotel_key
WHERE f.booking_status IN ('completed','confirmed')
GROUP BY h.city
ORDER BY room_nights DESC
FETCH FIRST 10 ROWS ONLY;
```

181
docs/04-setup.md Normal file

@@ -0,0 +1,181 @@
# Setup Guide
## Prerequisites
| Tool | Required for | Notes |
|------|-------------|-------|
| Docker or Podman | MySQL container | Use `--podman` flag on Linux |
| .NET 10 SDK | Data generator | `dotnet run file.cs` support |
| Apache NiFi | ETL | Running instance with Oracle + MySQL JDBC drivers |
| Oracle JDBC driver | NiFi | `ojdbc11.jar` in NiFi's lib directory |
| MySQL JDBC driver | NiFi | `mysql-connector-j-*.jar` in NiFi's lib directory |
| Oracle DB access | Data mart target | University lab credentials |
---
## Step 1 — Start MySQL Container
**Linux / macOS (Docker):**
```bash
bash docker/start.sh
```
**Linux / macOS (Podman):**
```bash
bash docker/start.sh --podman
```
**Windows (PowerShell):**
```powershell
.\docker\start.ps1
```
The script:
- Creates a named container `hotel-mysql` with a persistent data volume
- Mounts `sql/schema.sql` as an init script — all 13 tables are created automatically on first start
- Waits until MySQL is ready before exiting
**Connection details:**
```
Host: 127.0.0.1
Port: 13306
Database: hotel_reservations
User: root
Password: hotel2025root
```
---
## Step 2 — Generate OLTP Data
```bash
dotnet run generator/generate.cs
```
**Runtime:** ~3 minutes
**Output:** 635,000+ rows across 13 tables
The generator is deterministic (fixed seed `42`) — running it twice on an empty database produces the same data.
> **Important:** Run the generator only once on an empty database. If you need to restart, truncate all tables first (respecting FK order) or drop and recreate the container + volume.
### Quick table verification after generation:
```bash
# Docker
docker exec hotel-mysql mysql -uroot -photel2025root hotel_reservations \
-e "SELECT table_name, table_rows FROM information_schema.tables WHERE table_schema='hotel_reservations';"
# Podman
podman exec hotel-mysql mysql -uroot -photel2025root hotel_reservations \
-e "SELECT table_name, table_rows FROM information_schema.tables WHERE table_schema='hotel_reservations';"
```
Note: for InnoDB tables `table_rows` is an estimate, so the figures may differ slightly from the real counts; use `SELECT COUNT(*) FROM <table>` when an exact number is needed.
---
## Step 3 — Prepare Oracle Data Mart
Connect to the Oracle schema (university lab) and execute `sql/datamart_schema.sql`.
The script creates:
- `ETL_WATERMARK` (with initial row for `FACT_ROOM_BOOKING`)
- `STG_HOTEL` (staging)
- All 7 dimension tables
- `FACT_ROOM_BOOKING`
```sql
-- Run in SQL*Plus or SQL Developer:
@sql/datamart_schema.sql
```
---
## Step 4 — Configure NiFi
### 4.1 Add JDBC drivers to NiFi
Copy the following JARs to `$NIFI_HOME/lib/` (or the NiFi extensions directory):
- `mysql-connector-j-8.x.jar`
- `ojdbc11.jar`
Restart NiFi after adding drivers.
### 4.2 Create Controller Services
In NiFi UI → Controller Settings → Controller Services:
**MySQL connection:**
- Type: `DBCPConnectionPool`
- Database Driver Class Name: `com.mysql.cj.jdbc.Driver`
- Database Connection URL: `jdbc:mysql://127.0.0.1:13306/hotel_reservations`
- Database User: `root`
- Password: `hotel2025root`
**Oracle connection:**
- Type: `DBCPConnectionPool`
- Database Driver Class Name: `oracle.jdbc.OracleDriver`
- Database Connection URL: `jdbc:oracle:thin:@<host>:1521:<sid>`
- Database User: `<your_schema>`
- Password: `<your_password>`
Enable both services.
### 4.3 Build Process Groups
Follow the detailed processor configuration in `docs/nifi-flow.md`.
**Recommended build order:**
1. PG-1: Date Dimension (simplest, test first)
2. PG-2: Static Dimensions (verify MERGE logic)
3. PG-3: DIM_HOTEL SCD2 (most complex — check staging table after run)
4. PG-4: DIM_GUEST SCD1
5. PG-5: Fact Incremental Load
---
## Step 5 — Run ETL
### First full load
1. Run **PG-1** (Date Dimension) manually, once only
2. Start **PG-2, PG-3, PG-4** — these are idempotent, safe to re-run
3. Start **PG-5** — runs incrementally; first run loads all 531k room_bookings
### Verify load
```sql
-- Oracle
SELECT COUNT(*) FROM DIM_HOTEL; -- should be 200 (+ more after SCD2 changes)
SELECT COUNT(*) FROM DIM_GUEST; -- 100,000
SELECT COUNT(*) FROM FACT_ROOM_BOOKING; -- 531,382
SELECT last_key FROM ETL_WATERMARK WHERE entity_name = 'FACT_ROOM_BOOKING'; -- 531,382
```
### Verify SCD2 is working
```sql
-- Should show 1 current version per hotel on initial load
SELECT is_current, COUNT(*) FROM DIM_HOTEL GROUP BY is_current;
-- Expected: IS_CURRENT=1, COUNT=200
```
---
## Stop / Restart
**Stop MySQL (preserves data):**
```bash
bash docker/stop.sh [--podman]
```
**Restart MySQL:**
```bash
bash docker/start.sh [--podman]
```
**Full reset (delete all data):**
```bash
bash docker/stop.sh --podman
podman volume rm hotel-mysql-data
bash docker/start.sh --podman
dotnet run generator/generate.cs
```

83
docs/05-conclusion.md Normal file

@@ -0,0 +1,83 @@
# Conclusion
## What Was Built
This project delivers a complete, working **Data Warehouse pipeline** for the Hotel Reservations domain:
| Layer | What was built | Scale |
|-------|---------------|-------|
| OLTP | MySQL 8.4, 13-table normalized schema | ~635,000 rows |
| Data generation | .NET 10 C# script, realistic seasonal distribution | 500K bookings in ~3 min |
| ETL | Apache NiFi, 5 process groups | full + incremental loads |
| Data Mart | Oracle star schema, SCD Type 2 on DIM_HOTEL | 1 fact + 6 dims |
---
## Design Decisions
### Synthetic data generation instead of a Kaggle dataset
The decision to generate data rather than use a pre-existing dataset was deliberate. Publicly available hotel datasets are either too small (thousands of rows) or lack the normalized relational structure needed to demonstrate a realistic OLTP-to-DW pipeline. The generator produces statistically realistic data:
- Seasonal booking distribution (summer peak, winter trough)
- Realistic stay-length distribution (30% one-night stays)
- Varied status distribution (80% completed, 10% confirmed, 7% cancelled, 3% no-show)
- Revenue rates tied to actual seasonal pricing periods
### SCD Type 2 on DIM_HOTEL only
SCD Type 2 adds operational complexity — it requires staging tables, a two-phase SQL update, and SCD2-aware fact inserts. Applying it to every dimension would make the ETL unnecessarily complex for the analytical benefit gained.
DIM_HOTEL is the right candidate because:
- Star rating changes (3★→4★ after renovation) directly affect revenue benchmarks
- Chain affiliation changes (hotel joins or leaves a franchise) affect chain-level reporting
- Tracking these historically is the core value proposition of dimensional modelling
Guests, countries, room types, and hotel chains all change rarely or in ways that don't affect historical analysis — SCD Type 1 (overwrite) is appropriate.
### Watermark-based incremental fact loading
The fact table uses `source_rb_id` (the MySQL `room_booking_id`) as a natural key and applies a `NOT EXISTS` guard on every insert. Combined with the `ETL_WATERMARK` table, this makes PG-5 both **incremental** (only processes new rows) and **idempotent** (safe to re-run without creating duplicates). The watermark keeps the work of each run proportional to the number of new rows rather than to the size of the source table, and the guard still prevents duplicates if a run is repeated or the watermark is reset.
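
A minimal sketch of this pattern, again in Python with in-memory SQLite standing in for the Oracle target and a plain list standing in for the MySQL source. The reduced two-column fact table and the helper names `create_target` and `load_increment` are assumptions for illustration; the actual flow is built from NiFi processors.

```python
import sqlite3

ENTITY = "FACT_ROOM_BOOKING"


def create_target(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE etl_watermark (
            entity_name TEXT PRIMARY KEY,
            last_key    INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE fact_room_booking (
            source_rb_id INTEGER UNIQUE,
            total_amount REAL
        );
        INSERT INTO etl_watermark (entity_name, last_key)
        VALUES ('FACT_ROOM_BOOKING', 0);
    """)


def load_increment(conn: sqlite3.Connection, source_rows) -> int:
    """Load rows above the watermark; return how many were actually inserted."""
    inserted = 0
    with conn:  # one transaction: fact rows and watermark move together
        (last_key,) = conn.execute(
            "SELECT last_key FROM etl_watermark WHERE entity_name = ?", (ENTITY,)
        ).fetchone()
        # Incremental part: only source rows above the watermark are considered.
        new_rows = [r for r in source_rows if r[0] > last_key]
        for rb_id, amount in new_rows:
            # Idempotent part: skip rows that are already in the fact table.
            cur = conn.execute("""
                INSERT INTO fact_room_booking (source_rb_id, total_amount)
                SELECT ?, ?
                 WHERE NOT EXISTS (SELECT 1 FROM fact_room_booking
                                    WHERE source_rb_id = ?)
            """, (rb_id, amount, rb_id))
            inserted += cur.rowcount
        if new_rows:
            conn.execute(
                "UPDATE etl_watermark SET last_key = ? WHERE entity_name = ?",
                (max(r[0] for r in new_rows), ENTITY),
            )
    return inserted
```

Calling `load_increment` twice with the same source rows inserts them once; even if the watermark is manually reset to 0, the `NOT EXISTS` guard keeps the second pass from creating duplicates.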
### Integer date keys in DIM_DATE
`date_key` is stored as `NUMBER(8)` in YYYYMMDD format rather than a FK to a DATE column. This allows:
- Fast range predicates: `WHERE checkin_date_key BETWEEN 20240601 AND 20240831`
- No JOIN to get the date value when it's used directly in GROUP BY
- Human-readable values in query results without formatting
---
## Analytical Capabilities
The data mart enables the following categories of OLAP queries:
**Revenue analysis:**
- Total revenue by country, city, hotel chain, star category
- Revenue trend over time (monthly, quarterly, yearly)
- Revenue split by booking status and room type
**Occupancy analysis:**
- Room-nights sold per hotel, per season
- Average stay duration by guest country
- Cancellation rates by period and hotel category
**SCD2-specific analysis:**
- Compare revenue performance of hotels before and after star rating upgrade
- Identify which hotel version (chain affiliation) was more profitable
**Guest origin analysis:**
- Which countries generate the most bookings and revenue
- Cross-country booking patterns (guest country vs hotel country)
---
## Limitations and Possible Extensions
| Limitation | Possible extension |
|------------|-------------------|
| Static OLTP data (no live updates) | Add a NiFi timer to simulate ongoing bookings |
| No SCD2 on DIM_ROOM | Add room type tracking for renovation analysis |
| Single fact table | Add a second fact table for daily hotel occupancy (snapshot fact) |
| No data quality checks in NiFi | Add RouteOnAttribute + dead-letter queue for failed records |
| Oracle target is university lab | Package with Oracle XE Docker container for self-contained demo |


@@ -1,6 +1,7 @@
 #:package MySqlConnector@2.3.7
 using System.Text;
+using System.Globalization;
 using MySqlConnector;
 // ── Config ────────────────────────────────────────────────────────────────────
@@ -45,7 +46,7 @@ async Task BulkInsert(string table, string columns, List<string> valueTuples)
 }
 string S(string? s) => s == null ? "NULL" : $"'{s.Replace("'", "''")}'";
-string N(object? n) => n == null ? "NULL" : n.ToString()!;
+string N(object? n) => n == null ? "NULL" : (n is IFormattable f) ? f.ToString(null, CultureInfo.InvariantCulture) : n.ToString()!;
 string D(DateTime d) => $"'{d:yyyy-MM-dd}'";
 string DT(DateTime d) => $"'{d:yyyy-MM-dd HH:mm:ss}'";
@@ -160,7 +161,7 @@ var roomTypes = new (string Code, string Desc, decimal BaseRate, bool Smoking)[]
 };
 await BulkInsert("room_type", "code, description, standard_rate, smoking_yn",
-roomTypes.Select(rt => $"({S(rt.Code)},{S(rt.Desc)},{rt.BaseRate},0)").ToList());
+roomTypes.Select(rt => $"({S(rt.Code)},{S(rt.Desc)},{N(rt.BaseRate)},0)").ToList());
 var roomTypeIds = new Dictionary<string, int>();
 {
@@ -194,7 +195,7 @@ foreach (var rt in roomTypes)
 foreach (var rp in ratePeriods)
 {
 var rate = Math.Round(rt.BaseRate * rp.Multiplier, 2);
-prrRows.Add($"({roomTypeIds[rt.Code]},{ratePeriodIds[rp.Code]},{rate})");
+prrRows.Add($"({roomTypeIds[rt.Code]},{ratePeriodIds[rp.Code]},{N(rate)})");
 }
 await BulkInsert("period_room_rate", "room_type_id, rate_period_id, rate", prrRows);
@@ -514,7 +515,7 @@ while (bookingsDone < BOOKING_COUNT)
 int ratePeriodId = monthToRatePeriodId[dfrom.Month];
 decimal nightly = rateMap[(roomTypeId, ratePeriodId)];
 decimal total = Math.Round(nightly * nights, 2);
-roomBookingRows.Add($"({bookingId},{roomId},{D(dfrom)},{D(dto)},{nightly},{total})");
+roomBookingRows.Add($"({bookingId},{roomId},{D(dfrom)},{D(dto)},{N(nightly)},{N(total)})");
 }
 }
 }


@@ -1,179 +1,187 @@
--- =============================================================================
--- HOTEL RESERVATIONS — DATA MART (STAR SCHEMA)
--- Target: Oracle (university lab schema)
--- =============================================================================
--- -----------------------------------------------------------------------------
--- ETL CONTROL TABLE
--- Tracks incremental load watermarks per entity.
--- -----------------------------------------------------------------------------
-CREATE TABLE ETL_WATERMARK (
-    entity_name VARCHAR2(50) NOT NULL,
-    last_key NUMBER(20,0) DEFAULT 0 NOT NULL,
-    last_run_ts TIMESTAMP DEFAULT SYSTIMESTAMP,
-    CONSTRAINT pk_etl_wm PRIMARY KEY (entity_name)
-);
-INSERT INTO ETL_WATERMARK (entity_name, last_key) VALUES ('FACT_ROOM_BOOKING', 0);
-COMMIT;
--- -----------------------------------------------------------------------------
+create table ETL_WATERMARK
+(
+    ENTITY_NAME VARCHAR2(50) not null
+        constraint PK_ETL_WATERMARK
+            primary key,
+    LAST_KEY NUMBER(20) default 0 not null,
+    LAST_RUN_TS TIMESTAMP(6) default SYSTIMESTAMP
+)
+/
+create table STG_HOTEL
+(
+    HOTEL_ID NUMBER(10) not null,
+    HOTEL_CODE VARCHAR2(20) not null,
+    HOTEL_NAME VARCHAR2(150) not null,
+    CITY VARCHAR2(100) not null,
+    COUNTRY_CODE CHAR(2) not null,
+    COUNTRY_NAME VARCHAR2(100) not null,
+    CURRENCY VARCHAR2(10) not null,
+    CHAIN_CODE VARCHAR2(10),
+    CHAIN_NAME VARCHAR2(100),
+    STAR_RATING NUMBER(1) not null,
+    STAR_DESCRIPTION VARCHAR2(20)
+)
+/
+create table DIM_DATE
+(
+    DATE_KEY NUMBER(8) not null
+        constraint PK_DIM_DATE
+            primary key,
+    FULL_DATE DATE not null,
+    YEAR NUMBER(4) not null,
+    QUARTER NUMBER(1) not null,
+    MONTH NUMBER(2) not null,
+    MONTH_NAME VARCHAR2(10) not null,
+    WEEK_NUMBER NUMBER(2) not null,
+    DAY_OF_MONTH NUMBER(2) not null,
+    DAY_NAME VARCHAR2(10) not null,
+    IS_WEEKEND NUMBER(1) not null
+        constraint CK_DIM_DATE_WEEKEND
+            check (is_weekend IN (0, 1)),
+    IS_BUSINESS_DAY NUMBER(1) not null
+        constraint CK_DIM_DATE_BUSINESS
+            check (is_business_day IN (0, 1)),
+    SEASON VARCHAR2(10) not null
+)
+/
+create table DIM_HOTEL
+(
+    HOTEL_KEY NUMBER(10) default "IPZ19438"."ISEQ$$_303891".nextval generated as identity
+        constraint PK_DIM_HOTEL
+            primary key,
+    SOURCE_HOTEL_ID NUMBER(10) not null,
+    HOTEL_CODE VARCHAR2(20) not null,
+    HOTEL_NAME VARCHAR2(150) not null,
+    CITY VARCHAR2(100) not null,
+    COUNTRY_CODE CHAR(2) not null,
+    COUNTRY_NAME VARCHAR2(100) not null,
+    CURRENCY VARCHAR2(10) not null,
+    CHAIN_CODE VARCHAR2(10),
+    CHAIN_NAME VARCHAR2(100),
+    STAR_RATING NUMBER(1) not null,
+    STAR_DESCRIPTION VARCHAR2(20),
+    EFFECTIVE_DATE DATE not null,
+    EXPIRY_DATE DATE,
+    IS_CURRENT NUMBER(1) default 1 not null
+        constraint CK_DIM_HOTEL_CURRENT
+            check (is_current IN (0, 1))
+)
+/
+create table DIM_ROOM
-- STAGING TABLES (
-- NiFi loads raw MySQL data here first; SCD logic runs in pure SQL after. ROOM_KEY NUMBER(10) generated as identity
-- Truncated at the start of each ETL run. constraint PK_DIM_ROOM
-- ----------------------------------------------------------------------------- primary key,
SOURCE_ROOM_ID NUMBER(10) not null
constraint UQ_DIM_ROOM
unique,
HOTEL_KEY NUMBER(10) not null
constraint FK_DIM_ROOM_HOTEL
references DIM_HOTEL,
ROOM_NUMBER VARCHAR2(10) not null,
FLOOR NUMBER(3) not null,
ROOM_TYPE_CODE VARCHAR2(20) not null,
ROOM_TYPE_DESCRIPTION VARCHAR2(100) not null,
SMOKING_YN NUMBER(1) not null
constraint CK_DIM_ROOM_SMOKING
check (smoking_yn IN (0, 1)),
STANDARD_RATE NUMBER(10, 2) not null
)
/
CREATE TABLE STG_HOTEL ( create table DIM_GUEST
hotel_id NUMBER(10,0) NOT NULL, (
chain_code VARCHAR2(10), GUEST_KEY NUMBER(10) generated as identity
country_code CHAR(2) NOT NULL, constraint PK_DIM_GUEST
star_code NUMBER(1,0) NOT NULL, primary key,
code VARCHAR2(20) NOT NULL, SOURCE_GUEST_ID NUMBER(10) not null
name VARCHAR2(150) NOT NULL, constraint UQ_DIM_GUEST
city VARCHAR2(100) NOT NULL unique,
); GUEST_NAME VARCHAR2(150) not null,
CITY VARCHAR2(100),
COUNTRY_CODE CHAR(2),
COUNTRY_NAME VARCHAR2(100)
)
/
-- ----------------------------------------------------------------------------- create table FACT_ROOM_BOOKING
-- DIMENSION TABLES (
-- ----------------------------------------------------------------------------- FACT_ID NUMBER(10) default "IPZ19438"."ISEQ$$_303902".nextval generated as identity
constraint PK_FACT_ROOM_BOOKING
primary key,
SOURCE_RB_ID NUMBER(10) not null
constraint UQ_FACT_ROOM_BOOKING_SRC
unique,
HOTEL_KEY NUMBER(10) not null
constraint FK_FACT_HOTEL
references DIM_HOTEL,
ROOM_KEY NUMBER(10) not null
constraint FK_FACT_ROOM
references DIM_ROOM,
GUEST_KEY NUMBER(10) not null
constraint FK_FACT_GUEST
references DIM_GUEST,
BOOKING_CREATED_DATE_KEY NUMBER(8) not null
constraint FK_FACT_BOOKING_DATE
references DIM_DATE,
CHECKIN_DATE_KEY NUMBER(8) not null
constraint FK_FACT_CHECKIN_DATE
references DIM_DATE,
CHECKOUT_DATE_KEY NUMBER(8) not null
constraint FK_FACT_CHECKOUT_DATE
references DIM_DATE,
BOOKING_STATUS VARCHAR2(20) not null,
BOOKING_COUNT NUMBER(1) default 1 not null
constraint CK_FACT_BOOKING_COUNT
check (booking_count = 1),
NIGHTS_STAYED NUMBER(4) not null,
NIGHTLY_RATE NUMBER(10, 2) not null,
TOTAL_AMOUNT NUMBER(12, 2) not null
)
/
-- YYYYMMDD integer key — cheap date range predicates, no JOIN to calendar needed create table STG_GUEST
CREATE TABLE DIM_DATE ( (
date_key NUMBER(8,0) NOT NULL, SOURCE_GUEST_ID NUMBER(10) not null,
full_date DATE NOT NULL, GUEST_NAME VARCHAR2(150) not null,
year NUMBER(4,0) NOT NULL, CITY VARCHAR2(100),
quarter NUMBER(1,0) NOT NULL, COUNTRY_CODE CHAR(2),
month NUMBER(2,0) NOT NULL, COUNTRY_NAME VARCHAR2(100)
month_name VARCHAR2(10) NOT NULL, )
week_number NUMBER(2,0) NOT NULL, /
day_of_month NUMBER(2,0) NOT NULL,
day_name VARCHAR2(10) NOT NULL,
is_weekend NUMBER(1,0) NOT NULL,
is_business_day NUMBER(1,0) NOT NULL,
season VARCHAR2(10) NOT NULL,
CONSTRAINT pk_dim_date PRIMARY KEY (date_key),
CONSTRAINT ck_dim_date_wknd CHECK (is_weekend IN (0,1)),
CONSTRAINT ck_dim_date_bday CHECK (is_business_day IN (0,1))
);
-- SCD Type 1 — country attributes are stable; just overwrite if anything changes create table STG_ROOM
CREATE TABLE DIM_COUNTRY ( (
country_key NUMBER(10,0) GENERATED ALWAYS AS IDENTITY, SOURCE_ROOM_ID NUMBER(10) not null,
country_id NUMBER(10,0) NOT NULL, HOTEL_CODE VARCHAR2(20) not null,
code CHAR(2) NOT NULL, ROOM_NUMBER VARCHAR2(10) not null,
name VARCHAR2(100) NOT NULL, FLOOR NUMBER(3) not null,
currency VARCHAR2(10) NOT NULL, ROOM_TYPE_CODE VARCHAR2(20) not null,
CONSTRAINT pk_dim_country PRIMARY KEY (country_key), ROOM_TYPE_DESCRIPTION VARCHAR2(100) not null,
CONSTRAINT uq_dim_cntry_id UNIQUE (country_id) SMOKING_YN NUMBER(1) not null,
); STANDARD_RATE NUMBER(10, 2) not null,
HOTEL_ID NUMBER(10)
)
/
-- SCD Type 1 — star rating lookup, never changes create table STG_ROOM_BOOKING
CREATE TABLE DIM_STAR_RATING ( (
star_rating_key NUMBER(10,0) GENERATED ALWAYS AS IDENTITY, SOURCE_RB_ID NUMBER(10) not null,
star_rating_id NUMBER(10,0) NOT NULL, GUEST_ID NUMBER(10) not null,
code NUMBER(1,0) NOT NULL, BOOKING_CREATED_DATE DATE not null,
description VARCHAR2(20) NOT NULL, CHECKIN_DATE DATE not null,
CONSTRAINT pk_dim_star PRIMARY KEY (star_rating_key), CHECKOUT_DATE DATE not null,
CONSTRAINT uq_dim_star_id UNIQUE (star_rating_id) BOOKING_STATUS VARCHAR2(20) not null,
); BOOKING_COUNT NUMBER(1) default 1 not null,
NIGHTS_STAYED NUMBER(4) not null,
NIGHTLY_RATE NUMBER(10, 2) not null,
TOTAL_AMOUNT NUMBER(12, 2) not null,
HOTEL_ID NUMBER(10) not null,
ROOM_ID NUMBER(10) not null
)
/
-- SCD Type 1 — chain name/code rarely changes; overwrite
CREATE TABLE DIM_HOTEL_CHAIN (
hotel_chain_key NUMBER(10,0) GENERATED ALWAYS AS IDENTITY,
hotel_chain_id NUMBER(10,0) NOT NULL,
code VARCHAR2(10) NOT NULL,
name VARCHAR2(100) NOT NULL,
CONSTRAINT pk_dim_chain PRIMARY KEY (hotel_chain_key),
CONSTRAINT uq_dim_chain_id UNIQUE (hotel_chain_id)
);
-- SCD Type 2 — hotels can change star rating or chain affiliation over time.
-- source_hotel_id is the natural key from MySQL; hotel_key is the surrogate.
-- One hotel can have multiple rows; IS_CURRENT=1 row is the active version.
-- FACT_ROOM_BOOKING links to the hotel version current at check-in date.
CREATE TABLE DIM_HOTEL (
hotel_key NUMBER(10,0) GENERATED ALWAYS AS IDENTITY,
source_hotel_id NUMBER(10,0) NOT NULL,
hotel_chain_key NUMBER(10,0),
country_key NUMBER(10,0) NOT NULL,
star_rating_key NUMBER(10,0) NOT NULL,
code VARCHAR2(20) NOT NULL,
name VARCHAR2(150) NOT NULL,
city VARCHAR2(100) NOT NULL,
-- SCD2 versioning
effective_date DATE NOT NULL,
expiry_date DATE,
is_current NUMBER(1,0) DEFAULT 1 NOT NULL,
CONSTRAINT pk_dim_hotel PRIMARY KEY (hotel_key),
CONSTRAINT ck_dh_current CHECK (is_current IN (0,1)),
CONSTRAINT fk_dh_chain FOREIGN KEY (hotel_chain_key) REFERENCES DIM_HOTEL_CHAIN (hotel_chain_key),
CONSTRAINT fk_dh_country FOREIGN KEY (country_key) REFERENCES DIM_COUNTRY (country_key),
CONSTRAINT fk_dh_star FOREIGN KEY (star_rating_key) REFERENCES DIM_STAR_RATING (star_rating_key)
);
-- SCD Type 1 — room type/floor rarely changes; upsert is sufficient
CREATE TABLE DIM_ROOM (
room_key NUMBER(10,0) GENERATED ALWAYS AS IDENTITY,
room_id NUMBER(10,0) NOT NULL,
hotel_key NUMBER(10,0) NOT NULL,
room_number VARCHAR2(10) NOT NULL,
floor NUMBER(3,0) NOT NULL,
room_type_code VARCHAR2(20) NOT NULL,
room_type_desc VARCHAR2(100) NOT NULL,
smoking_yn NUMBER(1,0) NOT NULL,
standard_rate NUMBER(10,2) NOT NULL,
CONSTRAINT pk_dim_room PRIMARY KEY (room_key),
CONSTRAINT uq_dim_room_id UNIQUE (room_id),
CONSTRAINT fk_dr_hotel FOREIGN KEY (hotel_key) REFERENCES DIM_HOTEL (hotel_key),
CONSTRAINT ck_dim_room_smk CHECK (smoking_yn IN (0,1))
);
-- SCD Type 1 — guest contact details are overwritten if they change
CREATE TABLE DIM_GUEST (
guest_key NUMBER(10,0) GENERATED ALWAYS AS IDENTITY,
guest_id NUMBER(10,0) NOT NULL,
country_key NUMBER(10,0),
name VARCHAR2(150) NOT NULL,
city VARCHAR2(100),
CONSTRAINT pk_dim_guest PRIMARY KEY (guest_key),
CONSTRAINT uq_dim_guest_id UNIQUE (guest_id),
CONSTRAINT fk_dg_country FOREIGN KEY (country_key) REFERENCES DIM_COUNTRY (country_key)
);
-- -----------------------------------------------------------------------------
-- FACT TABLE
-- -----------------------------------------------------------------------------
-- Grain: one row per room_booking.
-- source_rb_id: natural key from MySQL — used for idempotent incremental loads.
-- hotel_key: points to the DIM_HOTEL version active at check-in (SCD2 lookup).
CREATE TABLE FACT_ROOM_BOOKING (
fact_id NUMBER(10,0) GENERATED ALWAYS AS IDENTITY,
source_rb_id NUMBER(10,0) NOT NULL,
-- dimension FKs
hotel_key NUMBER(10,0) NOT NULL,
hotel_chain_key NUMBER(10,0),
room_key NUMBER(10,0) NOT NULL,
guest_key NUMBER(10,0) NOT NULL,
country_key NUMBER(10,0),
star_rating_key NUMBER(10,0) NOT NULL,
checkin_date_key NUMBER(8,0) NOT NULL,
checkout_date_key NUMBER(8,0) NOT NULL,
-- degenerate dimension
booking_status VARCHAR2(20) NOT NULL,
-- measures
nights_stayed NUMBER(4,0) NOT NULL,
nightly_rate NUMBER(10,2) NOT NULL,
total_amount NUMBER(12,2) NOT NULL,
CONSTRAINT pk_fact_rb PRIMARY KEY (fact_id),
CONSTRAINT uq_fact_rb_src UNIQUE (source_rb_id),
CONSTRAINT fk_frb_hotel FOREIGN KEY (hotel_key) REFERENCES DIM_HOTEL (hotel_key),
CONSTRAINT fk_frb_chain FOREIGN KEY (hotel_chain_key) REFERENCES DIM_HOTEL_CHAIN (hotel_chain_key),
CONSTRAINT fk_frb_room FOREIGN KEY (room_key) REFERENCES DIM_ROOM (room_key),
CONSTRAINT fk_frb_guest FOREIGN KEY (guest_key) REFERENCES DIM_GUEST (guest_key),
CONSTRAINT fk_frb_country FOREIGN KEY (country_key) REFERENCES DIM_COUNTRY (country_key),
CONSTRAINT fk_frb_star FOREIGN KEY (star_rating_key) REFERENCES DIM_STAR_RATING (star_rating_key),
CONSTRAINT fk_frb_checkin FOREIGN KEY (checkin_date_key) REFERENCES DIM_DATE (date_key),
CONSTRAINT fk_frb_checkout FOREIGN KEY (checkout_date_key) REFERENCES DIM_DATE (date_key)
);
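
The removed schema comments say that FACT_ROOM_BOOKING should point at the DIM_HOTEL version that was active at check-in, and that ETL_WATERMARK tracks incremental loads per entity. The query below is a rough sketch of how that lookup might be written against the new table definitions. It is not taken from the commit. It assumes that EXPIRY_DATE is NULL on the open version and acts as an exclusive upper bound, that STG_ROOM_BOOKING.HOTEL_ID holds the MySQL id stored in DIM_HOTEL.SOURCE_HOTEL_ID, and that LAST_KEY holds the highest SOURCE_RB_ID already loaded.

```sql
-- Sketch only: resolve the SCD2 hotel surrogate key for newly staged bookings.
-- Assumptions (not confirmed by the diff):
--   * EXPIRY_DATE is NULL for the current version, exclusive otherwise
--   * ETL_WATERMARK.LAST_KEY is the highest SOURCE_RB_ID already in the fact table
SELECT s.SOURCE_RB_ID,
       h.HOTEL_KEY
FROM   STG_ROOM_BOOKING s
JOIN   DIM_HOTEL h
  ON   h.SOURCE_HOTEL_ID = s.HOTEL_ID
 AND   s.CHECKIN_DATE >= h.EFFECTIVE_DATE
 AND   (h.EXPIRY_DATE IS NULL OR s.CHECKIN_DATE < h.EXPIRY_DATE)
WHERE  s.SOURCE_RB_ID > (SELECT w.LAST_KEY
                         FROM   ETL_WATERMARK w
                         WHERE  w.ENTITY_NAME = 'FACT_ROOM_BOOKING');
```

Matching on the effective/expiry range rather than on IS_CURRENT = 1 is what would let a booking from before a star-rating change keep pointing at the older hotel version. If the validity ranges of one hotel ever overlap, this join returns duplicate rows, so the SCD2 load would need to keep them disjoint.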