Compare commits
5 Commits
e64288694b
...
master
| Author | SHA1 | Date |
|---|---|---|
| | 946e4020d9 | |
| | 571c749e25 | |
| | 718407d709 | |
| | 348a074a3a | |
| | 6c70628616 | |
98
docs/01-overview.md
Normal file
@@ -0,0 +1,98 @@
# Hotel Reservations — Data Warehouse Project

## Project Summary

This project implements a complete **Data Warehousing pipeline** for a hotel reservation system, covering all standard DW layers:

```
MySQL OLTP ──► Apache NiFi ETL ──► Oracle Data Mart ──► Power BI Reports
 (source)        (transform)      (analytical store)     (OLAP queries)
```

The system is built around the **A.24 Hotel Reservations** domain from the course specification. The OLTP database was populated with **~635,000 synthetically generated rows** covering 200 hotels, 100,000 guests, 500,000 bookings, and 531,000 room bookings across a 4-year period (2022–2025).

---

## Business Context

A hotel chain needs to answer questions like:

- Which countries generate the most revenue per quarter?
- How does occupancy differ between peak and off-peak seasons?
- What is the revenue contribution of 5-star vs 3-star hotels?
- How has a hotel's revenue changed after upgrading its star rating?

These questions require **historical, multi-dimensional analysis** that a normalized OLTP database cannot serve efficiently. The data mart provides pre-modelled, denormalized data optimized for analytical queries.

---

## Architecture

```
┌─────────────────────────────────────────────────────────┐
│                      SOURCE LAYER                       │
│  MySQL 8.4 (Docker/Podman, port 13306)                  │
│  Database: hotel_reservations                           │
│  13 normalized tables, ~635K rows                       │
└───────────────────────┬─────────────────────────────────┘
                        │ JDBC (MySQL Connector/J)
                        ▼
┌─────────────────────────────────────────────────────────┐
│                        ETL LAYER                        │
│  Apache NiFi                                            │
│  5 Process Groups: Date Dim / Static Dims /             │
│  SCD2 Hotel / SCD1 Guest / Incremental Fact             │
└───────────────────────┬─────────────────────────────────┘
                        │ JDBC (Oracle JDBC)
                        ▼
┌─────────────────────────────────────────────────────────┐
│                     DATA MART LAYER                     │
│  Oracle (university lab schema)                         │
│  Star schema: 6 dimensions + 1 fact table               │
│  SCD Type 2 on DIM_HOTEL                                │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│                   PRESENTATION LAYER                    │
│  Power BI Desktop                                       │
│  OLAP reports via DirectQuery / Import                  │
└─────────────────────────────────────────────────────────┘
```

---

## Technology Stack

| Component | Technology | Version |
|-----------|-----------|---------|
| OLTP Database | MySQL | 8.4 |
| Container runtime | Docker / Podman | — |
| Data generator | C# (.NET) | 10 |
| ETL tool | Apache NiFi | — |
| Data Mart | Oracle RDBMS | university lab |
| Reporting | Power BI Desktop | — |

---

## Repository Structure

```
IPZ_1/
├── docker/
│   ├── start.sh               # Start MySQL container (Linux/macOS)
│   ├── stop.sh                # Stop MySQL container
│   ├── start.ps1              # Start MySQL container (Windows)
│   └── stop.ps1               # Stop MySQL container
├── sql/
│   ├── schema.sql             # MySQL OLTP DDL
│   └── datamart_schema.sql    # Oracle Data Mart DDL
├── generator/
│   └── generate.cs            # .NET 10 data generator script
└── docs/
    ├── 01-overview.md         # This file
    ├── 02-oltp.md             # OLTP database design
    ├── 03-datamart.md         # Data mart design
    ├── 04-setup.md            # Setup and run guide
    └── nifi-flow.md           # NiFi ETL flow reference
```
258
docs/02-oltp.md
Normal file
@@ -0,0 +1,258 @@
# OLTP Database — Design & Details

## Overview

The OLTP (Online Transaction Processing) database models a **hotel reservation system** using a fully normalized relational schema in **MySQL 8.4**. It follows 3NF and enforces referential integrity via foreign keys.

- **Database:** `hotel_reservations`
- **Character set:** `utf8mb4` / `utf8mb4_unicode_ci`
- **Tables:** 13
- **Total rows:** ~635,000

---

## Entity-Relationship Model

The schema covers five entity groups:

```
hotel_chain  ──┐
country ───────┼──► hotel ──► hotel_room ──► room_booking ──► booking ──► guest
star_rating  ──┘                                                            │
                                                                            └──► country
hotel_characteristic ◄──► hotel (M:N via hotel_hotel_characteristic)

room_type ◄──── hotel_room
room_type  ◄──┐
rate_period ◄─┴── period_room_rate (price per room type per season)
```

---

## Table Descriptions

### Reference / Lookup Tables

#### `hotel_chain`
International hotel chains (Hilton, Marriott, Accor, etc.).

| Column | Type | Description |
|--------|------|-------------|
| `hotel_chain_id` | INT UNSIGNED PK | Surrogate key |
| `code` | VARCHAR(10) UNIQUE | Short code (e.g. `HLT`) |
| `name` | VARCHAR(100) | Full name |

**Rows:** 10

---

#### `country`
Countries from which guests come and where hotels are located.

| Column | Type | Description |
|--------|------|-------------|
| `country_id` | INT UNSIGNED PK | Surrogate key |
| `code` | CHAR(2) UNIQUE | ISO 3166-1 alpha-2 (e.g. `GB`) |
| `name` | VARCHAR(100) | Country name |
| `currency` | VARCHAR(10) | ISO currency code (e.g. `EUR`) |

**Rows:** 40 (Europe, Americas, Asia, Africa, Oceania)

---

#### `star_rating`
Hotel classification from 1★ to 5★.

| Column | Type | Description |
|--------|------|-------------|
| `star_rating_id` | INT UNSIGNED PK | Surrogate key |
| `code` | TINYINT UNIQUE | 1–5 |
| `description` | VARCHAR(20) | e.g. `3 Star` |

**Rows:** 5

---

#### `hotel_characteristic`
Amenities and features a hotel may offer.

| Column | Type | Description |
|--------|------|-------------|
| `characteristic_id` | INT UNSIGNED PK | Surrogate key |
| `code` | VARCHAR(20) UNIQUE | e.g. `POOL`, `SPA`, `WIFI` |
| `description` | VARCHAR(100) | Human-readable label |

**Rows:** 12 (WiFi, Pool, Gym, Spa, Restaurant, Bar, Parking, Valet, Conference, Shuttle, Room Service, Pet Friendly)

---

#### `room_type`
Types of rooms a hotel can offer, with a standard (base) rate.

| Column | Type | Description |
|--------|------|-------------|
| `room_type_id` | INT UNSIGNED PK | Surrogate key |
| `code` | VARCHAR(20) UNIQUE | e.g. `SINGLE`, `SUITE` |
| `description` | VARCHAR(100) | e.g. `Junior Suite` |
| `standard_rate` | DECIMAL(10,2) | Base nightly rate (EUR) |
| `smoking_yn` | BOOLEAN | Smoking allowed flag |

**Rows:** 7 (Single €80, Double €120, Twin €115, Deluxe €180, Suite €280, Executive €450, Family €200)

---

#### `rate_period`
Seasonal pricing periods. Each period maps to a month range and applies a rate multiplier.

| Column | Type | Description |
|--------|------|-------------|
| `rate_period_id` | INT UNSIGNED PK | Surrogate key |
| `code` | VARCHAR(20) UNIQUE | e.g. `PEAK`, `WINTER` |
| `description` | VARCHAR(50) | Human-readable label |
| `month_from` | TINYINT | Start month (1–12) |
| `month_to` | TINYINT | End month (1–12) |

**Rows:** 4

| Code | Period | Months | Multiplier |
|------|--------|--------|-----------|
| PEAK | Peak Season | Jun–Aug | ×1.5 |
| HIGH | High Season | Mar–May | ×1.2 |
| AUTUMN | Autumn Season | Sep–Nov | ×1.1 |
| WINTER | Winter Season | Dec–Feb | ×0.9 |
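
Because the winter period runs December to February, a lookup from calendar month to period has to handle a range that wraps past the end of the year. The following is a minimal Python sketch of such a lookup, not code from the project's C# generator; the `RATE_PERIODS` list and the `period_for_month` helper are hypothetical names, and storing winter as `month_from = 12`, `month_to = 2` is an assumption inferred from the table above.

```python
# Hypothetical in-memory copy of the rate_period rows: (code, month_from, month_to).
RATE_PERIODS = [
    ("PEAK", 6, 8),
    ("HIGH", 3, 5),
    ("AUTUMN", 9, 11),
    ("WINTER", 12, 2),  # month_from > month_to: the range wraps past December
]


def period_for_month(month: int) -> str:
    """Return the rate_period code whose month range contains `month` (1-12)."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    for code, month_from, month_to in RATE_PERIODS:
        if month_from <= month_to:
            in_range = month_from <= month <= month_to
        else:
            # Wrapping range such as Dec-Feb: match either tail of the year.
            in_range = month >= month_from or month <= month_to
        if in_range:
            return code
    raise LookupError(f"no rate period covers month {month}")
```

A plain `BETWEEN month_from AND month_to` test would miss January and February for the winter row, which is why the wrapping case is handled separately.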

---

### Junction Tables

#### `period_room_rate`
The effective nightly rate for each (room_type, rate_period) combination.
Rate = `standard_rate × season_multiplier`.

| Column | Type | Description |
|--------|------|-------------|
| `room_type_id` | INT UNSIGNED PK/FK | |
| `rate_period_id` | INT UNSIGNED PK/FK | |
| `rate` | DECIMAL(10,2) | Effective nightly rate |

**Rows:** 28 (7 room types × 4 seasons)
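
As an illustration of the formula, the sketch below derives all 28 rates from the base rates and multipliers listed earlier. It is a Python approximation rather than the project's generator code. The room type codes other than `SINGLE` and `SUITE` are assumed spellings, and rounding to two decimals with half-up rounding is an assumption chosen to match the `DECIMAL(10,2)` column.

```python
from decimal import Decimal, ROUND_HALF_UP

# Base nightly rates in EUR, taken from the room_type row summary.
# Codes other than SINGLE and SUITE are assumed spellings.
STANDARD_RATES = {
    "SINGLE": Decimal("80"),
    "DOUBLE": Decimal("120"),
    "TWIN": Decimal("115"),
    "DELUXE": Decimal("180"),
    "SUITE": Decimal("280"),
    "EXECUTIVE": Decimal("450"),
    "FAMILY": Decimal("200"),
}

# Season multipliers, taken from the rate_period table.
SEASON_MULTIPLIERS = {
    "PEAK": Decimal("1.5"),
    "HIGH": Decimal("1.2"),
    "AUTUMN": Decimal("1.1"),
    "WINTER": Decimal("0.9"),
}


def build_period_room_rates() -> dict:
    """Return {(room_type_code, rate_period_code): rate} for every combination."""
    rates = {}
    for room_code, base in STANDARD_RATES.items():
        for period_code, multiplier in SEASON_MULTIPLIERS.items():
            rate = (base * multiplier).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
            rates[(room_code, period_code)] = rate
    return rates


PERIOD_ROOM_RATES = build_period_room_rates()
```

Using `Decimal` instead of floats keeps the stored values exact, which matters once rates are multiplied by nights and summed as revenue.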

---

#### `hotel_hotel_characteristic`
M:N junction between hotels and their amenities.

| Column | Type |
|--------|------|
| `hotel_id` | INT UNSIGNED PK/FK |
| `characteristic_id` | INT UNSIGNED PK/FK |

**Rows:** ~1,415

---

### Core Entity Tables

#### `hotel`
Individual hotel properties.

| Column | Type | Description |
|--------|------|-------------|
| `hotel_id` | INT UNSIGNED PK | |
| `hotel_chain_id` | INT UNSIGNED FK | NULL for independent hotels |
| `country_id` | INT UNSIGNED FK | |
| `star_rating_id` | INT UNSIGNED FK | |
| `code` | VARCHAR(20) UNIQUE | e.g. `HTL0001` |
| `name` | VARCHAR(150) | |
| `address` | VARCHAR(200) | |
| `postcode` | VARCHAR(20) | |
| `city` | VARCHAR(100) | |
| `url` | VARCHAR(200) | |

**Rows:** 200 (50 cities, star distribution: 5% 1★, 10% 2★, 40% 3★, 30% 4★, 15% 5★)

---

#### `hotel_room`
Individual rooms within each hotel.

| Column | Type | Description |
|--------|------|-------------|
| `room_id` | INT UNSIGNED PK | |
| `hotel_id` | INT UNSIGNED FK | |
| `room_type_id` | INT UNSIGNED FK | |
| `room_number` | VARCHAR(10) | Format: `{floor}{number}`, e.g. `101` |
| `floor` | TINYINT UNSIGNED | |

**Rows:** 5,334 (5–60 rooms per hotel depending on star rating)

---

#### `guest`
Hotel guests.

| Column | Type | Description |
|--------|------|-------------|
| `guest_id` | INT UNSIGNED PK | |
| `country_id` | INT UNSIGNED FK | Guest's home country |
| `name` | VARCHAR(150) | Full name |
| `email` | VARCHAR(150) | Unique synthetic email |
| `address` | VARCHAR(200) | |
| `city` | VARCHAR(100) | |

**Rows:** 100,000

---

#### `booking`
A reservation made by a guest at a hotel. One booking can cover multiple rooms.

| Column | Type | Description |
|--------|------|-------------|
| `booking_id` | INT UNSIGNED PK | |
| `guest_id` | INT UNSIGNED FK | |
| `hotel_id` | INT UNSIGNED FK | |
| `date_from` | DATE | Check-in |
| `date_to` | DATE | Check-out |
| `status` | ENUM | `confirmed`, `cancelled`, `completed`, `no_show` |
| `created_at` | DATETIME | When booking was made |

**Rows:** 500,000
**Status distribution:** 80% completed, 10% confirmed, 7% cancelled, 3% no_show
**Date range:** 2022-01-01 – 2025-12-31
**Seasonal distribution:** June–August heaviest (peak), December–February lightest

---

#### `room_booking`
A specific room assigned within a booking. Stores the rate **as it was at booking time** (snapshot), independent of any future rate changes.

| Column | Type | Description |
|--------|------|-------------|
| `room_booking_id` | INT UNSIGNED PK | |
| `booking_id` | INT UNSIGNED FK | |
| `room_id` | INT UNSIGNED FK | |
| `date_from` | DATE | |
| `date_to` | DATE | |
| `nightly_rate` | DECIMAL(10,2) | Rate at time of booking |
| `total_amount` | DECIMAL(10,2) | `nightly_rate × nights` |

**Rows:** 531,382
**Room count per booking:** 90% single room, 8% two rooms, 2% three rooms

---

## Data Generation

The database was populated using a **single-file C# script** (`generator/generate.cs`) running on .NET 10, using `MySqlConnector` as the only dependency.

Key generation decisions:
- **Seasonal booking distribution** via rejection sampling — months Jun–Aug are ~2.7× more likely than Jan–Feb
- **Rate snapshot** — each `room_booking.nightly_rate` is looked up from `period_room_rate` at insert time and stored, not re-computed later
- **Realistic stay lengths** — 30% one night, 25% two nights, 20% three nights, tapering off to 14-night stays
- **Cancelled/no-show bookings** partially skip room assignment (60% of cancellations have no room_booking)
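
The rejection-sampling step in the first bullet can be sketched as follows. This is a Python illustration of the technique, not an excerpt from `generate.cs`. Only the roughly 2.7× ratio between June to August and January to February comes from the text; every other weight in `MONTH_WEIGHTS` is an illustrative placeholder, and `sample_month` is a hypothetical helper name.

```python
import random

# Relative month weights. Only the ~2.7x Jun-Aug vs Jan-Feb ratio is from the
# docs; the remaining values are illustrative placeholders.
MONTH_WEIGHTS = {
    1: 1.0, 2: 1.0, 3: 1.6, 4: 1.7, 5: 1.8, 6: 2.7,
    7: 2.7, 8: 2.7, 9: 1.5, 10: 1.4, 11: 1.3, 12: 1.0,
}
MAX_WEIGHT = max(MONTH_WEIGHTS.values())


def sample_month(rng: random.Random) -> int:
    """Draw a month (1-12) by rejection sampling against MONTH_WEIGHTS."""
    while True:
        month = rng.randint(1, 12)  # uniform proposal
        # Accept with probability weight / MAX_WEIGHT, otherwise draw again.
        if rng.random() * MAX_WEIGHT < MONTH_WEIGHTS[month]:
            return month


rng = random.Random(42)  # fixed seed, so repeated runs give the same sequence
counts = {m: 0 for m in range(1, 13)}
for _ in range(120_000):
    counts[sample_month(rng)] += 1
```

With enough draws, `counts[7] / counts[1]` settles near the 2.7 target. Rejection sampling is convenient here because the weights can be tuned without recomputing a cumulative distribution.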

```bash
# Run generator
dotnet run generator/generate.cs
```
255
docs/03-datamart.md
Normal file
@@ -0,0 +1,255 @@
# Data Mart — Design & Details

## Overview

The data mart uses a **star schema** stored in an Oracle database (university lab schema). It is optimized for analytical queries against hotel reservation data — revenue analysis, occupancy trends, seasonal patterns, and guest origin breakdowns.

- **Schema type:** Star schema
- **Dimensions:** 6 (+ date dimension)
- **Fact table:** `FACT_ROOM_BOOKING`
- **Grain:** One row per room_booking (one room, one stay)
- **SCD strategy:** Type 2 on DIM_HOTEL, Type 1 on all others

---

## Star Schema Diagram

```
                    DIM_DATE
                   (date_key)
                       │
           ┌───────────┴───────────┐
           │  checkin / checkout   │
           │                       │
DIM_HOTEL_CHAIN ◄─ DIM_HOTEL ─► DIM_STAR_RATING
                       │              │
                       │      FACT_ROOM_BOOKING ◄──── DIM_ROOM
                       │              │
                       └───────► DIM_COUNTRY ◄───── DIM_GUEST
```

---

## Dimension Tables

### DIM_DATE
Populated once for the range 2020–2030. Used for both check-in and check-out date lookups.

| Column | Type | Description |
|--------|------|-------------|
| `date_key` | NUMBER(8) PK | YYYYMMDD integer key |
| `full_date` | DATE | Actual date value |
| `year` | NUMBER(4) | |
| `quarter` | NUMBER(1) | 1–4 |
| `month` | NUMBER(2) | 1–12 |
| `month_name` | VARCHAR2(10) | e.g. `January` |
| `week_number` | NUMBER(2) | ISO week number |
| `day_of_month` | NUMBER(2) | |
| `day_name` | VARCHAR2(10) | e.g. `Monday` |
| `is_weekend` | NUMBER(1) | 0/1 |
| `is_business_day` | NUMBER(1) | 0/1 |
| `season` | VARCHAR2(10) | Peak / High / Autumn / Winter |

Using an integer date key (YYYYMMDD) instead of a DATE FK allows efficient range predicates: `checkin_date_key BETWEEN 20240601 AND 20240831`.
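
To make the key encoding concrete, here is a small Python sketch that derives the `date_key` and several other DIM_DATE attributes from a calendar date. It is not the NiFi flow itself. The helper names are hypothetical, the season mapping is assumed to follow the OLTP rate periods, and `is_business_day` is left out because it depends on a holiday calendar the docs do not describe.

```python
from datetime import date


def date_key(d: date) -> int:
    """Encode a date as a YYYYMMDD integer, e.g. 2024-06-01 -> 20240601."""
    return d.year * 10000 + d.month * 100 + d.day


def season_for_month(month: int) -> str:
    """Assumed mapping, mirroring the OLTP rate periods."""
    if 6 <= month <= 8:
        return "Peak"
    if 3 <= month <= 5:
        return "High"
    if 9 <= month <= 11:
        return "Autumn"
    return "Winter"  # December, January, February


def dim_date_row(d: date) -> dict:
    """Build one DIM_DATE row (without is_business_day) for the given date."""
    return {
        "date_key": date_key(d),
        "full_date": d,
        "year": d.year,
        "quarter": (d.month - 1) // 3 + 1,
        "month": d.month,
        "week_number": d.isocalendar()[1],  # ISO week number
        "day_of_month": d.day,
        "is_weekend": 1 if d.isoweekday() >= 6 else 0,
        "season": season_for_month(d.month),
    }
```

Because the integer keys sort in the same order as the dates they encode, a numeric `BETWEEN` on the key selects the same rows as a date range would.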

---

### DIM_COUNTRY (SCD Type 1)
Country attributes are stable. If a name or currency ever changes, the row is simply overwritten (no history needed).

| Column | Type | Description |
|--------|------|-------------|
| `country_key` | NUMBER(10) PK | Surrogate (IDENTITY) |
| `country_id` | NUMBER(10) UNIQUE | Natural key from MySQL |
| `code` | CHAR(2) | ISO alpha-2 |
| `name` | VARCHAR2(100) | |
| `currency` | VARCHAR2(10) | ISO currency code |

---

### DIM_STAR_RATING (SCD Type 1)
Static lookup. Star rating codes 1–5 never change.

| Column | Type | Description |
|--------|------|-------------|
| `star_rating_key` | NUMBER(10) PK | Surrogate (IDENTITY) |
| `star_rating_id` | NUMBER(10) UNIQUE | Natural key |
| `code` | NUMBER(1) | 1–5 |
| `description` | VARCHAR2(20) | e.g. `4 Star` |

---

### DIM_HOTEL_CHAIN (SCD Type 1)
Chain name/code may be updated (e.g. corporate rebranding), but we do not need a historical record of chain name changes.

| Column | Type | Description |
|--------|------|-------------|
| `hotel_chain_key` | NUMBER(10) PK | Surrogate (IDENTITY) |
| `hotel_chain_id` | NUMBER(10) UNIQUE | Natural key |
| `code` | VARCHAR2(10) | e.g. `HLT` |
| `name` | VARCHAR2(100) | |

---

### DIM_HOTEL (SCD Type 2)

This is the most analytically significant dimension and the only one implemented as **Slowly Changing Dimension Type 2**.

**Why SCD Type 2 here?**

A hotel's star rating or chain affiliation can change over time — a property gets renovated and reclassified from 3★ to 4★, or switches from one international chain to another. These changes directly affect revenue analysis: a 3★ hotel charges different rates than a 4★ hotel, and grouping all historical bookings under the current star rating would produce misleading averages.

SCD Type 2 preserves history by creating a **new row** for each version of a hotel, while expiring the old row with an `expiry_date`. The fact table's `hotel_key` always points to the version that was active **at check-in date**, never to the current version if it changed.

| Column | Type | Description |
|--------|------|-------------|
| `hotel_key` | NUMBER(10) PK | Surrogate (IDENTITY) |
| `source_hotel_id` | NUMBER(10) | Natural key from MySQL |
| `hotel_chain_key` | NUMBER(10) FK | NULL for independent hotels |
| `country_key` | NUMBER(10) FK | |
| `star_rating_key` | NUMBER(10) FK | |
| `code` | VARCHAR2(20) | |
| `name` | VARCHAR2(150) | |
| `city` | VARCHAR2(100) | |
| `effective_date` | DATE | When this version became active |
| `expiry_date` | DATE | When this version was superseded (NULL = current) |
| `is_current` | NUMBER(1) | 1 = current version |

**SCD2 example:**

| hotel_key | source_hotel_id | star_rating | effective_date | expiry_date | is_current |
|-----------|----------------|-------------|----------------|-------------|-----------|
| 1 | 42 | 3★ | 2022-01-01 | 2024-05-31 | 0 |
| 2 | 42 | 4★ | 2024-06-01 | NULL | 1 |

Bookings with a check-in date from 2022-01-01 through 2024-05-31 point to `hotel_key=1`; bookings with a check-in date on or after 2024-06-01 point to `hotel_key=2`. Revenue by star category therefore remains historically correct.
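
The lookup the fact load has to perform for each booking can be sketched in Python as below. This is an illustration of the version-resolution rule, not the project's NiFi or SQL implementation. The `DIM_HOTEL` list mirrors the two example rows, `resolve_hotel_key` is a hypothetical helper, and treating `expiry_date` as inclusive is an assumption based on the example dates (2024-05-31 followed by 2024-06-01).

```python
from datetime import date
from typing import Optional

# The two DIM_HOTEL versions from the example table (expiry_date None = current).
DIM_HOTEL = [
    {"hotel_key": 1, "source_hotel_id": 42,
     "effective_date": date(2022, 1, 1), "expiry_date": date(2024, 5, 31)},
    {"hotel_key": 2, "source_hotel_id": 42,
     "effective_date": date(2024, 6, 1), "expiry_date": None},
]


def resolve_hotel_key(source_hotel_id: int, checkin: date) -> Optional[int]:
    """Return the hotel_key of the version active on the check-in date."""
    for row in DIM_HOTEL:
        if row["source_hotel_id"] != source_hotel_id:
            continue
        started = row["effective_date"] <= checkin
        # Assumption: expiry_date is the last day on which the version is valid.
        not_expired = row["expiry_date"] is None or checkin <= row["expiry_date"]
        if started and not_expired:
            return row["hotel_key"]
    return None  # no version covers this date
```

For example, `resolve_hotel_key(42, date(2023, 7, 15))` selects the 3★ version, while a check-in on 2024-06-01 selects the 4★ version.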

---

### DIM_ROOM (SCD Type 1)
Room type is stable for our dataset. Updated via MERGE if room details ever change.

| Column | Type | Description |
|--------|------|-------------|
| `room_key` | NUMBER(10) PK | Surrogate (IDENTITY) |
| `room_id` | NUMBER(10) UNIQUE | Natural key |
| `hotel_key` | NUMBER(10) FK | Points to current DIM_HOTEL version |
| `room_number` | VARCHAR2(10) | |
| `floor` | NUMBER(3) | |
| `room_type_code` | VARCHAR2(20) | e.g. `SUITE` |
| `room_type_desc` | VARCHAR2(100) | |
| `smoking_yn` | NUMBER(1) | |
| `standard_rate` | NUMBER(10,2) | Base rate from OLTP |

---

### DIM_GUEST (SCD Type 1)
Guest personal data (city, country) may change, but tracking historical addresses has no analytical value for this domain. MERGE (upsert) is used.

| Column | Type | Description |
|--------|------|-------------|
| `guest_key` | NUMBER(10) PK | Surrogate (IDENTITY) |
| `guest_id` | NUMBER(10) UNIQUE | Natural key |
| `country_key` | NUMBER(10) FK | Home country |
| `name` | VARCHAR2(150) | |
| `city` | VARCHAR2(100) | |

---

## Fact Table: FACT_ROOM_BOOKING

**Grain:** One row per room_booking — one specific room, for one stay.

| Column | Type | Description |
|--------|------|-------------|
| `fact_id` | NUMBER(10) PK | Surrogate (IDENTITY) |
| `source_rb_id` | NUMBER(10) UNIQUE | Natural key — used for idempotent incremental loads |
| `hotel_key` | NUMBER(10) FK | SCD2-resolved hotel version at check-in |
| `hotel_chain_key` | NUMBER(10) FK | Denormalized from DIM_HOTEL for convenience |
| `room_key` | NUMBER(10) FK | |
| `guest_key` | NUMBER(10) FK | |
| `country_key` | NUMBER(10) FK | Guest's country — denormalized |
| `star_rating_key` | NUMBER(10) FK | Denormalized from DIM_HOTEL for convenience |
| `checkin_date_key` | NUMBER(8) FK | YYYYMMDD |
| `checkout_date_key` | NUMBER(8) FK | YYYYMMDD |
| `booking_status` | VARCHAR2(20) | Degenerate dimension: confirmed/completed/cancelled/no_show |
| `nights_stayed` | NUMBER(4) | checkout − checkin in days |
| `nightly_rate` | NUMBER(10,2) | Rate per night at time of booking |
| `total_amount` | NUMBER(12,2) | `nightly_rate × nights_stayed` |

### Measures

| Measure | Type | Aggregation |
|---------|------|-------------|
| `nights_stayed` | Additive | SUM, AVG |
| `nightly_rate` | Non-additive | AVG (not SUM; a sum of per-night rates across rows has no business meaning) |
| `total_amount` | Additive | SUM (main revenue measure) |
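
A short worked example may help show why the rate is averaged rather than summed, and how a plain average differs from a night-weighted one. The three fact rows below are invented for illustration, and the night-weighted figure is a suggested derived metric rather than something the schema defines.

```python
# Three hypothetical fact rows: (nights_stayed, nightly_rate, total_amount).
FACT_ROWS = [
    (1, 100.0, 100.0),
    (4, 150.0, 600.0),
    (2, 120.0, 240.0),
]

total_nights = sum(nights for nights, _, _ in FACT_ROWS)    # additive
total_revenue = sum(amount for _, _, amount in FACT_ROWS)   # additive

# Plain AVG(nightly_rate): every booking counts equally.
avg_nightly_rate = sum(rate for _, rate, _ in FACT_ROWS) / len(FACT_ROWS)

# Night-weighted rate: SUM(total_amount) / SUM(nights_stayed).
revenue_per_night = total_revenue / total_nights
```

Here the plain average is about 123.33 while revenue per night is about 134.29, because the four-night stay at the higher rate carries more weight. Both can be derived from the two additive measures plus the row count, so the fact table does not need to store either.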

### Degenerate Dimensions
`booking_status` is stored directly on the fact row. Splitting it into a separate dimension table would add a table with only 4 rows and no other attributes — not worth the JOIN overhead.

---

## ETL Control Tables

### ETL_WATERMARK
Tracks the highest `room_booking_id` already loaded into the fact table, enabling incremental loads without re-reading the entire source.

| Column | Description |
|--------|-------------|
| `entity_name` | Logical entity name (e.g. `FACT_ROOM_BOOKING`) |
| `last_key` | Highest PK value loaded so far |
| `last_run_ts` | Timestamp of the last ETL run |
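
The watermark pattern can be sketched in a few lines of Python. This is an in-memory illustration, not the NiFi process group. `incremental_load` is a hypothetical function, the source and fact tables are plain Python containers, and keying the fact store by the source id stands in for the `source_rb_id` UNIQUE constraint.

```python
def incremental_load(source_rows: list, fact_table: dict, watermark: dict) -> int:
    """Load rows whose room_booking_id exceeds the watermark, then advance it.

    Returns the number of rows loaded in this run.
    """
    last_key = watermark["last_key"]
    new_rows = sorted(
        (row for row in source_rows if row["room_booking_id"] > last_key),
        key=lambda row: row["room_booking_id"],
    )
    for row in new_rows:
        # Keyed by the source id, so replaying a row cannot create a duplicate.
        fact_table[row["room_booking_id"]] = row
    if new_rows:
        watermark["last_key"] = new_rows[-1]["room_booking_id"]
    return len(new_rows)


# Usage: the first run loads everything, an immediate second run loads nothing.
source = [{"room_booking_id": i, "total_amount": 100.0 * i} for i in range(1, 6)]
facts = {}
mark = {"entity_name": "FACT_ROOM_BOOKING", "last_key": 0}
first_run = incremental_load(source, facts, mark)
second_run = incremental_load(source, facts, mark)
```

This approach relies on `room_booking_id` only ever increasing in the source. Rows updated in place after being loaded would not be picked up by a key-based watermark.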

### STG_HOTEL
Staging table used by the SCD2 ETL process. NiFi loads raw hotel data from MySQL here, then SQL applies the expire-and-insert SCD2 logic in a single transaction. Truncated at the start of each ETL run.
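
The expire-and-insert step can be illustrated with the Python sketch below. It models the logic on in-memory lists and is not the SQL the project runs against Oracle. `apply_scd2` and `TRACKED` are hypothetical names, the choice of star rating and chain as the change-tracked attributes follows the DIM_HOTEL rationale but is an assumption, natural ids are used in place of dimension keys for brevity, and setting the old row's expiry to the day before the load date is an assumption that matches the example dates.

```python
from datetime import date, timedelta

# Assumed change-tracked attributes; a change in any of them opens a new version.
TRACKED = ("star_rating_id", "hotel_chain_id")


def apply_scd2(dim_hotel: list, staged_rows: list, load_date: date) -> list:
    """Expire changed current rows and insert new versions (SCD Type 2)."""
    next_key = max((row["hotel_key"] for row in dim_hotel), default=0) + 1
    current = {row["source_hotel_id"]: row for row in dim_hotel if row["is_current"] == 1}
    for staged in staged_rows:
        active = current.get(staged["hotel_id"])
        if active is not None and all(active[a] == staged[a] for a in TRACKED):
            continue  # unchanged: keep the current version as is
        if active is not None:
            # Expire the old version the day before the new one takes effect.
            active["expiry_date"] = load_date - timedelta(days=1)
            active["is_current"] = 0
        new_row = {
            "hotel_key": next_key,
            "source_hotel_id": staged["hotel_id"],
            **{a: staged[a] for a in TRACKED},
            "effective_date": load_date,
            "expiry_date": None,
            "is_current": 1,
        }
        dim_hotel.append(new_row)
        current[staged["hotel_id"]] = new_row
        next_key += 1
    return dim_hotel
```

Re-running the function with unchanged staging data adds no rows, which is the property that makes the process group safe to re-run.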

---

## Sample Analytical Queries

### Revenue by country and quarter
```sql
-- One fact row is one room booking, so room-nights are SUM(nights_stayed), not COUNT(*).
SELECT
    c.name               AS country,
    d.year,
    d.quarter,
    SUM(f.total_amount)  AS revenue,
    SUM(f.nights_stayed) AS room_nights
FROM FACT_ROOM_BOOKING f
JOIN DIM_DATE    d ON d.date_key    = f.checkin_date_key
JOIN DIM_GUEST   g ON g.guest_key   = f.guest_key
JOIN DIM_COUNTRY c ON c.country_key = g.country_key
WHERE f.booking_status = 'completed'
GROUP BY c.name, d.year, d.quarter
ORDER BY revenue DESC;
```

### Average revenue per star category (correct because of SCD2)
```sql
-- The star rating is read from the DIM_HOTEL version that was active at check-in.
SELECT
    sr.code             AS stars,
    d.season,
    AVG(f.nightly_rate) AS avg_nightly_rate,
    SUM(f.total_amount) AS total_revenue
FROM FACT_ROOM_BOOKING f
JOIN DIM_HOTEL       h  ON h.hotel_key        = f.hotel_key
JOIN DIM_STAR_RATING sr ON sr.star_rating_key = h.star_rating_key
JOIN DIM_DATE        d  ON d.date_key         = f.checkin_date_key
GROUP BY sr.code, d.season
ORDER BY sr.code, d.season;
```

### Top 10 cities by occupancy (room-nights)
```sql
SELECT
    h.city,
    SUM(f.nights_stayed) AS room_nights,
    SUM(f.total_amount)  AS revenue
FROM FACT_ROOM_BOOKING f
JOIN DIM_HOTEL h ON h.hotel_key = f.hotel_key
WHERE f.booking_status IN ('completed','confirmed')
GROUP BY h.city
ORDER BY room_nights DESC
FETCH FIRST 10 ROWS ONLY;
```
181
docs/04-setup.md
Normal file
@@ -0,0 +1,181 @@
# Setup Guide

## Prerequisites

| Tool | Required for | Notes |
|------|-------------|-------|
| Docker or Podman | MySQL container | Use `--podman` flag on Linux |
| .NET 10 SDK | Data generator | `dotnet run file.cs` support |
| Apache NiFi | ETL | Running instance with Oracle + MySQL JDBC drivers |
| Oracle JDBC driver | NiFi | `ojdbc11.jar` in NiFi's lib directory |
| MySQL JDBC driver | NiFi | `mysql-connector-j-*.jar` in NiFi's lib directory |
| Oracle DB access | Data mart target | University lab credentials |

---

## Step 1 — Start MySQL Container

**Linux / macOS (Docker):**
```bash
bash docker/start.sh
```

**Linux / macOS (Podman):**
```bash
bash docker/start.sh --podman
```

**Windows (PowerShell):**
```powershell
.\docker\start.ps1
```

The script:
- Creates a named container `hotel-mysql` with a persistent data volume
- Mounts `sql/schema.sql` as an init script — all 13 tables are created automatically on first start
- Waits until MySQL is ready before exiting

**Connection details:**
```
Host: 127.0.0.1
Port: 13306
Database: hotel_reservations
User: root
Password: hotel2025root
```

---

## Step 2 — Generate OLTP Data

```bash
dotnet run generator/generate.cs
```

**Runtime:** ~3 minutes
**Output:** 635,000+ rows across 13 tables

The generator is deterministic (fixed seed `42`) — running it twice on an empty database produces the same data.

> **Important:** Run the generator only once on an empty database. If you need to restart, truncate all tables first (respecting FK order) or drop and recreate the container + volume.

### Quick table verification after generation:
```bash
# Note: table_rows is only an estimate for InnoDB tables.
# Run SELECT COUNT(*) on a table when an exact count is needed.

# Docker
docker exec hotel-mysql mysql -uroot -photel2025root hotel_reservations \
  -e "SELECT table_name, table_rows FROM information_schema.tables WHERE table_schema='hotel_reservations';"

# Podman
podman exec hotel-mysql mysql -uroot -photel2025root hotel_reservations \
  -e "SELECT table_name, table_rows FROM information_schema.tables WHERE table_schema='hotel_reservations';"
```

---

## Step 3 — Prepare Oracle Data Mart

Connect to the Oracle schema (university lab) and execute `sql/datamart_schema.sql`.

The script creates:
- `ETL_WATERMARK` (with initial row for `FACT_ROOM_BOOKING`)
- `STG_HOTEL` (staging)
- All 7 dimension tables
- `FACT_ROOM_BOOKING`

```sql
-- Run in SQL*Plus or SQL Developer:
@datamart_schema.sql
```

---

## Step 4 — Configure NiFi

### 4.1 Add JDBC drivers to NiFi

Copy the following JARs to `$NIFI_HOME/lib/` (or the NiFi extensions directory):
- `mysql-connector-j-8.x.jar`
- `ojdbc11.jar`

Restart NiFi after adding drivers.

### 4.2 Create Controller Services

In NiFi UI → Controller Settings → Controller Services:

**MySQL connection:**
- Type: `DBCPConnectionPool`
- Database Driver Class Name: `com.mysql.cj.jdbc.Driver`
- Database Connection URL: `jdbc:mysql://127.0.0.1:13306/hotel_reservations`
- Database User: `root`
- Password: `hotel2025root`

**Oracle connection:**
- Type: `DBCPConnectionPool`
- Database Driver Class Name: `oracle.jdbc.OracleDriver`
- Database Connection URL: `jdbc:oracle:thin:@<host>:1521:<sid>`
- Database User: `<your_schema>`
- Password: `<your_password>`

Enable both services.

### 4.3 Build Process Groups

Follow the detailed processor configuration in `docs/nifi-flow.md`.

**Recommended build order:**
1. PG-1: Date Dimension (simplest, test first)
2. PG-2: Static Dimensions (verify MERGE logic)
3. PG-3: DIM_HOTEL SCD2 (most complex — check staging table after run)
4. PG-4: DIM_GUEST SCD1
5. PG-5: Fact Incremental Load

---

## Step 5 — Run ETL
|
||||||
|
|
||||||
|
### First full load
|
||||||
|
|
||||||
|
1. Run **PG-1** (Date Dimension) manually, once only
|
||||||
|
2. Start **PG-2, PG-3, PG-4** — these are idempotent, safe to re-run
|
||||||
|
3. Start **PG-5**: it loads incrementally in batches of 50,000 rows (the `LIMIT` in its extract query), so the initial load of all 531k room_bookings takes about 11 runs
|
||||||
|
|
||||||
|
### Verify load
|
||||||
|
|
||||||
|
```sql
|
||||||
|
-- Oracle
|
||||||
|
SELECT COUNT(*) FROM DIM_HOTEL; -- should be 200 (+ more after SCD2 changes)
|
||||||
|
SELECT COUNT(*) FROM DIM_GUEST; -- 100,000
|
||||||
|
SELECT COUNT(*) FROM FACT_ROOM_BOOKING; -- 531,382
|
||||||
|
SELECT last_key FROM ETL_WATERMARK WHERE entity_name = 'FACT_ROOM_BOOKING'; -- 531,382
|
||||||
|
```
|
||||||
|
|
||||||
|
### Verify SCD2 is working
|
||||||
|
|
||||||
|
```sql
|
||||||
|
-- Should show 1 current version per hotel on initial load
|
||||||
|
SELECT is_current, COUNT(*) FROM DIM_HOTEL GROUP BY is_current;
|
||||||
|
-- Expected: IS_CURRENT=1, COUNT=200
|
||||||
|
```
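
A complementary check, sketched here on the assumption that `DIM_HOTEL` carries the `source_hotel_id` and `is_current` columns used in `docs/nifi-flow.md`, is to confirm that every hotel has exactly one current version. The query is expected to return no rows, both after the initial load and after later SCD2 runs.

```sql
-- Oracle: hotels whose number of current versions is not exactly 1 (expect no rows)
SELECT source_hotel_id,
       SUM(is_current) AS current_versions
FROM DIM_HOTEL
GROUP BY source_hotel_id
HAVING SUM(is_current) <> 1;
```

Any row returned points at either a missing current version or a hotel that was versioned twice in one run.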
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Stop / Restart
|
||||||
|
|
||||||
|
**Stop MySQL (preserves data):**
|
||||||
|
```bash
|
||||||
|
bash docker/stop.sh [--podman]
|
||||||
|
```
|
||||||
|
|
||||||
|
**Restart MySQL:**
|
||||||
|
```bash
|
||||||
|
bash docker/start.sh [--podman]
|
||||||
|
```
|
||||||
|
|
||||||
|
**Full reset (delete all data):**
|
||||||
|
```bash
|
||||||
|
bash docker/stop.sh --podman
|
||||||
|
podman volume rm hotel-mysql-data
|
||||||
|
bash docker/start.sh --podman
|
||||||
|
dotnet run generator/generate.cs
|
||||||
|
```
|
||||||
83
docs/05-conclusion.md
Normal file
@@ -0,0 +1,83 @@
|
|||||||
|
# Conclusion
|
||||||
|
|
||||||
|
## What Was Built
|
||||||
|
|
||||||
|
This project delivers a complete, working **Data Warehouse pipeline** for the Hotel Reservations domain:
|
||||||
|
|
||||||
|
| Layer | What was built | Scale |
|
||||||
|
|-------|---------------|-------|
|
||||||
|
| OLTP | MySQL 8.4, 13-table normalized schema | ~635,000 rows |
|
||||||
|
| Data generation | .NET 10 C# script, realistic seasonal distribution | 500K bookings in ~3 min |
|
||||||
|
| ETL | Apache NiFi, 5 process groups | full + incremental loads |
|
||||||
|
| Data Mart | Oracle star schema, SCD Type 2 on DIM_HOTEL | 1 fact + 7 dims |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Design Decisions
|
||||||
|
|
||||||
|
### Synthetic data generation instead of a Kaggle dataset
|
||||||
|
|
||||||
|
The decision to generate data rather than use a pre-existing dataset was deliberate. Publicly available hotel datasets are either too small (thousands of rows) or lack the normalized relational structure needed to demonstrate a realistic OLTP-to-DW pipeline. The generator produces statistically realistic data:
|
||||||
|
|
||||||
|
- Seasonal booking distribution (summer peak, winter trough)
|
||||||
|
- Realistic stay-length distribution (30% one-night stays)
|
||||||
|
- Varied status distribution (80% completed, 10% confirmed, 7% cancelled, 3% no-show)
|
||||||
|
- Revenue rates tied to actual seasonal pricing periods
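
One way to confirm the status distribution after generation is a quick aggregate on the source database. This is a sketch only; it assumes the MySQL `booking` table exposes the `status` column that the fact extract in `docs/nifi-flow.md` reads.

```sql
-- MySQL 8: share of bookings per status (window function over the grouped counts)
SELECT status,
       COUNT(*) AS bookings,
       ROUND(100 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) AS pct
FROM booking
GROUP BY status
ORDER BY bookings DESC;
```

The percentages should land close to the 80/10/7/3 split listed above, with small deviations from random sampling.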
|
||||||
|
|
||||||
|
### SCD Type 2 on DIM_HOTEL only
|
||||||
|
|
||||||
|
SCD Type 2 adds operational complexity — it requires staging tables, a two-phase SQL update, and SCD2-aware fact inserts. Applying it to every dimension would make the ETL unnecessarily complex for the analytical benefit gained.
|
||||||
|
|
||||||
|
DIM_HOTEL is the right candidate because:
|
||||||
|
- Star rating changes (3★→4★ after renovation) directly affect revenue benchmarks
|
||||||
|
- Chain affiliation changes (hotel joins or leaves a franchise) affect chain-level reporting
|
||||||
|
- Tracking these historically is the core value proposition of dimensional modelling
|
||||||
|
|
||||||
|
Guests, countries, room types, and hotel chains all change rarely or in ways that don't affect historical analysis — SCD Type 1 (overwrite) is appropriate.
|
||||||
|
|
||||||
|
### Watermark-based incremental fact loading
|
||||||
|
|
||||||
|
The fact table uses `source_rb_id` (the MySQL `room_booking_id`) as a natural key and applies a `NOT EXISTS` guard on every insert. Combined with the `ETL_WATERMARK` table, this makes PG-5 both **incremental** (only processes new rows) and **idempotent** (safe to re-run without creating duplicates). The pattern is common for append-only sources; because it keys on an ever-increasing ID, it picks up new rows only, so a later change to an already-loaded booking (for example a status update) would need a separate reload.
|
||||||
|
|
||||||
|
### Integer date keys in DIM_DATE
|
||||||
|
|
||||||
|
`date_key` is stored as `NUMBER(8)` in YYYYMMDD format rather than a FK to a DATE column. This allows:
|
||||||
|
- Fast range predicates: `WHERE checkin_date_key BETWEEN 20240601 AND 20240831`
|
||||||
|
- No JOIN to get the date value when it's used directly in GROUP BY
|
||||||
|
- Human-readable values in query results without formatting
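
As an illustration of the range predicate, the following sketch totals summer 2024 revenue by month. It assumes the fact columns `checkin_date_key` and `total_amount` and the `DIM_DATE` columns `year` and `month` described in the ETL documentation; the join to `DIM_DATE` is needed only for the grouping columns, not for the filter.

```sql
-- Oracle: revenue for check-ins between 1 June and 31 August 2024, by month
SELECT d.year,
       d.month,
       SUM(f.total_amount) AS revenue
FROM FACT_ROOM_BOOKING f
JOIN DIM_DATE d ON d.date_key = f.checkin_date_key
WHERE f.checkin_date_key BETWEEN 20240601 AND 20240831
GROUP BY d.year, d.month
ORDER BY d.year, d.month;
```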
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Analytical Capabilities
|
||||||
|
|
||||||
|
The data mart enables the following categories of OLAP queries:
|
||||||
|
|
||||||
|
**Revenue analysis:**
|
||||||
|
- Total revenue by country, city, hotel chain, star category
|
||||||
|
- Revenue trend over time (monthly, quarterly, yearly)
|
||||||
|
- Revenue split by booking status and room type
|
||||||
|
|
||||||
|
**Occupancy analysis:**
|
||||||
|
- Room-nights sold per hotel, per season
|
||||||
|
- Average stay duration by guest country
|
||||||
|
- Cancellation rates by period and hotel category
|
||||||
|
|
||||||
|
**SCD2-specific analysis:**
|
||||||
|
- Compare revenue performance of hotels before and after star rating upgrade
|
||||||
|
- Identify which hotel version (chain affiliation) was more profitable
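
A before-and-after comparison can be sketched as revenue per hotel version. The column names (`hotel_key`, `source_hotel_id`, `effective_date`, `expiry_date`, `star_rating_key`, `code`) are taken from the ETL documentation and are assumptions about the deployed schema.

```sql
-- Oracle: revenue per hotel version, oldest version first
SELECT dh.source_hotel_id,
       dh.hotel_key,
       dh.effective_date,
       dh.expiry_date,
       sr.code             AS star_code,
       COUNT(*)            AS room_bookings,
       SUM(f.total_amount) AS revenue
FROM FACT_ROOM_BOOKING f
JOIN DIM_HOTEL dh       ON dh.hotel_key = f.hotel_key
JOIN DIM_STAR_RATING sr ON sr.star_rating_key = dh.star_rating_key
GROUP BY dh.source_hotel_id, dh.hotel_key, dh.effective_date, dh.expiry_date, sr.code
ORDER BY dh.source_hotel_id, dh.effective_date;
```

Hotels that appear with more than one row are the ones whose tracked attributes changed; comparing adjacent rows gives the before and after figures.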
|
||||||
|
|
||||||
|
**Guest origin analysis:**
|
||||||
|
- Which countries generate the most bookings and revenue
|
||||||
|
- Cross-country booking patterns (guest country vs hotel country)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Limitations and Possible Extensions
|
||||||
|
|
||||||
|
| Limitation | Possible extension |
|
||||||
|
|------------|-------------------|
|
||||||
|
| Static OLTP data (no live updates) | Add a NiFi timer to simulate ongoing bookings |
|
||||||
|
| No SCD2 on DIM_ROOM | Add room type tracking for renovation analysis |
|
||||||
|
| Single fact table | Add a second fact table for daily hotel occupancy (snapshot fact) |
|
||||||
|
| No data quality checks in NiFi | Add RouteOnAttribute + dead-letter queue for failed records |
|
||||||
|
| Oracle target is university lab | Package with Oracle XE Docker container for self-contained demo |
|
||||||
451
docs/nifi-flow.md
Normal file
@@ -0,0 +1,451 @@
|
|||||||
|
# NiFi ETL Flow — Hotel Reservations Data Mart
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
The flow moves data from **MySQL 8.4 OLTP** (source) into **Oracle Data Mart** (target).
|
||||||
|
It is organized into **5 Process Groups** that run in sequence, controlled by a top-level scheduler.
|
||||||
|
|
||||||
|
```
|
||||||
|
[PG-1: Date Dim] → [PG-2: Static Dims] → [PG-3: SCD2 Hotel Dim] → [PG-4: SCD1 Guest] → [PG-5: Fact (incremental)]
|
||||||
|
```
|
||||||
|
|
||||||
|
Each PG has a single **Input Port** and **Output Port** so the orchestrator can chain them with connections.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Controller Services (shared by all PGs)
|
||||||
|
|
||||||
|
| Name | Type | Config |
|
||||||
|
|------|------|--------|
|
||||||
|
| `MySQL_DBCPService` | DBCPConnectionPool | Driver: `com.mysql.cj.jdbc.Driver`; URL: `jdbc:mysql://127.0.0.1:13306/hotel_reservations`; User: `root`; Pwd: `hotel2025root` |
|
||||||
|
| `Oracle_DBCPService` | DBCPConnectionPool | Driver: `oracle.jdbc.OracleDriver`; URL: `jdbc:oracle:thin:@<host>:1521:<sid>`; User: `<schema>`; Pwd: `<pwd>` |
|
||||||
|
| `JsonReader` | JsonTreeReader | default settings |
|
||||||
|
| `JsonWriter` | JsonRecordSetWriter | default settings |
|
||||||
|
| `AvroReader` | AvroReader | default settings |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## PG-1: Load Date Dimension
|
||||||
|
|
||||||
|
**Runs once** (or when extending the date range). Populates `DIM_DATE` for 2020–2030.
|
||||||
|
|
||||||
|
```
|
||||||
|
GenerateFlowFile → ExecuteScript → SplitJson → EvaluateJsonPath → PutSQL
|
||||||
|
```
|
||||||
|
|
||||||
|
### Processors
|
||||||
|
|
||||||
|
**GenerateFlowFile**
|
||||||
|
- Run Schedule: manual (run once via right-click → Run Once)
|
||||||
|
- Custom Text: `{}`
|
||||||
|
|
||||||
|
**ExecuteScript** (Groovy)
|
||||||
|
```groovy
|
||||||
|
import groovy.json.JsonOutput
|
||||||
|
import java.time.*
import org.apache.nifi.processor.io.OutputStreamCallback
|
||||||
|
|
||||||
|
def rows = []
|
||||||
|
def d = LocalDate.of(2020, 1, 1)
|
||||||
|
def end = LocalDate.of(2030, 12, 31)
|
||||||
|
while (!d.isAfter(end)) {
|
||||||
|
def m = d.monthValue
|
||||||
|
def season = (m >= 6 && m <= 8) ? 'Peak'
|
||||||
|
: (m >= 3 && m <= 5) ? 'High'
|
||||||
|
: (m >= 9 && m <= 11) ? 'Autumn'
|
||||||
|
: 'Winter'
|
||||||
|
rows << [
|
||||||
|
date_key: d.format(java.time.format.DateTimeFormatter.ofPattern('yyyyMMdd')) as int,
|
||||||
|
full_date: d.toString(),
|
||||||
|
year: d.year,
|
||||||
|
quarter: ((m - 1) / 3 + 1) as int,
|
||||||
|
month: m,
|
||||||
|
month_name: d.month.toString().toLowerCase().capitalize(),
|
||||||
|
week_number: d.get(java.time.temporal.WeekFields.ISO.weekOfYear()),
|
||||||
|
day_of_month: d.dayOfMonth,
|
||||||
|
day_name: d.dayOfWeek.toString().toLowerCase().capitalize(),
|
||||||
|
is_weekend: (d.dayOfWeek.value >= 6) ? 1 : 0,
|
||||||
|
is_business_day: (d.dayOfWeek.value <= 5) ? 1 : 0,
|
||||||
|
season: season
|
||||||
|
]
|
||||||
|
d = d.plusDays(1)
|
||||||
|
}
|
||||||
|
def ff = session.create()
|
||||||
|
ff = session.write(ff, { out -> out.write(JsonOutput.toJson(rows).bytes) } as OutputStreamCallback)
|
||||||
|
ff = session.putAttribute(ff, 'mime.type', 'application/json')
|
||||||
|
session.transfer(ff, REL_SUCCESS)
|
||||||
|
```
|
||||||
|
|
||||||
|
**SplitJson**
|
||||||
|
- JsonPath Expression: `$.*`
|
||||||
|
|
||||||
|
**EvaluateJsonPath**
|
||||||
|
- Destination: `flowfile-attribute`
|
||||||
|
- Attributes: `date_key`, `full_date`, `year`, `quarter`, `month`, `month_name`, `week_number`, `day_of_month`, `day_name`, `is_weekend`, `is_business_day`, `season`
|
||||||
|
|
||||||
|
**PutSQL**
|
||||||
|
- JDBC Connection Pool: `Oracle_DBCPService`
|
||||||
|
- SQL Statement:
|
||||||
|
```sql
|
||||||
|
INSERT INTO DIM_DATE (date_key, full_date, year, quarter, month, month_name,
|
||||||
|
week_number, day_of_month, day_name, is_weekend, is_business_day, season)
|
||||||
|
VALUES (${date_key}, TO_DATE('${full_date}','YYYY-MM-DD'), ${year}, ${quarter},
|
||||||
|
${month}, '${month_name}', ${week_number}, ${day_of_month},
|
||||||
|
'${day_name}', ${is_weekend}, ${is_business_day}, '${season}')
|
||||||
|
```
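
The plain `INSERT` fails on the primary key if PG-1 is re-run over dates that are already loaded, for instance when extending the range. A possible alternative, shown here as a sketch rather than the project's actual statement, is an insert-only `MERGE` that skips existing keys:

```sql
MERGE INTO DIM_DATE tgt
USING (SELECT ${date_key} AS date_key FROM DUAL) src
ON (tgt.date_key = src.date_key)
WHEN NOT MATCHED THEN
  INSERT (date_key, full_date, year, quarter, month, month_name,
          week_number, day_of_month, day_name, is_weekend, is_business_day, season)
  VALUES (src.date_key, TO_DATE('${full_date}','YYYY-MM-DD'), ${year}, ${quarter},
          ${month}, '${month_name}', ${week_number}, ${day_of_month},
          '${day_name}', ${is_weekend}, ${is_business_day}, '${season}')
```

With this form the same flow can be reused for a wider date range without truncating `DIM_DATE` first.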
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## PG-2: Static Dimensions (SCD Type 1)
|
||||||
|
|
||||||
|
Loads `DIM_COUNTRY`, `DIM_STAR_RATING`, `DIM_HOTEL_CHAIN` from MySQL.
|
||||||
|
Uses **MERGE INTO** so the flow is idempotent — re-running it updates changed rows and inserts new ones.
|
||||||
|
|
||||||
|
Each sub-flow follows the same pattern:
|
||||||
|
|
||||||
|
```
|
||||||
|
ExecuteSQL(MySQL) → ConvertAvroToJSON → SplitJson → EvaluateJsonPath → PutSQL(MERGE)
|
||||||
|
```
|
||||||
|
|
||||||
|
### 2a — DIM_COUNTRY
|
||||||
|
|
||||||
|
**ExecuteSQL** — Connection Pool: `MySQL_DBCPService`
|
||||||
|
```sql
|
||||||
|
SELECT country_id, code, name, currency FROM country ORDER BY country_id
|
||||||
|
```
|
||||||
|
|
||||||
|
**EvaluateJsonPath** attributes: `country_id`, `code`, `name`, `currency`
|
||||||
|
|
||||||
|
**PutSQL**
|
||||||
|
```sql
|
||||||
|
MERGE INTO DIM_COUNTRY tgt
|
||||||
|
USING (SELECT 1 FROM DUAL) src ON (tgt.country_id = ${country_id})
|
||||||
|
WHEN MATCHED THEN
|
||||||
|
UPDATE SET tgt.code = '${code}', tgt.name = '${name}', tgt.currency = '${currency}'
|
||||||
|
WHEN NOT MATCHED THEN
|
||||||
|
INSERT (country_id, code, name, currency)
|
||||||
|
VALUES (${country_id}, '${code}', '${name}', '${currency}')
|
||||||
|
```
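
Because the statement above splices attribute values directly into the SQL text, a value containing an apostrophe (a country name such as Côte d'Ivoire) would break it. A hedged alternative is to use PutSQL's parameter attributes: the statement uses `?` placeholders, and an upstream UpdateAttribute sets `sql.args.N.type` (a JDBC type code, 4 for INTEGER and 12 for VARCHAR) and `sql.args.N.value` for each position. The sketch below assumes the same four attributes extracted by EvaluateJsonPath.

```sql
-- Attributes expected on the FlowFile (set by UpdateAttribute):
--   sql.args.1.type = 4    sql.args.1.value = ${country_id}
--   sql.args.2.type = 12   sql.args.2.value = ${code}
--   sql.args.3.type = 12   sql.args.3.value = ${name}
--   sql.args.4.type = 12   sql.args.4.value = ${currency}
MERGE INTO DIM_COUNTRY tgt
USING (SELECT ? AS country_id, ? AS code, ? AS name, ? AS currency FROM DUAL) src
ON (tgt.country_id = src.country_id)
WHEN MATCHED THEN
  UPDATE SET tgt.code = src.code, tgt.name = src.name, tgt.currency = src.currency
WHEN NOT MATCHED THEN
  INSERT (country_id, code, name, currency)
  VALUES (src.country_id, src.code, src.name, src.currency)
```

The same approach would apply to the other MERGE statements in this document that quote string attributes.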
|
||||||
|
|
||||||
|
### 2b — DIM_STAR_RATING
|
||||||
|
|
||||||
|
**ExecuteSQL**
|
||||||
|
```sql
|
||||||
|
SELECT star_rating_id, code, description FROM star_rating ORDER BY code
|
||||||
|
```
|
||||||
|
|
||||||
|
**PutSQL**
|
||||||
|
```sql
|
||||||
|
MERGE INTO DIM_STAR_RATING tgt
|
||||||
|
USING (SELECT 1 FROM DUAL) src ON (tgt.star_rating_id = ${star_rating_id})
|
||||||
|
WHEN MATCHED THEN
|
||||||
|
UPDATE SET tgt.code = ${code}, tgt.description = '${description}'
|
||||||
|
WHEN NOT MATCHED THEN
|
||||||
|
INSERT (star_rating_id, code, description)
|
||||||
|
VALUES (${star_rating_id}, ${code}, '${description}')
|
||||||
|
```
|
||||||
|
|
||||||
|
### 2c — DIM_HOTEL_CHAIN
|
||||||
|
|
||||||
|
**ExecuteSQL**
|
||||||
|
```sql
|
||||||
|
SELECT hotel_chain_id, code, name FROM hotel_chain ORDER BY hotel_chain_id
|
||||||
|
```
|
||||||
|
|
||||||
|
**PutSQL**
|
||||||
|
```sql
|
||||||
|
MERGE INTO DIM_HOTEL_CHAIN tgt
|
||||||
|
USING (SELECT 1 FROM DUAL) src ON (tgt.hotel_chain_id = ${hotel_chain_id})
|
||||||
|
WHEN MATCHED THEN
|
||||||
|
UPDATE SET tgt.code = '${code}', tgt.name = '${name}'
|
||||||
|
WHEN NOT MATCHED THEN
|
||||||
|
INSERT (hotel_chain_id, code, name)
|
||||||
|
VALUES (${hotel_chain_id}, '${code}', '${name}')
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## PG-3: DIM_HOTEL — SCD Type 2
|
||||||
|
|
||||||
|
This is the analytically significant dimension. Hotels change star rating and chain affiliation over time (renovations, rebrandings). SCD Type 2 preserves history so reports can accurately show revenue by star category **at the time of booking**, not just today's category.
|
||||||
|
|
||||||
|
**Architecture:** NiFi stages raw data into `STG_HOTEL`, then an `ExecuteScript` runs the SCD2 SQL logic in a single Oracle transaction.
|
||||||
|
|
||||||
|
```
|
||||||
|
[Truncate STG] → [Load STG from MySQL] → [Apply SCD2 SQL]
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step A: Truncate staging
|
||||||
|
|
||||||
|
**GenerateFlowFile** → **PutSQL**
|
||||||
|
```sql
|
||||||
|
TRUNCATE TABLE STG_HOTEL
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step B: Load staging from MySQL
|
||||||
|
|
||||||
|
**ExecuteSQL** — `MySQL_DBCPService`
|
||||||
|
```sql
|
||||||
|
SELECT
|
||||||
|
h.hotel_id,
|
||||||
|
hc.code AS chain_code,
|
||||||
|
c.code AS country_code,
|
||||||
|
sr.code AS star_code,
|
||||||
|
h.code,
|
||||||
|
h.name,
|
||||||
|
h.city
|
||||||
|
FROM hotel h
|
||||||
|
JOIN country c ON c.country_id = h.country_id
|
||||||
|
JOIN star_rating sr ON sr.star_rating_id = h.star_rating_id
|
||||||
|
LEFT JOIN hotel_chain hc ON hc.hotel_chain_id = h.hotel_chain_id
|
||||||
|
ORDER BY h.hotel_id
|
||||||
|
```
|
||||||
|
|
||||||
|
**ConvertAvroToJSON** → **SplitJson** (`$.*`)
|
||||||
|
|
||||||
|
**EvaluateJsonPath** attributes: `hotel_id`, `chain_code`, `country_code`, `star_code`, `code`, `name`, `city`
|
||||||
|
|
||||||
|
**PutSQL** → `STG_HOTEL`
|
||||||
|
```sql
|
||||||
|
INSERT INTO STG_HOTEL (hotel_id, chain_code, country_code, star_code, code, name, city)
|
||||||
|
VALUES (${hotel_id}, NULLIF('${chain_code}',''), '${country_code}', ${star_code}, '${code}', '${name}', '${city}')
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step C: Apply SCD2 logic
|
||||||
|
|
||||||
|
**GenerateFlowFile** (runs after B finishes) → **ExecuteScript** (Groovy)
|
||||||
|
|
||||||
|
The Groovy script opens a JDBC connection and executes two SQL statements in one transaction:
|
||||||
|
|
||||||
|
```groovy
|
||||||
|
import java.sql.*
|
||||||
|
|
||||||
|
def conn = context.controllerServiceLookup
|
||||||
|
.getControllerService('Oracle_DBCPService_ID')  // placeholder: use the identifier (UUID) of Oracle_DBCPService
|
||||||
|
.getConnection()
|
||||||
|
conn.autoCommit = false
|
||||||
|
|
||||||
|
try {
|
||||||
|
// 1. Expire records whose tracked attributes changed
|
||||||
|
conn.prepareStatement("""
|
||||||
|
UPDATE DIM_HOTEL dh
|
||||||
|
SET dh.expiry_date = TRUNC(SYSDATE) - 1,
|
||||||
|
dh.is_current = 0
|
||||||
|
WHERE dh.is_current = 1
|
||||||
|
AND EXISTS (
|
||||||
|
SELECT 1 FROM STG_HOTEL s
|
||||||
|
WHERE s.hotel_id = dh.source_hotel_id
|
||||||
|
AND (
|
||||||
|
NVL(s.chain_code,'~') != NVL((
|
||||||
|
SELECT hc.code FROM DIM_HOTEL_CHAIN hc
|
||||||
|
WHERE hc.hotel_chain_key = dh.hotel_chain_key),'~')
|
||||||
|
OR s.star_code != (
|
||||||
|
SELECT ds.code FROM DIM_STAR_RATING ds
|
||||||
|
WHERE ds.star_rating_key = dh.star_rating_key)
|
||||||
|
OR s.city != dh.city
|
||||||
|
)
|
||||||
|
)
|
||||||
|
""").executeUpdate()
|
||||||
|
|
||||||
|
// 2. Insert new version for changed hotels + insert brand-new hotels
|
||||||
|
conn.prepareStatement("""
|
||||||
|
INSERT INTO DIM_HOTEL (
|
||||||
|
source_hotel_id, hotel_chain_key, country_key, star_rating_key,
|
||||||
|
code, name, city, effective_date, expiry_date, is_current)
|
||||||
|
SELECT
|
||||||
|
s.hotel_id,
|
||||||
|
(SELECT hc.hotel_chain_key FROM DIM_HOTEL_CHAIN hc WHERE hc.code = s.chain_code),
|
||||||
|
(SELECT dc.country_key FROM DIM_COUNTRY dc WHERE dc.code = s.country_code),
|
||||||
|
(SELECT ds.star_rating_key FROM DIM_STAR_RATING ds WHERE ds.code = s.star_code),
|
||||||
|
s.code, s.name, s.city,
|
||||||
|
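-- the first version of a hotel is backdated so historical check-ins can match it
CASE WHEN EXISTS (SELECT 1 FROM DIM_HOTEL p WHERE p.source_hotel_id = s.hotel_id)
THEN TRUNC(SYSDATE) ELSE DATE '1900-01-01' END,
NULL, 1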
|
||||||
|
FROM STG_HOTEL s
|
||||||
|
WHERE NOT EXISTS (
|
||||||
|
SELECT 1 FROM DIM_HOTEL d
|
||||||
|
WHERE d.source_hotel_id = s.hotel_id
|
||||||
|
AND d.is_current = 1
|
||||||
|
)
|
||||||
|
""").executeUpdate()
|
||||||
|
|
||||||
|
conn.commit()
|
||||||
|
} catch (Exception e) {
|
||||||
|
conn.rollback()
|
||||||
|
throw e
|
||||||
|
} finally {
|
||||||
|
conn.close()
|
||||||
|
}
|
||||||
|
def ff = session.create()
|
||||||
|
session.transfer(ff, REL_SUCCESS)
|
||||||
|
```
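
To see the two statements in action, one option is to change a tracked attribute in the source and re-run PG-3. The sketch below is illustrative only: hotel ID 1 and star code 4 are arbitrary example values, and the table and column names are those used in the queries above.

```sql
-- MySQL (source): simulate an upgrade of hotel 1 to the rating whose code is 4
UPDATE hotel
SET star_rating_id = (SELECT star_rating_id FROM star_rating WHERE code = 4)
WHERE hotel_id = 1;

-- Oracle (target), after re-running PG-3: list every version of that hotel
SELECT hotel_key, star_rating_key, effective_date, expiry_date, is_current
FROM DIM_HOTEL
WHERE source_hotel_id = 1
ORDER BY hotel_key;
```

If the hotel's rating actually changed, the Oracle query should show the earlier version expired (`is_current = 0`, `expiry_date` set) alongside a new current row.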
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## PG-4: DIM_GUEST — SCD Type 1
|
||||||
|
|
||||||
|
Guest personal data (city, country) can change, but tracking that history has no analytical value here, so a plain MERGE (upsert) is sufficient.
|
||||||
|
|
||||||
|
```
|
||||||
|
ExecuteSQL(MySQL) → ConvertAvroToJSON → SplitJson → EvaluateJsonPath → PutSQL(MERGE)
|
||||||
|
```
|
||||||
|
|
||||||
|
**ExecuteSQL** — `MySQL_DBCPService`
|
||||||
|
```sql
|
||||||
|
SELECT g.guest_id, c.code AS country_code, g.name, g.city
|
||||||
|
FROM guest g
|
||||||
|
LEFT JOIN country c ON c.country_id = g.country_id
|
||||||
|
ORDER BY g.guest_id
|
||||||
|
```
|
||||||
|
|
||||||
|
**EvaluateJsonPath** attributes: `guest_id`, `country_code`, `name`, `city`
|
||||||
|
|
||||||
|
**PutSQL**
|
||||||
|
```sql
|
||||||
|
MERGE INTO DIM_GUEST tgt
|
||||||
|
USING (SELECT 1 FROM DUAL) src ON (tgt.guest_id = ${guest_id})
|
||||||
|
WHEN MATCHED THEN
|
||||||
|
UPDATE SET
|
||||||
|
tgt.country_key = (SELECT country_key FROM DIM_COUNTRY WHERE code = NULLIF('${country_code}','')),
|
||||||
|
tgt.name = '${name}',
|
||||||
|
tgt.city = NULLIF('${city}','')
|
||||||
|
WHEN NOT MATCHED THEN
|
||||||
|
INSERT (guest_id, country_key, name, city)
|
||||||
|
VALUES (
|
||||||
|
${guest_id},
|
||||||
|
(SELECT country_key FROM DIM_COUNTRY WHERE code = NULLIF('${country_code}','')),
|
||||||
|
'${name}',
|
||||||
|
NULLIF('${city}','')
|
||||||
|
)
|
||||||
|
```
|
||||||
|
|
||||||
|
> **Note:** DIM_ROOM is also SCD Type 1 — load it the same way as DIM_GUEST, joining `hotel_room` with `room_type` in MySQL and MERGEing into `DIM_ROOM` (surrogate key lookup via `source_hotel_id + IS_CURRENT=1` from DIM_HOTEL).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## PG-5: FACT_ROOM_BOOKING — Incremental Load (Watermark)
|
||||||
|
|
||||||
|
The fact table is loaded **incrementally**: only `room_booking` rows with `room_booking_id` greater than the last loaded value are processed. The watermark is stored in `ETL_WATERMARK` in Oracle.
|
||||||
|
|
||||||
|
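`source_rb_id` on `FACT_ROOM_BOOKING` has a UNIQUE constraint, and the insert carries a `NOT EXISTS` guard, so re-running is safe: rows that are already loaded are skipped by the guard, and any duplicate that slips past it is rejected by the constraint rather than stored twice.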
|
||||||
|
|
||||||
|
```
|
||||||
|
[Read Watermark] → [ExecuteSQL MySQL] → [ConvertAvroToJSON] → [SplitJson]
|
||||||
|
→ [EvaluateJsonPath] → [PutSQL FACT] → [Update Watermark]
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step A: Read watermark
|
||||||
|
|
||||||
|
**ExecuteSQL** — `Oracle_DBCPService`
|
||||||
|
```sql
|
||||||
|
SELECT last_key FROM ETL_WATERMARK WHERE entity_name = 'FACT_ROOM_BOOKING'
|
||||||
|
```
|
||||||
|
|
||||||
|
**ConvertAvroToJSON** → **EvaluateJsonPath**
|
||||||
|
- `watermark` ← `$.last_key`
|
||||||
|
|
||||||
|
### Step B: Load from MySQL
|
||||||
|
|
||||||
|
**ExecuteSQL** — `MySQL_DBCPService`
|
||||||
|
SQL Statement (use attribute `${watermark}`):
|
||||||
|
```sql
|
||||||
|
SELECT
|
||||||
|
rb.room_booking_id,
|
||||||
|
rb.room_id,
|
||||||
|
rb.date_from,
|
||||||
|
rb.date_to,
|
||||||
|
rb.nightly_rate,
|
||||||
|
rb.total_amount,
|
||||||
|
b.guest_id,
|
||||||
|
b.status AS booking_status,
|
||||||
|
DATEDIFF(rb.date_to, rb.date_from) AS nights_stayed
|
||||||
|
FROM room_booking rb
|
||||||
|
JOIN booking b ON b.booking_id = rb.booking_id
|
||||||
|
WHERE rb.room_booking_id > ${watermark}
|
||||||
|
ORDER BY rb.room_booking_id
|
||||||
|
LIMIT 50000
|
||||||
|
```
|
||||||
|
|
||||||
|
> Set LIMIT to control batch size. Run PG-5 in a loop (using a Timer-driven GenerateFlowFile) until no rows come back.
|
||||||
|
|
||||||
|
### Step C: Split + extract attributes
|
||||||
|
|
||||||
|
**ConvertAvroToJSON** → **SplitJson** (`$.*`)
|
||||||
|
|
||||||
|
**EvaluateJsonPath** attributes:
|
||||||
|
`room_booking_id`, `room_id`, `guest_id`, `date_from`, `date_to`, `nightly_rate`, `total_amount`, `booking_status`, `nights_stayed`
|
||||||
|
|
||||||
|
### Step D: Insert into fact table
|
||||||
|
|
||||||
|
**PutSQL** — `Oracle_DBCPService`
|
||||||
|
```sql
|
||||||
|
INSERT INTO FACT_ROOM_BOOKING (
|
||||||
|
source_rb_id, hotel_key, hotel_chain_key, room_key, guest_key, country_key,
|
||||||
|
star_rating_key, checkin_date_key, checkout_date_key,
|
||||||
|
booking_status, nights_stayed, nightly_rate, total_amount)
|
||||||
|
SELECT
|
||||||
|
${room_booking_id},
|
||||||
|
dh.hotel_key,
|
||||||
|
dh.hotel_chain_key,
|
||||||
|
dr.room_key,
|
||||||
|
dg.guest_key,
|
||||||
|
dg.country_key,
|
||||||
|
dh.star_rating_key,
|
||||||
|
TO_NUMBER(TO_CHAR(TO_DATE('${date_from}','YYYY-MM-DD'), 'YYYYMMDD')),
|
||||||
|
TO_NUMBER(TO_CHAR(TO_DATE('${date_to}', 'YYYY-MM-DD'), 'YYYYMMDD')),
|
||||||
|
'${booking_status}',
|
||||||
|
${nights_stayed},
|
||||||
|
TO_NUMBER('${nightly_rate}', '9999990D99', 'NLS_NUMERIC_CHARACTERS=''.,'''),
|
||||||
|
TO_NUMBER('${total_amount}', '9999990D99', 'NLS_NUMERIC_CHARACTERS=''.,''')
|
||||||
|
FROM
|
||||||
|
DIM_ROOM dr,
|
||||||
|
DIM_GUEST dg,
|
||||||
|
DIM_HOTEL dh
|
||||||
|
WHERE
|
||||||
|
dr.room_id = ${room_id}
|
||||||
|
AND dg.guest_id = ${guest_id}
|
||||||
|
AND dh.hotel_key = dr.hotel_key
|
||||||
|
-- SCD2 lookup: find hotel version active at check-in date
|
||||||
|
AND dh.effective_date <= TO_DATE('${date_from}','YYYY-MM-DD')
|
||||||
|
AND (dh.expiry_date IS NULL OR dh.expiry_date >= TO_DATE('${date_from}','YYYY-MM-DD'))
|
||||||
|
-- Idempotent: skip if already loaded
|
||||||
|
AND NOT EXISTS (
|
||||||
|
SELECT 1 FROM FACT_ROOM_BOOKING f WHERE f.source_rb_id = ${room_booking_id}
|
||||||
|
)
|
||||||
|
```
|
||||||
|
|
||||||
|
> The `DH.EFFECTIVE_DATE / EXPIRY_DATE` condition is the payoff of SCD Type 2: the fact row always references the hotel dimension version that was true **when the guest actually checked in**, not what the hotel looks like today.
|
||||||
|
|
||||||
|
**Ignore Errors** on PutSQL (route `failure` → funnel) — UNIQUE constraint violations on `source_rb_id` are expected and harmless on re-runs.
|
||||||
|
|
||||||
|
### Step E: Update watermark
|
||||||
|
|
||||||
|
After the PutSQL succeeds, update the watermark with the highest `room_booking_id` seen in this batch.
|
||||||
|
|
||||||
|
**UpdateAttribute**
|
||||||
|
- `max_rb_id` ← `${room_booking_id}` (this is only the current FlowFile's ID; NiFi Expression Language evaluates one FlowFile at a time, so a true batch maximum would need a custom processor or a MergeContent step)
|
||||||
|
|
||||||
|
> Simplest approach: add a final **PutSQL** that runs after the batch:
|
||||||
|
```sql
|
||||||
|
UPDATE ETL_WATERMARK
|
||||||
|
SET last_key = NVL((SELECT MAX(source_rb_id) FROM FACT_ROOM_BOOKING), last_key),
|
||||||
|
last_run_ts = SYSTIMESTAMP
|
||||||
|
WHERE entity_name = 'FACT_ROOM_BOOKING'
|
||||||
|
```
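
A quick way to check that the watermark and the fact table agree, and to see how much is left to load, is sketched below. The `0` in the MySQL query is a placeholder to be replaced with the `last_key` value returned by the Oracle query.

```sql
-- Oracle: watermark versus the highest key actually present in the fact table
SELECT w.last_key,
       w.last_run_ts,
       (SELECT MAX(f.source_rb_id) FROM FACT_ROOM_BOOKING f) AS max_loaded_rb_id
FROM ETL_WATERMARK w
WHERE w.entity_name = 'FACT_ROOM_BOOKING';

-- MySQL: rows still waiting to be loaded
SELECT COUNT(*) AS pending_rows
FROM room_booking
WHERE room_booking_id > 0;  -- 0 is a placeholder for last_key
```

When `pending_rows` reaches zero, the batch loop for PG-5 has caught up with the source.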
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Execution Order & Scheduling
|
||||||
|
|
||||||
|
| PG | Trigger | Frequency |
|
||||||
|
|----|---------|-----------|
|
||||||
|
| PG-1 (Date Dim) | Manual (run once) | — |
|
||||||
|
| PG-2 (Static Dims) | Timer — 24h | Daily |
|
||||||
|
| PG-3 (DIM_HOTEL SCD2) | Timer — 24h | Daily, after PG-2 |
|
||||||
|
| PG-4 (DIM_GUEST SCD1) | Timer — 24h | Daily, after PG-3 |
|
||||||
|
| PG-5 (Fact incremental) | Timer — 1h | Hourly |
|
||||||
|
|
||||||
|
Chain PG-2 → PG-3 → PG-4 by routing each PG's final **success** relationship to its Output Port and connecting that port to the next PG's Input Port.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Why SCD Type 2 for DIM_HOTEL?
|
||||||
|
|
||||||
|
A hotel being upgraded from 3★ to 4★ changes its rate tier going forward. If we just overwrite the dimension (SCD1), all historical bookings would suddenly appear to have been made in a 4★ hotel — inflating average revenue per star category in reports. SCD2 preserves the correct picture: every fact row points to the exact hotel version that was true at check-in.
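
The effect can be illustrated with a report on average revenue per star category as of check-in. This sketch assumes the fact table stores `star_rating_key` and `total_amount` as in the PG-5 insert, and that `DIM_STAR_RATING` exposes `star_rating_key` and `code`.

```sql
-- Oracle: average room-booking revenue by the star category that applied at check-in
SELECT sr.code                       AS star_code,
       COUNT(*)                      AS room_bookings,
       ROUND(AVG(f.total_amount), 2) AS avg_revenue
FROM FACT_ROOM_BOOKING f
JOIN DIM_STAR_RATING sr ON sr.star_rating_key = f.star_rating_key
GROUP BY sr.code
ORDER BY sr.code;
```

Under SCD1 the same query would attribute all of an upgraded hotel's past bookings to its new category.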
|
||||||
@@ -1,6 +1,7 @@
|
|||||||
#:package MySqlConnector@2.3.7
|
#:package MySqlConnector@2.3.7
|
||||||
|
|
||||||
using System.Text;
|
using System.Text;
|
||||||
|
using System.Globalization;
|
||||||
using MySqlConnector;
|
using MySqlConnector;
|
||||||
|
|
||||||
// ── Config ────────────────────────────────────────────────────────────────────
|
// ── Config ────────────────────────────────────────────────────────────────────
|
||||||
@@ -19,6 +20,7 @@ var rng = new Random(SEED);
|
|||||||
await using var conn = new MySqlConnection(DSN);
|
await using var conn = new MySqlConnection(DSN);
|
||||||
await conn.OpenAsync();
|
await conn.OpenAsync();
|
||||||
Console.WriteLine("Connected.");
|
Console.WriteLine("Connected.");
|
||||||
|
await new MySqlCommand("SET foreign_key_checks=0, unique_checks=0", conn).ExecuteNonQueryAsync();
|
||||||
|
|
||||||
async Task Exec(string sql)
|
async Task Exec(string sql)
|
||||||
{
|
{
|
||||||
@@ -44,7 +46,7 @@ async Task BulkInsert(string table, string columns, List<string> valueTuples)
|
|||||||
}
|
}
|
||||||
|
|
||||||
string S(string? s) => s == null ? "NULL" : $"'{s.Replace("'", "''")}'";
|
string S(string? s) => s == null ? "NULL" : $"'{s.Replace("'", "''")}'";
|
||||||
string N(object? n) => n == null ? "NULL" : n.ToString()!;
|
string N(object? n) => n == null ? "NULL" : (n is IFormattable f) ? f.ToString(null, CultureInfo.InvariantCulture) : n.ToString()!;
|
||||||
string D(DateTime d) => $"'{d:yyyy-MM-dd}'";
|
string D(DateTime d) => $"'{d:yyyy-MM-dd}'";
|
||||||
string DT(DateTime d) => $"'{d:yyyy-MM-dd HH:mm:ss}'";
|
string DT(DateTime d) => $"'{d:yyyy-MM-dd HH:mm:ss}'";
|
||||||
|
|
||||||
@@ -159,7 +161,7 @@ var roomTypes = new (string Code, string Desc, decimal BaseRate, bool Smoking)[]
|
|||||||
};
|
};
|
||||||
|
|
||||||
await BulkInsert("room_type", "code, description, standard_rate, smoking_yn",
|
await BulkInsert("room_type", "code, description, standard_rate, smoking_yn",
|
||||||
roomTypes.Select(rt => $"({S(rt.Code)},{S(rt.Desc)},{rt.BaseRate},0)").ToList());
|
roomTypes.Select(rt => $"({S(rt.Code)},{S(rt.Desc)},{N(rt.BaseRate)},0)").ToList());
|
||||||
|
|
||||||
var roomTypeIds = new Dictionary<string, int>();
|
var roomTypeIds = new Dictionary<string, int>();
|
||||||
{
|
{
|
||||||
@@ -193,7 +195,7 @@ foreach (var rt in roomTypes)
|
|||||||
foreach (var rp in ratePeriods)
|
foreach (var rp in ratePeriods)
|
||||||
{
|
{
|
||||||
var rate = Math.Round(rt.BaseRate * rp.Multiplier, 2);
|
var rate = Math.Round(rt.BaseRate * rp.Multiplier, 2);
|
||||||
prrRows.Add($"({roomTypeIds[rt.Code]},{ratePeriodIds[rp.Code]},{rate})");
|
prrRows.Add($"({roomTypeIds[rt.Code]},{ratePeriodIds[rp.Code]},{N(rate)})");
|
||||||
}
|
}
|
||||||
await BulkInsert("period_room_rate", "room_type_id, rate_period_id, rate", prrRows);
|
await BulkInsert("period_room_rate", "room_type_id, rate_period_id, rate", prrRows);
|
||||||
|
|
||||||
@@ -480,9 +482,8 @@ while (bookingsDone < BOOKING_COUNT)
|
|||||||
bookingRows.Add($"({guestId},{hotelId},{D(checkin)},{D(checkout)},{S(status)},{DT(created)})");
|
bookingRows.Add($"({guestId},{hotelId},{D(checkin)},{D(checkout)},{S(status)},{DT(created)})");
|
||||||
}
|
}
|
||||||
|
|
||||||
// Insert bookings and get the first inserted ID
|
|
||||||
long firstId = await ExecScalar("SELECT AUTO_INCREMENT FROM information_schema.tables WHERE table_schema='hotel_reservations' AND table_name='booking'");
|
|
||||||
await Exec($"INSERT INTO booking (guest_id, hotel_id, date_from, date_to, status, created_at) VALUES {string.Join(',', bookingRows)}");
|
await Exec($"INSERT INTO booking (guest_id, hotel_id, date_from, date_to, status, created_at) VALUES {string.Join(',', bookingRows)}");
|
||||||
|
long firstId = await ExecScalar("SELECT LAST_INSERT_ID()");
|
||||||
|
|
||||||
// Re-derive checkin/nights from the same rng sequence is impossible after the fact,
|
// Re-derive checkin/nights from the same rng sequence is impossible after the fact,
|
||||||
// so re-parse from inserted rows to build room_bookings
|
// so re-parse from inserted rows to build room_bookings
|
||||||
@@ -514,7 +515,7 @@ while (bookingsDone < BOOKING_COUNT)
|
|||||||
int ratePeriodId = monthToRatePeriodId[dfrom.Month];
|
int ratePeriodId = monthToRatePeriodId[dfrom.Month];
|
||||||
decimal nightly = rateMap[(roomTypeId, ratePeriodId)];
|
decimal nightly = rateMap[(roomTypeId, ratePeriodId)];
|
||||||
decimal total = Math.Round(nightly * nights, 2);
|
decimal total = Math.Round(nightly * nights, 2);
|
||||||
roomBookingRows.Add($"({bookingId},{roomId},{D(dfrom)},{D(dto)},{nightly},{total})");
|
roomBookingRows.Add($"({bookingId},{roomId},{D(dfrom)},{D(dto)},{N(nightly)},{N(total)})");
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -1,133 +1,187 @@
|
|||||||
-- =============================================================================
|
create table ETL_WATERMARK
|
||||||
-- HOTEL RESERVATIONS — DATA MART (STAR SCHEMA)
|
(
|
||||||
-- Target: Oracle (university lab schema)
|
ENTITY_NAME VARCHAR2(50) not null
|
||||||
-- Based on A.24 Revenue Data Mart — Dimensional Modelling by Example
|
constraint PK_ETL_WATERMARK
|
||||||
-- =============================================================================
|
primary key,
|
||||||
|
LAST_KEY NUMBER(20) default 0 not null,
|
||||||
|
LAST_RUN_TS TIMESTAMP(6) default SYSTIMESTAMP
|
||||||
|
)
|
||||||
|
/
|
||||||
|
|
||||||
-- -----------------------------------------------------------------------------
|
create table STG_HOTEL
|
||||||
-- DIMENSION TABLES
|
(
|
||||||
-- -----------------------------------------------------------------------------
|
HOTEL_ID NUMBER(10) not null,
|
||||||
|
HOTEL_CODE VARCHAR2(20) not null,
|
||||||
|
HOTEL_NAME VARCHAR2(150) not null,
|
||||||
|
CITY VARCHAR2(100) not null,
|
||||||
|
COUNTRY_CODE CHAR(2) not null,
|
||||||
|
COUNTRY_NAME VARCHAR2(100) not null,
|
||||||
|
CURRENCY VARCHAR2(10) not null,
|
||||||
|
CHAIN_CODE VARCHAR2(10),
|
||||||
|
CHAIN_NAME VARCHAR2(100),
|
||||||
|
STAR_RATING NUMBER(1) not null,
|
||||||
|
STAR_DESCRIPTION VARCHAR2(20)
|
||||||
|
)
|
||||||
|
/
|
||||||
|
|
||||||
-- YYYYMMDD integer key — cheap date range predicates, no JOIN to calendar needed
|
create table DIM_DATE
|
||||||
CREATE TABLE DIM_DATE (
|
(
|
||||||
date_key NUMBER(8,0) NOT NULL,
|
DATE_KEY NUMBER(8) not null
|
||||||
full_date DATE NOT NULL,
|
constraint PK_DIM_DATE
|
||||||
year NUMBER(4,0) NOT NULL,
|
primary key,
|
||||||
quarter NUMBER(1,0) NOT NULL,
|
FULL_DATE DATE not null,
|
||||||
month NUMBER(2,0) NOT NULL,
|
YEAR NUMBER(4) not null,
|
||||||
month_name VARCHAR2(10) NOT NULL,
|
QUARTER NUMBER(1) not null,
|
||||||
week_number NUMBER(2,0) NOT NULL,
|
MONTH NUMBER(2) not null,
|
||||||
day_of_month NUMBER(2,0) NOT NULL,
|
MONTH_NAME VARCHAR2(10) not null,
|
||||||
day_name VARCHAR2(10) NOT NULL,
|
WEEK_NUMBER NUMBER(2) not null,
|
||||||
is_weekend NUMBER(1,0) NOT NULL,
|
DAY_OF_MONTH NUMBER(2) not null,
|
||||||
is_business_day NUMBER(1,0) NOT NULL,
|
DAY_NAME VARCHAR2(10) not null,
|
||||||
season VARCHAR2(10) NOT NULL, -- Peak / High / Low / Off
|
IS_WEEKEND NUMBER(1) not null
|
||||||
CONSTRAINT pk_dim_date PRIMARY KEY (date_key),
|
constraint CK_DIM_DATE_WEEKEND
|
||||||
CONSTRAINT ck_dim_date_wknd CHECK (is_weekend IN (0,1)),
|
check (is_weekend IN (0, 1)),
|
||||||
CONSTRAINT ck_dim_date_bday CHECK (is_business_day IN (0,1))
|
IS_BUSINESS_DAY NUMBER(1) not null
|
||||||
);
|
constraint CK_DIM_DATE_BUSINESS
|
||||||
|
check (is_business_day IN (0, 1)),
|
||||||
|
SEASON VARCHAR2(10) not null
|
||||||
|
)
|
||||||
|
/
|
||||||
|
|
||||||
CREATE TABLE DIM_COUNTRY (
|
create table DIM_HOTEL
|
||||||
country_key NUMBER(10,0) GENERATED ALWAYS AS IDENTITY,
|
(
|
||||||
country_id NUMBER(10,0) NOT NULL,
|
HOTEL_KEY NUMBER(10) generated as identity
|
||||||
code CHAR(2) NOT NULL,
|
constraint PK_DIM_HOTEL
|
||||||
name VARCHAR2(100) NOT NULL,
|
primary key,
|
||||||
currency VARCHAR2(10) NOT NULL,
|
SOURCE_HOTEL_ID NUMBER(10) not null,
|
||||||
CONSTRAINT pk_dim_country PRIMARY KEY (country_key),
|
HOTEL_CODE VARCHAR2(20) not null,
|
||||||
CONSTRAINT uq_dim_cntry_id UNIQUE (country_id)
|
HOTEL_NAME VARCHAR2(150) not null,
|
||||||
);
|
CITY VARCHAR2(100) not null,
|
||||||
|
COUNTRY_CODE CHAR(2) not null,
|
||||||
|
COUNTRY_NAME VARCHAR2(100) not null,
|
||||||
|
CURRENCY VARCHAR2(10) not null,
|
||||||
|
CHAIN_CODE VARCHAR2(10),
|
||||||
|
CHAIN_NAME VARCHAR2(100),
|
||||||
|
STAR_RATING NUMBER(1) not null,
|
||||||
|
STAR_DESCRIPTION VARCHAR2(20),
|
||||||
|
EFFECTIVE_DATE DATE not null,
|
||||||
|
EXPIRY_DATE DATE,
|
||||||
|
IS_CURRENT NUMBER(1) default 1 not null
|
||||||
|
constraint CK_DIM_HOTEL_CURRENT
|
||||||
|
check (is_current IN (0, 1))
|
||||||
|
)
|
||||||
|
/
|
||||||
|
|
||||||
CREATE TABLE DIM_STAR_RATING (
|
create table DIM_ROOM
|
||||||
star_rating_key NUMBER(10,0) GENERATED ALWAYS AS IDENTITY,
|
(
|
||||||
star_rating_id NUMBER(10,0) NOT NULL,
|
ROOM_KEY NUMBER(10) generated as identity
|
||||||
code NUMBER(1,0) NOT NULL,
|
constraint PK_DIM_ROOM
|
||||||
description VARCHAR2(20) NOT NULL,
|
primary key,
|
||||||
CONSTRAINT pk_dim_star PRIMARY KEY (star_rating_key),
|
SOURCE_ROOM_ID NUMBER(10) not null
|
||||||
CONSTRAINT uq_dim_star_id UNIQUE (star_rating_id)
|
constraint UQ_DIM_ROOM
|
||||||
);
|
unique,
|
||||||
|
HOTEL_KEY NUMBER(10) not null
|
||||||
|
constraint FK_DIM_ROOM_HOTEL
|
||||||
|
references DIM_HOTEL,
|
||||||
|
ROOM_NUMBER VARCHAR2(10) not null,
|
||||||
|
FLOOR NUMBER(3) not null,
|
||||||
|
ROOM_TYPE_CODE VARCHAR2(20) not null,
|
||||||
|
ROOM_TYPE_DESCRIPTION VARCHAR2(100) not null,
|
||||||
|
SMOKING_YN NUMBER(1) not null
|
||||||
|
constraint CK_DIM_ROOM_SMOKING
|
||||||
|
check (smoking_yn IN (0, 1)),
|
||||||
|
STANDARD_RATE NUMBER(10, 2) not null
|
||||||
|
)
|
||||||
|
/
|
||||||
|
|
||||||
CREATE TABLE DIM_HOTEL_CHAIN (
|
create table DIM_GUEST
|
||||||
hotel_chain_key NUMBER(10,0) GENERATED ALWAYS AS IDENTITY,
|
(
|
||||||
hotel_chain_id NUMBER(10,0) NOT NULL,
|
GUEST_KEY NUMBER(10) generated as identity
|
||||||
code VARCHAR2(10) NOT NULL,
|
constraint PK_DIM_GUEST
|
||||||
name VARCHAR2(100) NOT NULL,
|
primary key,
|
||||||
CONSTRAINT pk_dim_chain PRIMARY KEY (hotel_chain_key),
|
SOURCE_GUEST_ID NUMBER(10) not null
|
||||||
CONSTRAINT uq_dim_chain_id UNIQUE (hotel_chain_id)
|
constraint UQ_DIM_GUEST
|
||||||
);
|
unique,
|
||||||
|
GUEST_NAME VARCHAR2(150) not null,
|
||||||
|
CITY VARCHAR2(100),
|
||||||
|
COUNTRY_CODE CHAR(2),
|
||||||
|
COUNTRY_NAME VARCHAR2(100)
|
||||||
|
)
|
||||||
|
/
|
||||||
|
|
||||||
CREATE TABLE DIM_HOTEL (
|
create table FACT_ROOM_BOOKING
|
||||||
hotel_key NUMBER(10,0) GENERATED ALWAYS AS IDENTITY,
|
(
|
||||||
hotel_id NUMBER(10,0) NOT NULL,
|
FACT_ID NUMBER(10) generated as identity
|
||||||
hotel_chain_key NUMBER(10,0),
|
constraint PK_FACT_ROOM_BOOKING
|
||||||
country_key NUMBER(10,0) NOT NULL,
|
primary key,
|
||||||
star_rating_key NUMBER(10,0) NOT NULL,
|
SOURCE_RB_ID NUMBER(10) not null
|
||||||
code VARCHAR2(20) NOT NULL,
|
constraint UQ_FACT_ROOM_BOOKING_SRC
|
||||||
name VARCHAR2(150) NOT NULL,
|
unique,
|
||||||
city VARCHAR2(100) NOT NULL,
|
HOTEL_KEY NUMBER(10) not null
|
||||||
CONSTRAINT pk_dim_hotel PRIMARY KEY (hotel_key),
|
constraint FK_FACT_HOTEL
|
||||||
CONSTRAINT uq_dim_hotel_id UNIQUE (hotel_id),
|
references DIM_HOTEL,
|
||||||
CONSTRAINT fk_dh_chain FOREIGN KEY (hotel_chain_key) REFERENCES DIM_HOTEL_CHAIN (hotel_chain_key),
|
ROOM_KEY NUMBER(10) not null
|
||||||
CONSTRAINT fk_dh_country FOREIGN KEY (country_key) REFERENCES DIM_COUNTRY (country_key),
|
constraint FK_FACT_ROOM
|
||||||
CONSTRAINT fk_dh_star FOREIGN KEY (star_rating_key) REFERENCES DIM_STAR_RATING (star_rating_key)
|
references DIM_ROOM,
|
||||||
);
|
GUEST_KEY NUMBER(10) not null
|
||||||
|
constraint FK_FACT_GUEST
|
||||||
|
references DIM_GUEST,
|
||||||
|
BOOKING_CREATED_DATE_KEY NUMBER(8) not null
|
||||||
|
constraint FK_FACT_BOOKING_DATE
|
||||||
|
references DIM_DATE,
|
||||||
|
CHECKIN_DATE_KEY NUMBER(8) not null
|
||||||
|
constraint FK_FACT_CHECKIN_DATE
|
||||||
|
references DIM_DATE,
|
||||||
|
CHECKOUT_DATE_KEY NUMBER(8) not null
|
||||||
|
constraint FK_FACT_CHECKOUT_DATE
|
||||||
|
references DIM_DATE,
|
||||||
|
BOOKING_STATUS VARCHAR2(20) not null,
|
||||||
|
BOOKING_COUNT NUMBER(1) default 1 not null
|
||||||
|
constraint CK_FACT_BOOKING_COUNT
|
||||||
|
check (booking_count = 1),
|
||||||
|
NIGHTS_STAYED NUMBER(4) not null,
|
||||||
|
NIGHTLY_RATE NUMBER(10, 2) not null,
|
||||||
|
TOTAL_AMOUNT NUMBER(12, 2) not null
|
||||||
|
)
|
||||||
|
/
|
||||||
|
|
||||||
CREATE TABLE DIM_ROOM (
|
create table STG_GUEST
|
||||||
room_key NUMBER(10,0) GENERATED ALWAYS AS IDENTITY,
|
(
|
||||||
room_id NUMBER(10,0) NOT NULL,
|
SOURCE_GUEST_ID NUMBER(10) not null,
|
||||||
hotel_key NUMBER(10,0) NOT NULL,
|
GUEST_NAME VARCHAR2(150) not null,
|
||||||
room_number VARCHAR2(10) NOT NULL,
|
CITY VARCHAR2(100),
|
||||||
floor NUMBER(3,0) NOT NULL,
|
COUNTRY_CODE CHAR(2),
|
||||||
room_type_code VARCHAR2(20) NOT NULL,
|
COUNTRY_NAME VARCHAR2(100)
|
||||||
room_type_desc VARCHAR2(100) NOT NULL,
|
)
|
||||||
smoking_yn NUMBER(1,0) NOT NULL,
|
/
|
||||||
standard_rate NUMBER(10,2) NOT NULL,
|
|
||||||
CONSTRAINT pk_dim_room PRIMARY KEY (room_key),
|
|
||||||
CONSTRAINT uq_dim_room_id UNIQUE (room_id),
|
|
||||||
CONSTRAINT fk_dr_hotel FOREIGN KEY (hotel_key) REFERENCES DIM_HOTEL (hotel_key),
|
|
||||||
CONSTRAINT ck_dim_room_smk CHECK (smoking_yn IN (0,1))
|
|
||||||
);
|
|
||||||
|
|
||||||
CREATE TABLE DIM_GUEST (
|
create table STG_ROOM
|
||||||
guest_key NUMBER(10,0) GENERATED ALWAYS AS IDENTITY,
|
(
|
||||||
guest_id NUMBER(10,0) NOT NULL,
|
SOURCE_ROOM_ID NUMBER(10) not null,
|
||||||
country_key NUMBER(10,0),
|
HOTEL_CODE VARCHAR2(20) not null,
|
||||||
name VARCHAR2(150) NOT NULL,
|
ROOM_NUMBER VARCHAR2(10) not null,
|
||||||
city VARCHAR2(100),
|
FLOOR NUMBER(3) not null,
|
||||||
CONSTRAINT pk_dim_guest PRIMARY KEY (guest_key),
|
ROOM_TYPE_CODE VARCHAR2(20) not null,
|
||||||
CONSTRAINT uq_dim_guest_id UNIQUE (guest_id),
|
ROOM_TYPE_DESCRIPTION VARCHAR2(100) not null,
|
||||||
CONSTRAINT fk_dg_country FOREIGN KEY (country_key) REFERENCES DIM_COUNTRY (country_key)
|
SMOKING_YN NUMBER(1) not null,
|
||||||
);
|
STANDARD_RATE NUMBER(10, 2) not null,
|
||||||
|
HOTEL_ID NUMBER(10)
|
||||||
|
)
|
||||||
|
/
|
||||||
|
|
||||||
-- -----------------------------------------------------------------------------
|
create table STG_ROOM_BOOKING
|
||||||
-- FACT TABLE
|
(
|
||||||
-- -----------------------------------------------------------------------------
|
SOURCE_RB_ID NUMBER(10) not null,
|
||||||
|
GUEST_ID NUMBER(10) not null,
|
||||||
|
BOOKING_CREATED_DATE DATE not null,
|
||||||
|
CHECKIN_DATE DATE not null,
|
||||||
|
CHECKOUT_DATE DATE not null,
|
||||||
|
BOOKING_STATUS VARCHAR2(20) not null,
|
||||||
|
BOOKING_COUNT NUMBER(1) default 1 not null,
|
||||||
|
NIGHTS_STAYED NUMBER(4) not null,
|
||||||
|
NIGHTLY_RATE NUMBER(10, 2) not null,
|
||||||
|
TOTAL_AMOUNT NUMBER(12, 2) not null,
|
||||||
|
HOTEL_ID NUMBER(10) not null,
|
||||||
|
ROOM_ID NUMBER(10) not null
|
||||||
|
)
|
||||||
|
/
|
||||||
|
|
||||||
-- Grain: one row per room_booking
|
|
||||||
-- Revenue measures: nightly_rate, total_amount, nights_stayed
|
|
||||||
CREATE TABLE FACT_ROOM_BOOKING (
|
|
||||||
fact_id NUMBER(10,0) GENERATED ALWAYS AS IDENTITY,
|
|
||||||
-- foreign keys
|
|
||||||
hotel_key NUMBER(10,0) NOT NULL,
|
|
||||||
hotel_chain_key NUMBER(10,0),
|
|
||||||
room_key NUMBER(10,0) NOT NULL,
|
|
||||||
guest_key NUMBER(10,0) NOT NULL,
|
|
||||||
country_key NUMBER(10,0),
|
|
||||||
star_rating_key NUMBER(10,0) NOT NULL,
|
|
||||||
checkin_date_key NUMBER(8,0) NOT NULL,
|
|
||||||
checkout_date_key NUMBER(8,0) NOT NULL,
|
|
||||||
-- degenerate dimensions
|
|
||||||
booking_status VARCHAR2(20) NOT NULL,
|
|
||||||
-- measures
|
|
||||||
nights_stayed NUMBER(4,0) NOT NULL,
|
|
||||||
nightly_rate NUMBER(10,2) NOT NULL,
|
|
||||||
total_amount NUMBER(12,2) NOT NULL,
|
|
||||||
CONSTRAINT pk_fact_rb PRIMARY KEY (fact_id),
|
|
||||||
CONSTRAINT fk_frb_hotel FOREIGN KEY (hotel_key) REFERENCES DIM_HOTEL (hotel_key),
|
|
||||||
CONSTRAINT fk_frb_chain FOREIGN KEY (hotel_chain_key) REFERENCES DIM_HOTEL_CHAIN (hotel_chain_key),
|
|
||||||
CONSTRAINT fk_frb_room FOREIGN KEY (room_key) REFERENCES DIM_ROOM (room_key),
|
|
||||||
CONSTRAINT fk_frb_guest FOREIGN KEY (guest_key) REFERENCES DIM_GUEST (guest_key),
|
|
||||||
CONSTRAINT fk_frb_country FOREIGN KEY (country_key) REFERENCES DIM_COUNTRY (country_key),
|
|
||||||
CONSTRAINT fk_frb_star FOREIGN KEY (star_rating_key) REFERENCES DIM_STAR_RATING (star_rating_key),
|
|
||||||
CONSTRAINT fk_frb_checkin FOREIGN KEY (checkin_date_key) REFERENCES DIM_DATE (date_key),
|
|
||||||
CONSTRAINT fk_frb_checkout FOREIGN KEY (checkout_date_key) REFERENCES DIM_DATE (date_key)
|
|
||||||
);
|
|
||||||
|
|||||||