Files
IPZ_1/docs/02-oltp.md
2026-05-17 21:27:42 +02:00

259 lines
7.7 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# OLTP Database — Design & Details
## Overview
The OLTP (Online Transaction Processing) database models a **hotel reservation system** using a fully normalized relational schema in **MySQL 8.4**. It follows 3NF and enforces referential integrity via foreign keys.
- **Database:** `hotel_reservations`
- **Character set:** `utf8mb4` / `utf8mb4_unicode_ci`
- **Tables:** 13
- **Total rows:** ~635,000
---
## Entity-Relationship Model
The schema covers five entity groups:
```
hotel_chain ──┐
country ───────┼──► hotel ──► hotel_room ──► room_booking ──► booking ──► guest
star_rating ──┘ │
└──► country
hotel_characteristic ◄──► hotel (M:N via hotel_hotel_characteristic)
room_type ◄──── hotel_room
room_type ◄──┐
rate_period ◄─┴── period_room_rate (price per room type per season)
```
---
## Table Descriptions
### Reference / Lookup Tables
#### `hotel_chain`
International hotel chains (Hilton, Marriott, Accor, etc.).
| Column | Type | Description |
|--------|------|-------------|
| `hotel_chain_id` | INT UNSIGNED PK | Surrogate key |
| `code` | VARCHAR(10) UNIQUE | Short code (e.g. `HLT`) |
| `name` | VARCHAR(100) | Full name |
**Rows:** 10
---
#### `country`
Countries from which guests come and where hotels are located.
| Column | Type | Description |
|--------|------|-------------|
| `country_id` | INT UNSIGNED PK | Surrogate key |
| `code` | CHAR(2) UNIQUE | ISO 3166-1 alpha-2 (e.g. `GB`) |
| `name` | VARCHAR(100) | Country name |
| `currency` | VARCHAR(10) | ISO currency code (e.g. `EUR`) |
**Rows:** 40 (Europe, Americas, Asia, Africa, Oceania)
---
#### `star_rating`
Hotel classification from 1★ to 5★.
| Column | Type | Description |
|--------|------|-------------|
| `star_rating_id` | INT UNSIGNED PK | Surrogate key |
| `code` | TINYINT UNIQUE | 15 |
| `description` | VARCHAR(20) | e.g. `3 Star` |
**Rows:** 5
---
#### `hotel_characteristic`
Amenities and features a hotel may offer.
| Column | Type | Description |
|--------|------|-------------|
| `characteristic_id` | INT UNSIGNED PK | Surrogate key |
| `code` | VARCHAR(20) UNIQUE | e.g. `POOL`, `SPA`, `WIFI` |
| `description` | VARCHAR(100) | Human-readable label |
**Rows:** 12 (WiFi, Pool, Gym, Spa, Restaurant, Bar, Parking, Valet, Conference, Shuttle, Room Service, Pet Friendly)
---
#### `room_type`
Types of rooms a hotel can offer, with a standard (base) rate.
| Column | Type | Description |
|--------|------|-------------|
| `room_type_id` | INT UNSIGNED PK | Surrogate key |
| `code` | VARCHAR(20) UNIQUE | e.g. `SINGLE`, `SUITE` |
| `description` | VARCHAR(100) | e.g. `Junior Suite` |
| `standard_rate` | DECIMAL(10,2) | Base nightly rate (EUR) |
| `smoking_yn` | BOOLEAN | Smoking allowed flag |
**Rows:** 7 (Single €80, Double €120, Twin €115, Deluxe €180, Suite €280, Executive €450, Family €200)
---
#### `rate_period`
Seasonal pricing periods. Each period maps to a month range and applies a rate multiplier.
| Column | Type | Description |
|--------|------|-------------|
| `rate_period_id` | INT UNSIGNED PK | Surrogate key |
| `code` | VARCHAR(20) UNIQUE | e.g. `PEAK`, `WINTER` |
| `description` | VARCHAR(50) | Human-readable label |
| `month_from` | TINYINT | Start month (112) |
| `month_to` | TINYINT | End month (112) |
**Rows:** 4
| Code | Period | Months | Multiplier |
|------|--------|--------|-----------|
| PEAK | Peak Season | JunAug | ×1.5 |
| HIGH | High Season | MarMay | ×1.2 |
| AUTUMN | Autumn Season | SepNov | ×1.1 |
| WINTER | Winter Season | DecFeb | ×0.9 |
---
### Junction Tables
#### `period_room_rate`
The effective nightly rate for each (room_type, rate_period) combination.
Rate = `standard_rate × season_multiplier`.
| Column | Type | Description |
|--------|------|-------------|
| `room_type_id` | INT UNSIGNED PK/FK | |
| `rate_period_id` | INT UNSIGNED PK/FK | |
| `rate` | DECIMAL(10,2) | Effective nightly rate |
**Rows:** 28 (7 room types × 4 seasons)
---
#### `hotel_hotel_characteristic`
M:N junction between hotels and their amenities.
| Column | Type |
|--------|------|
| `hotel_id` | INT UNSIGNED PK/FK |
| `characteristic_id` | INT UNSIGNED PK/FK |
**Rows:** ~1,415
---
### Core Entity Tables
#### `hotel`
Individual hotel properties.
| Column | Type | Description |
|--------|------|-------------|
| `hotel_id` | INT UNSIGNED PK | |
| `hotel_chain_id` | INT UNSIGNED FK | NULL for independent hotels |
| `country_id` | INT UNSIGNED FK | |
| `star_rating_id` | INT UNSIGNED FK | |
| `code` | VARCHAR(20) UNIQUE | e.g. `HTL0001` |
| `name` | VARCHAR(150) | |
| `address` | VARCHAR(200) | |
| `postcode` | VARCHAR(20) | |
| `city` | VARCHAR(100) | |
| `url` | VARCHAR(200) | |
**Rows:** 200 (50 cities, star distribution: 5% 1★, 10% 2★, 40% 3★, 30% 4★, 15% 5★)
---
#### `hotel_room`
Individual rooms within each hotel.
| Column | Type | Description |
|--------|------|-------------|
| `room_id` | INT UNSIGNED PK | |
| `hotel_id` | INT UNSIGNED FK | |
| `room_type_id` | INT UNSIGNED FK | |
| `room_number` | VARCHAR(10) | Format: `{floor}{number}`, e.g. `101` |
| `floor` | TINYINT UNSIGNED | |
**Rows:** 5,334 (560 rooms per hotel depending on star rating)
---
#### `guest`
Hotel guests.
| Column | Type | Description |
|--------|------|-------------|
| `guest_id` | INT UNSIGNED PK | |
| `country_id` | INT UNSIGNED FK | Guest's home country |
| `name` | VARCHAR(150) | Full name |
| `email` | VARCHAR(150) | Unique synthetic email |
| `address` | VARCHAR(200) | |
| `city` | VARCHAR(100) | |
**Rows:** 100,000
---
#### `booking`
A reservation made by a guest at a hotel. One booking can cover multiple rooms.
| Column | Type | Description |
|--------|------|-------------|
| `booking_id` | INT UNSIGNED PK | |
| `guest_id` | INT UNSIGNED FK | |
| `hotel_id` | INT UNSIGNED FK | |
| `date_from` | DATE | Check-in |
| `date_to` | DATE | Check-out |
| `status` | ENUM | `confirmed`, `cancelled`, `completed`, `no_show` |
| `created_at` | DATETIME | When booking was made |
**Rows:** 500,000
**Status distribution:** 80% completed, 10% confirmed, 7% cancelled, 3% no_show
**Date range:** 2022-01-01 2025-12-31
**Seasonal distribution:** JuneAugust heaviest (peak), DecemberFebruary lightest
---
#### `room_booking`
A specific room assigned within a booking. Stores the rate **as it was at booking time** (snapshot), independent of any future rate changes.
| Column | Type | Description |
|--------|------|-------------|
| `room_booking_id` | INT UNSIGNED PK | |
| `booking_id` | INT UNSIGNED FK | |
| `room_id` | INT UNSIGNED FK | |
| `date_from` | DATE | |
| `date_to` | DATE | |
| `nightly_rate` | DECIMAL(10,2) | Rate at time of booking |
| `total_amount` | DECIMAL(10,2) | `nightly_rate × nights` |
**Rows:** 531,382
**Room count per booking:** 90% single room, 8% two rooms, 2% three rooms
---
## Data Generation
The database was populated using a **single-file C# script** (`generator/generate.cs`) running on .NET 10, using `MySqlConnector` as the only dependency.
Key generation decisions:
- **Seasonal booking distribution** via rejection sampling — months JunAug are ~2.7× more likely than JanFeb
- **Rate snapshot** — each `room_booking.nightly_rate` is looked up from `period_room_rate` at insert time and stored, not re-computed later
- **Realistic stay lengths** — 30% one night, 25% two nights, 20% three nights, tapering off to 14-night stays
- **Cancelled/no-show bookings** partially skip room assignment (60% of cancellations have no room_booking)
```bash
# Run generator
dotnet run generator/generate.cs
```