# 📚 Maven Toys Dataset Schema Relationship Guide

This document outlines the schema relationships and foreign key connections between all CSV files in this directory, suggesting how they can be joined for comprehensive data analysis.

## 🧩 Entity/Dimension Tables (The "Who" and "What")

These tables define core entities and are typically used as lookup tables.

1. **`stores`**: Information about the physical retail locations.
   * **Primary Key (PK):** `Store_ID`
2. **`products`**: Master list of all items sold.
   * **Primary Key (PK):** `Product_ID`
3. **`calendar`**: Time dimension data for the business.
   * **Primary Key (PK):** `Date` (Assuming unique dates are recorded)

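
Because CSV files carry no declared constraints, the primary keys listed above are inferred, and it can be worth confirming them before joining. The following is a minimal Python sketch, assuming each table has been read into a list of dicts (for example with `csv.DictReader`). The helper name `duplicate_keys` and the sample rows are illustrative and not part of the dataset.

```python
from collections import Counter


def duplicate_keys(rows, key):
    """Return the values of `key` that occur more than once in `rows`.

    An empty result means `key` is unique and can serve as a primary key.
    """
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


# Made-up rows for illustration only; real rows would come from the stores CSV.
sample_stores = [
    {"Store_ID": "1", "Store_Name": "Example Store A"},
    {"Store_ID": "2", "Store_Name": "Example Store B"},
    {"Store_ID": "2", "Store_Name": "Example Store B (duplicate)"},
]

print(duplicate_keys(sample_stores, "Store_ID"))  # ['2']
```

The same check applies to `Product_ID` in `products` and `Date` in `calendar`.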
## 📊 Fact/Snapshot Tables (The "When" and "How Much")

These tables record events, measurements, or snapshots in time that link the dimensions together.

1. **`sales`**: The core transaction log. *This is the most frequently joined table.*
   * **Foreign Keys (FKs):** `Store_ID` (references `stores`), `Product_ID` (references `products`), `Date` (joins to `calendar`).
2. **`inventory`**: Snapshot of stock levels at a point in time.
   * **Composite Key/FKs:** (`Store_ID`, `Product_ID`) $\to$ Links to both `stores` and `products`.
3. **`data_dictionary`**: Metadata describing the fields of the other tables (not a fact table and not used for joins, but crucial for understanding column definitions).

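
Since these foreign keys are conventions rather than enforced constraints, a quick orphan check can confirm that every `sales` row points at an existing store. The sketch below uses Python's built-in `sqlite3` with made-up rows standing in for the real CSV data; only the key columns are modelled.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE stores   (Store_ID INTEGER);
    CREATE TABLE products (Product_ID INTEGER);
    CREATE TABLE sales    (Store_ID INTEGER, Product_ID INTEGER);
    -- Illustrative rows only, not taken from the dataset.
    INSERT INTO stores   VALUES (1), (2);
    INSERT INTO products VALUES (10), (11);
    INSERT INTO sales    VALUES (1, 10), (2, 11), (3, 10);  -- store 3 does not exist
""")

# Anti-join: keep the sales rows whose Store_ID has no match in stores.
orphan_sales = con.execute("""
    SELECT s.Store_ID, s.Product_ID
    FROM sales s
    LEFT JOIN stores st ON s.Store_ID = st.Store_ID
    WHERE st.Store_ID IS NULL
""").fetchall()

print(orphan_sales)  # [(3, 10)]
```

The same pattern, with `products` and `Product_ID` swapped in, checks the other foreign key.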
## 🗓️ Time Dimension

* The **`calendar`** table provides temporal context. Join it to `sales` on the date column to analyze performance around holidays or other specific periods.

---

# 🔗 Relationship Map and Join Paths

The following sections show the explicit paths you can use for joining data in SQL or Python (Pandas/DuckDB).

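
To try these paths from Python without extra dependencies, the CSV data can be loaded into an in-memory SQLite database. The sketch below is one possible approach, not a prescribed one: `load_csv` is a hypothetical helper, the file name `stores.csv` is an assumption this guide does not confirm, and every column is loaded as text, so numeric columns may need a `CAST` in queries.

```python
import csv
import io
import sqlite3


def load_csv(con, table, lines):
    """Create `table` from CSV text lines, storing every column as TEXT.

    `lines` can be any iterable of CSV lines, such as an open file.
    """
    reader = csv.reader(lines)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    con.execute(f'CREATE TABLE "{table}" ({columns})')
    con.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)


con = sqlite3.connect(":memory:")

# Stand-in for open("stores.csv", newline=""); the rows here are made up.
demo = io.StringIO("Store_ID,Store_Name\n1,Example Store A\n2,Example Store B\n")
load_csv(con, "stores", demo)

print(con.execute("SELECT COUNT(*) FROM stores").fetchone()[0])  # 2
```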
### 1. Sales Analysis Path

* **Goal:** Analyzing a transaction's details, location, and item description.
* **Join Chain:** `sales` $\to$ (`stores`, `products`)
* **Example Join:** `FROM sales s JOIN stores st ON s.Store_ID = st.Store_ID JOIN products p ON s.Product_ID = p.Product_ID;`

### 2. Inventory Valuation Path

* **Goal:** Calculating the total value of current stock across all stores.
* **Join Chain:** `inventory` $\to$ (`stores`, `products`)
* **Example Join:** `FROM inventory i JOIN stores st ON i.Store_ID = st.Store_ID JOIN products p ON i.Product_ID = p.Product_ID;`

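
The example join for the inventory path stops short of the valuation itself. One way to complete it, sketched here with Python's built-in `sqlite3` and made-up rows, is to multiply `Stock_On_Hand` by a per-unit value and sum per store. This sketch uses `Product_Price`, the only price column this guide references, so the result is stock at retail value; substitute a cost column if the data has one.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE stores    (Store_ID INTEGER, Store_Name TEXT);
    CREATE TABLE products  (Product_ID INTEGER, Product_Name TEXT, Product_Price REAL);
    CREATE TABLE inventory (Store_ID INTEGER, Product_ID INTEGER, Stock_On_Hand INTEGER);
    -- Illustrative rows only, not taken from the dataset.
    INSERT INTO stores    VALUES (1, 'Example Store A'), (2, 'Example Store B');
    INSERT INTO products  VALUES (10, 'Example Toy X', 10.0), (11, 'Example Toy Y', 2.5);
    INSERT INTO inventory VALUES (1, 10, 3), (1, 11, 4), (2, 10, 5);
""")

# Value of stock per store: units on hand times unit price, summed over products.
stock_value = con.execute("""
    SELECT st.Store_Name,
           SUM(i.Stock_On_Hand * p.Product_Price) AS stock_value_at_retail
    FROM inventory i
    JOIN stores st  ON i.Store_ID = st.Store_ID
    JOIN products p ON i.Product_ID = p.Product_ID
    GROUP BY st.Store_Name
    ORDER BY st.Store_Name
""").fetchall()

print(stock_value)  # [('Example Store A', 40.0), ('Example Store B', 50.0)]
```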
### 3. Comprehensive Performance Path (The Full Picture)

* **Goal:** Linking sales performance to store location details and calendar dates.
* **Join Chain:** `sales` $\to$ (`stores`, `products`, `calendar`)
* **Notes:** Join `sales` to `calendar` on the date column (`sales.Date = calendar.Date`).

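
The comprehensive path is listed without an example join. A sketch of what it could look like, run through Python's built-in `sqlite3` on made-up rows, follows. It assumes `sales.Date` and `calendar.Date` hold dates in the same text format (ISO `YYYY-MM-DD` here); if the formats differ, one side needs normalizing before the join will match.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE calendar (Date TEXT);
    CREATE TABLE stores   (Store_ID INTEGER, Store_Name TEXT);
    CREATE TABLE products (Product_ID INTEGER, Product_Name TEXT);
    CREATE TABLE sales    (Date TEXT, Store_ID INTEGER, Product_ID INTEGER, Units INTEGER);
    -- Illustrative rows only, not taken from the dataset.
    INSERT INTO calendar VALUES ('2024-01-01'), ('2024-01-02'), ('2024-01-03');
    INSERT INTO stores   VALUES (1, 'Example Store A');
    INSERT INTO products VALUES (10, 'Example Toy X');
    INSERT INTO sales    VALUES ('2024-01-01', 1, 10, 2), ('2024-01-03', 1, 10, 5);
""")

# Full chain: each sale enriched with its store, product, and calendar row.
rows = con.execute("""
    SELECT c.Date, st.Store_Name, p.Product_Name, s.Units
    FROM sales s
    JOIN stores st  ON s.Store_ID = st.Store_ID
    JOIN products p ON s.Product_ID = p.Product_ID
    JOIN calendar c ON s.Date = c.Date
    ORDER BY c.Date
""").fetchall()

for row in rows:
    print(row)
```

With inner joins, calendar dates that have no sales (2024-01-02 in the sample) do not appear. Starting from `calendar` and using `LEFT JOIN` would keep them.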
---

# 💡 Example Queries (Ready for Use)

These queries demonstrate how to combine the tables.

### 1. Total Revenue Over Time

Calculate the total revenue generated month-by-month, showing store performance over time.

```sql
-- SQLite syntax: strftime(format, date). Other engines may need a different call.
-- Assumes s.Date is an ISO-formatted date and p.Product_Price is numeric.
SELECT
    strftime('%Y-%m', s.Date) AS sales_month, -- Grouping by Year and Month
    st.Store_Name,
    COUNT(DISTINCT p.Product_ID) AS distinct_products_sold,
    SUM(s.Units * p.Product_Price) AS total_monthly_revenue
FROM sales s
JOIN stores st ON s.Store_ID = st.Store_ID
JOIN products p ON s.Product_ID = p.Product_ID
GROUP BY 1, 2
ORDER BY 1 DESC, total_monthly_revenue DESC;
```

### 2. Top Performing Product/Category Analysis

Identify the top 5 products by units sold, together with the category each one belongs to.

```sql
SELECT
    p.Product_Name,
    p.Product_Category,
    SUM(s.Units) AS total_units_sold
FROM sales s
JOIN products p ON s.Product_ID = p.Product_ID
GROUP BY 1, 2
ORDER BY total_units_sold DESC
LIMIT 5;
```

### 3. Low Stock Alerts (Inventory Management)

List all stores and products where the current stock is below a specified threshold (e.g., < 50 units).

```sql
SELECT
    st.Store_Name,
    p.Product_Name,
    i.Stock_On_Hand
FROM inventory i
JOIN stores st ON i.Store_ID = st.Store_ID
JOIN products p ON i.Product_ID = p.Product_ID
WHERE i.Stock_On_Hand < 50;
```
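
To make the low-stock threshold adjustable rather than hard-coded, the same query can take it as a bound parameter. The following is a sketch using Python's built-in `sqlite3` with made-up rows; `low_stock` is a hypothetical helper name.

```python
import sqlite3


def low_stock(con, threshold):
    """Return (store, product, stock) rows with stock strictly below `threshold`."""
    return con.execute("""
        SELECT st.Store_Name, p.Product_Name, i.Stock_On_Hand
        FROM inventory i
        JOIN stores st  ON i.Store_ID = st.Store_ID
        JOIN products p ON i.Product_ID = p.Product_ID
        WHERE i.Stock_On_Hand < ?
        ORDER BY i.Stock_On_Hand
    """, (threshold,)).fetchall()


con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE stores    (Store_ID INTEGER, Store_Name TEXT);
    CREATE TABLE products  (Product_ID INTEGER, Product_Name TEXT);
    CREATE TABLE inventory (Store_ID INTEGER, Product_ID INTEGER, Stock_On_Hand INTEGER);
    -- Illustrative rows only, not taken from the dataset.
    INSERT INTO stores    VALUES (1, 'Example Store A'), (2, 'Example Store B');
    INSERT INTO products  VALUES (10, 'Example Toy X'), (11, 'Example Toy Y');
    INSERT INTO inventory VALUES (1, 10, 80), (1, 11, 12), (2, 10, 50);
""")

print(low_stock(con, 50))  # [('Example Store A', 'Example Toy Y', 12)]
```

Because the comparison is strict, a row with exactly 50 units is not reported at a threshold of 50.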