📚 Maven Toys Dataset Schema Relationship Guide
This document outlines the schema relationships and foreign key connections between all CSV files in this directory, suggesting how they can be joined for comprehensive data analysis.
🧩 Entity/Dimension Tables (The "Who" and "What")
These tables define core entities and are typically used as lookup tables.
stores: Information about the physical retail locations.
- Primary Key (PK): Store_ID
products: Master list of all items sold.
- Primary Key (PK): Product_ID
calendar: Time dimension data for the business.
- Primary Key (PK): Date (assuming each date is recorded only once)
📊 Fact/Snapshot Tables (The "When" and "How Much")
These tables record events, measurements, or snapshots in time that link the dimensions together.
sales: The core transaction log. This is the most frequently joined table.
- Foreign Keys (FKs): Store_ID (references stores), Product_ID (references products).
inventory: Snapshot of stock levels at a point in time.
- Composite Key/FKs: (Store_ID, Product_ID) → Links to both stores and products.
data_dictionary: Metadata describing the other fields (Not used for joins, but crucial for understanding column definitions).
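The keys listed above can be written down as table constraints. The following is a minimal sketch using Python's built-in sqlite3 module. The column types, the Sale_ID surrogate key, and the Date foreign key on sales are assumptions made for illustration; only the key column names come from this guide.

```python
import sqlite3

# Hypothetical DDL: types and non-key columns are guesses, not read from the CSVs.
SCHEMA = """
CREATE TABLE stores   (Store_ID INTEGER PRIMARY KEY, Store_Name TEXT);
CREATE TABLE products (Product_ID INTEGER PRIMARY KEY, Product_Name TEXT);
CREATE TABLE calendar (Date TEXT PRIMARY KEY);
CREATE TABLE sales (
    Sale_ID    INTEGER PRIMARY KEY,                 -- assumed surrogate key
    Date       TEXT    REFERENCES calendar(Date),   -- assumed FK
    Store_ID   INTEGER REFERENCES stores(Store_ID),
    Product_ID INTEGER REFERENCES products(Product_ID),
    Units      INTEGER
);
CREATE TABLE inventory (
    Store_ID      INTEGER REFERENCES stores(Store_ID),
    Product_ID    INTEGER REFERENCES products(Product_ID),
    Stock_On_Hand INTEGER,
    PRIMARY KEY (Store_ID, Product_ID)              -- composite key
);
"""

def build_schema() -> sqlite3.Connection:
    """Create an empty in-memory database with the constraints above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn

def orphan_sale_rejected(conn: sqlite3.Connection) -> bool:
    """Return True if a sale pointing at an unknown store is refused."""
    try:
        conn.execute(
            "INSERT INTO sales (Date, Store_ID, Product_ID, Units) "
            "VALUES (NULL, 999, NULL, 1)"
        )
    except sqlite3.IntegrityError:
        return True
    return False
```

Declaring the constraints is optional for analysis, but it lets the database catch orphaned rows before they silently drop out of an inner join.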
🗓️ Time Dimension
- The calendar table provides temporal context. Join it to sales on the date column to analyze performance around holidays or specific periods.
🔗 Relationship Map and Join Paths
The following sections show the explicit paths you can use for joining data in SQL or Python (Pandas/DuckDB).
1. Sales Analysis Path
- Goal: Analyzing a transaction's details, location, and item description.
- Join Chain: sales → (stores, products)
- Example Join:
SELECT *
FROM sales s
JOIN stores st ON s.Store_ID = st.Store_ID
JOIN products p ON s.Product_ID = p.Product_ID;
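The sales join chain can be tried end to end on a few made-up rows. This sketch uses Python's sqlite3 module with an in-memory database; the sample rows and the Store_Name and Product_Name values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stores   (Store_ID INTEGER PRIMARY KEY, Store_Name TEXT);
CREATE TABLE products (Product_ID INTEGER PRIMARY KEY, Product_Name TEXT);
CREATE TABLE sales    (Date TEXT, Store_ID INTEGER, Product_ID INTEGER, Units INTEGER);
INSERT INTO stores   VALUES (1, 'Store A');
INSERT INTO products VALUES (10, 'Toy X');
INSERT INTO sales    VALUES ('2022-01-05', 1, 10, 3);
""")

def enriched_sales(conn: sqlite3.Connection) -> list:
    """Return each sale with its store name and product name attached."""
    sql = """
        SELECT s.Date, st.Store_Name, p.Product_Name, s.Units
        FROM sales s
        JOIN stores st  ON s.Store_ID = st.Store_ID
        JOIN products p ON s.Product_ID = p.Product_ID
    """
    return conn.execute(sql).fetchall()
```

Calling `enriched_sales(conn)` returns one tuple per sale row, since each sale matches exactly one store and one product.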
2. Inventory Valuation Path
- Goal: Calculating the total value of current stock across all stores.
- Join Chain: inventory → (stores, products)
- Example Join:
SELECT *
FROM inventory i
JOIN stores st ON i.Store_ID = st.Store_ID
JOIN products p ON i.Product_ID = p.Product_ID;
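To turn the inventory join into an actual valuation, stock counts need to be multiplied by a per-unit value. The sketch below assumes products carries a numeric Product_Cost column; that column name is an assumption here and should be checked against data_dictionary. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stores    (Store_ID INTEGER PRIMARY KEY, Store_Name TEXT);
CREATE TABLE products  (Product_ID INTEGER PRIMARY KEY, Product_Cost REAL);  -- assumed column
CREATE TABLE inventory (Store_ID INTEGER, Product_ID INTEGER, Stock_On_Hand INTEGER);
INSERT INTO stores    VALUES (1, 'Store A'), (2, 'Store B');
INSERT INTO products  VALUES (10, 2.5), (11, 1.5);
INSERT INTO inventory VALUES (1, 10, 4), (1, 11, 2), (2, 10, 1);
""")

def stock_value_by_store(conn: sqlite3.Connection) -> list:
    """Return (store name, total stock value at cost) for every store."""
    sql = """
        SELECT st.Store_Name,
               SUM(i.Stock_On_Hand * p.Product_Cost) AS stock_value
        FROM inventory i
        JOIN stores st  ON i.Store_ID = st.Store_ID
        JOIN products p ON i.Product_ID = p.Product_ID
        GROUP BY st.Store_Name
        ORDER BY st.Store_Name
    """
    return conn.execute(sql).fetchall()
```

Dropping the GROUP BY and the store name would give a single total across all stores instead.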
3. Comprehensive Performance Path (The Full Picture)
- Goal: Linking sales performance to store location details and calendar dates.
- Join Chain: sales → (stores, products, calendar)
- Notes: Join sales to calendar on the date column the two tables share (sales.Date = calendar.Date).
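One practical use of the calendar join is to start from calendar and LEFT JOIN sales, so that dates with no transactions still show up with zero units. The following is a sketch under that design choice, using sqlite3 and invented sample rows; it assumes both tables store dates as ISO text in a column named Date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calendar (Date TEXT PRIMARY KEY);
CREATE TABLE sales    (Date TEXT, Store_ID INTEGER, Product_ID INTEGER, Units INTEGER);
INSERT INTO calendar VALUES ('2022-01-01'), ('2022-01-02'), ('2022-01-03');
INSERT INTO sales VALUES
    ('2022-01-01', 1, 10, 2),
    ('2022-01-01', 1, 11, 1),
    ('2022-01-03', 1, 10, 5);
""")

def daily_units(conn: sqlite3.Connection) -> list:
    """Return units sold per calendar date, including dates with no sales."""
    sql = """
        SELECT c.Date, COALESCE(SUM(s.Units), 0) AS units
        FROM calendar c
        LEFT JOIN sales s ON s.Date = c.Date
        GROUP BY c.Date
        ORDER BY c.Date
    """
    return conn.execute(sql).fetchall()
```

An inner join from sales to calendar would omit 2022-01-02 entirely, which can distort averages over time.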
💡 Example Queries (Ready for Use)
These queries demonstrate how to combine the tables.
1. Total Revenue Over Time
Calculate each store's total revenue month by month, along with the number of distinct products it sold in that month.
SELECT
    strftime('%Y-%m', s.Date) AS sales_month, -- Grouping by Year and Month
    st.Store_Name,
    COUNT(DISTINCT p.Product_ID) AS distinct_products_sold,
    SUM(s.Units * p.Product_Price) AS total_monthly_revenue
FROM sales s
JOIN stores st ON s.Store_ID = st.Store_ID
JOIN products p ON s.Product_ID = p.Product_ID
GROUP BY 1, 2
ORDER BY 1 DESC, total_monthly_revenue DESC;
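The revenue query multiplies Units by Product_Price, so it only works if the price is numeric. If the CSV stores prices as currency text such as "$12.99" (an assumption worth checking against data_dictionary), they need converting on load. The sketch below shows one way with sqlite3; parse_price is a hypothetical helper and the sample rows are invented.

```python
import sqlite3

def parse_price(text: str) -> float:
    """Hypothetical helper: convert text like '$1,234.50' to a float."""
    return float(text.replace("$", "").replace(",", "").strip())

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stores   (Store_ID INTEGER PRIMARY KEY, Store_Name TEXT);
CREATE TABLE products (Product_ID INTEGER PRIMARY KEY, Product_Price REAL);
CREATE TABLE sales    (Date TEXT, Store_ID INTEGER, Product_ID INTEGER, Units INTEGER);
INSERT INTO stores VALUES (1, 'Store A');
INSERT INTO sales VALUES
    ('2022-01-05', 1, 10, 3),
    ('2022-01-20', 1, 11, 2),
    ('2022-02-01', 1, 10, 1);
""")

# Clean the price text before it reaches the database.
raw_prices = [(10, "$2.00"), (11, "$3.50")]
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(pid, parse_price(price)) for pid, price in raw_prices],
)

def monthly_revenue(conn: sqlite3.Connection) -> list:
    """Return (month, store, distinct products, revenue), newest month first."""
    sql = """
        SELECT strftime('%Y-%m', s.Date) AS sales_month,
               st.Store_Name,
               COUNT(DISTINCT p.Product_ID) AS distinct_products_sold,
               SUM(s.Units * p.Product_Price) AS total_monthly_revenue
        FROM sales s
        JOIN stores st  ON s.Store_ID = st.Store_ID
        JOIN products p ON s.Product_ID = p.Product_ID
        GROUP BY sales_month, st.Store_Name
        ORDER BY sales_month DESC, total_monthly_revenue DESC
    """
    return conn.execute(sql).fetchall()
```

The sketch groups by column name rather than by position, which keeps the query valid if the SELECT list is later reordered.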
2. Top Performing Product/Category Analysis
Identify the top 5 products by units sold, together with the category each belongs to.
SELECT
    p.Product_Name,
    p.Product_Category,
    SUM(s.Units) AS total_units_sold
FROM sales s
JOIN products p ON s.Product_ID = p.Product_ID
GROUP BY 1, 2
ORDER BY total_units_sold DESC
LIMIT 5;
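The query above ranks individual products. To rank whole categories, the same join can be grouped by Product_Category alone. This is a sketch with sqlite3 and invented sample rows, not a query from the original dataset documentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (Product_ID INTEGER PRIMARY KEY,
                       Product_Name TEXT, Product_Category TEXT);
CREATE TABLE sales    (Date TEXT, Store_ID INTEGER, Product_ID INTEGER, Units INTEGER);
INSERT INTO products VALUES
    (10, 'Toy X',  'Toys'),
    (11, 'Game Y', 'Games'),
    (12, 'Toy Z',  'Toys');
INSERT INTO sales VALUES
    ('2022-01-05', 1, 10, 3),
    ('2022-01-06', 1, 11, 4),
    ('2022-01-07', 1, 12, 2);
""")

def units_by_category(conn: sqlite3.Connection) -> list:
    """Return (category, total units sold), best-selling category first."""
    sql = """
        SELECT p.Product_Category, SUM(s.Units) AS total_units_sold
        FROM sales s
        JOIN products p ON s.Product_ID = p.Product_ID
        GROUP BY p.Product_Category
        ORDER BY total_units_sold DESC
    """
    return conn.execute(sql).fetchall()
```

In this sample, Game Y is the single best-selling product, yet Toys is the best-selling category, which is why the two rankings are worth running separately.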
3. Low Stock Alerts (Inventory Management)
List all stores and products where the current stock is below a specified threshold (e.g., < 50 units).
SELECT
    st.Store_Name,
    p.Product_Name,
    i.Stock_On_Hand
FROM inventory i
JOIN stores st ON i.Store_ID = st.Store_ID
JOIN products p ON i.Product_ID = p.Product_ID
WHERE i.Stock_On_Hand < 50;
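Since the threshold is meant to be adjustable, it can be passed as a query parameter rather than hard-coded. The following sketch uses sqlite3's `?` placeholder; the low_stock function name and the sample rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stores    (Store_ID INTEGER PRIMARY KEY, Store_Name TEXT);
CREATE TABLE products  (Product_ID INTEGER PRIMARY KEY, Product_Name TEXT);
CREATE TABLE inventory (Store_ID INTEGER, Product_ID INTEGER, Stock_On_Hand INTEGER);
INSERT INTO stores    VALUES (1, 'Store A');
INSERT INTO products  VALUES (10, 'Toy X'), (11, 'Toy Y');
INSERT INTO inventory VALUES (1, 10, 12), (1, 11, 80);
""")

def low_stock(conn: sqlite3.Connection, threshold: int = 50) -> list:
    """Return (store, product, stock) rows below the threshold, lowest stock first."""
    sql = """
        SELECT st.Store_Name, p.Product_Name, i.Stock_On_Hand
        FROM inventory i
        JOIN stores st  ON i.Store_ID = st.Store_ID
        JOIN products p ON i.Product_ID = p.Product_ID
        WHERE i.Stock_On_Hand < ?
        ORDER BY i.Stock_On_Hand
    """
    return conn.execute(sql, (threshold,)).fetchall()
```

Binding the value with a placeholder avoids string formatting in the SQL and makes the same query reusable for different alert levels.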