# 📚 Maven Toys Dataset Schema Relationship Guide

This document outlines the schema relationships and foreign key connections between the CSV files in this directory and suggests how to join them for analysis.

## 🧩 Entity/Dimension Tables (The "Who" and "What")

These tables define core entities and are typically used as lookup tables.

1. **`stores`**: Information about the physical retail locations.
    * **Primary Key (PK):** `Store_ID`
2. **`products`**: Master list of all items sold.
    * **Primary Key (PK):** `Product_ID`
3. **`calendar`**: Time dimension data for the business.
    * **Primary Key (PK):** `Date` (assuming each date appears exactly once)

## 📊 Fact/Snapshot Tables (The "When" and "How Much")

These tables record events, measurements, or snapshots in time that link the dimensions together.

1. **`sales`**: The core transaction log. *This is the most frequently joined table.*
    * **Foreign Keys (FKs):** `Store_ID` (references `stores`), `Product_ID` (references `products`), `Date` (references `calendar`).
2. **`inventory`**: Snapshot of stock levels at a point in time.
    * **Composite Key/FKs:** (`Store_ID`, `Product_ID`) $\to$ links to both `stores` and `products`.
3. **`data_dictionary`**: Metadata describing the fields of the other tables. It is not a fact table and is not used for joins, but it is the reference for column definitions.

## 🗓️ Time Dimension

* The **`calendar`** table provides temporal context. Join it to `sales` on `Date` to analyze performance around holidays or specific periods.

---

# 🔗 Relationship Map and Join Paths

The following sections show the explicit join paths to use in SQL or Python (Pandas/DuckDB).

### 1. Sales Analysis Path

* **Goal:** Analyzing a transaction's details, location, and item description.
* **Join Chain:** `sales` $\to$ (`stores`, `products`)
* **Example Join:** `FROM sales s JOIN stores st ON s.Store_ID = st.Store_ID JOIN products p ON s.Product_ID = p.Product_ID;`

### 2. Inventory Valuation Path

* **Goal:** Calculating the total value of current stock across all stores.
* **Join Chain:** `inventory` $\to$ (`stores`, `products`)
* **Example Join:** `FROM inventory i JOIN stores st ON i.Store_ID = st.Store_ID JOIN products p ON i.Product_ID = p.Product_ID;`

### 3. Comprehensive Performance Path (The Full Picture)

* **Goal:** Linking sales performance to store location details and calendar dates.
* **Join Chain:** `sales` $\to$ (`stores`, `products`, `calendar`)
* **Notes:** Join on the date field present in both tables: `sales.Date = calendar.Date`.

---

# 💡 Example Queries (Ready for Use)

These queries demonstrate how to combine the tables.

### 1. Total Revenue Over Time

Calculate total revenue per store, month by month.

```sql
SELECT
    -- Grouping by year and month. This is the SQLite argument order;
    -- DuckDB documents strftime(s.Date, '%Y-%m') instead.
    strftime('%Y-%m', s.Date) AS sales_month,
    st.Store_Name,
    COUNT(DISTINCT p.Product_ID) AS distinct_products_sold,
    SUM(s.Units * p.Product_Price) AS total_monthly_revenue
FROM sales s
JOIN stores st ON s.Store_ID = st.Store_ID
JOIN products p ON s.Product_ID = p.Product_ID
GROUP BY 1, 2
ORDER BY 1 DESC, total_monthly_revenue DESC;
```

### 2. Top Performing Product/Category Analysis

Identify the top 5 products by units sold, together with the category each belongs to.

```sql
SELECT
    p.Product_Name,
    p.Product_Category,
    SUM(s.Units) AS total_units_sold
FROM sales s
JOIN products p ON s.Product_ID = p.Product_ID
GROUP BY 1, 2
ORDER BY total_units_sold DESC
LIMIT 5;
```

### 3. Low Stock Alerts (Inventory Management)

List every store/product pair whose current stock is below a chosen threshold (here, fewer than 50 units).

```sql
SELECT
    st.Store_Name,
    p.Product_Name,
    i.Stock_On_Hand
FROM inventory i
JOIN stores st ON i.Store_ID = st.Store_ID
JOIN products p ON i.Product_ID = p.Product_ID
WHERE i.Stock_On_Hand < 50;
```
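
The Comprehensive Performance Path (join path 3) is described above without a concrete query. The following is a minimal sketch of one way to write it, using Python's built-in `sqlite3` module and a few made-up rows so it runs without the CSV files. The table and column names follow this guide. The sample data, the numeric `Product_Price` column, and the choice to start from `calendar` with `LEFT JOIN` (so that dates without any sales still appear with zero revenue) are assumptions for illustration, not something the dataset prescribes.

```python
import sqlite3

# In-memory toy database. Table and column names follow this guide;
# the rows are invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stores   (Store_ID INTEGER PRIMARY KEY, Store_Name TEXT);
CREATE TABLE products (Product_ID INTEGER PRIMARY KEY, Product_Name TEXT,
                       Product_Price REAL);  -- assumed to be numeric
CREATE TABLE calendar (Date TEXT PRIMARY KEY);
CREATE TABLE sales    (Date TEXT, Store_ID INTEGER, Product_ID INTEGER,
                       Units INTEGER);

INSERT INTO stores   VALUES (1, 'Store A');
INSERT INTO products VALUES (10, 'Toy Car', 5.0), (11, 'Puzzle', 8.0);
INSERT INTO calendar VALUES ('2024-01-01'), ('2024-01-02'), ('2024-01-03');
INSERT INTO sales    VALUES ('2024-01-01', 1, 10, 2),
                            ('2024-01-01', 1, 11, 1),
                            ('2024-01-03', 1, 10, 4);
""")

# Start from calendar so every date is kept, then attach sales,
# stores, and products. Dates without sales get revenue 0.
daily_revenue = conn.execute("""
    SELECT c.Date,
           st.Store_Name,
           COALESCE(SUM(s.Units * p.Product_Price), 0) AS revenue
    FROM calendar c
    LEFT JOIN sales s    ON s.Date = c.Date
    LEFT JOIN stores st  ON st.Store_ID = s.Store_ID
    LEFT JOIN products p ON p.Product_ID = s.Product_ID
    GROUP BY c.Date, st.Store_Name
    ORDER BY c.Date, st.Store_Name
""").fetchall()

for row in daily_revenue:
    print(row)
```

If gaps in the date range do not matter, the same query can start from `sales` with plain `JOIN`s, matching the join chain as written in path 3.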
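
The Inventory Valuation Path (join path 2) states its goal, the total value of current stock, but only shows the `FROM` clause. Here is a hedged sketch of the full aggregation, again using Python's built-in `sqlite3` with invented rows. It values stock at `Product_Price`, which is an assumption: whether retail price or a cost column (if `products` has one) is the right basis depends on the analysis. It also assumes `Product_Price` is numeric; if the CSV stores it as text with a currency symbol, it would need cleaning and casting first.

```python
import sqlite3

# In-memory toy database. Table and column names follow this guide;
# the rows are invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stores    (Store_ID INTEGER PRIMARY KEY, Store_Name TEXT);
CREATE TABLE products  (Product_ID INTEGER PRIMARY KEY, Product_Name TEXT,
                        Product_Price REAL);  -- assumed to be numeric
CREATE TABLE inventory (Store_ID INTEGER, Product_ID INTEGER,
                        Stock_On_Hand INTEGER,
                        PRIMARY KEY (Store_ID, Product_ID));

INSERT INTO stores    VALUES (1, 'Store A'), (2, 'Store B');
INSERT INTO products  VALUES (10, 'Toy Car', 5.0), (11, 'Puzzle', 8.0);
INSERT INTO inventory VALUES (1, 10, 10), (1, 11, 5), (2, 10, 3);
""")

# Value of stock per store: units on hand times unit price, summed.
stock_value = conn.execute("""
    SELECT st.Store_Name,
           SUM(i.Stock_On_Hand * p.Product_Price) AS stock_value
    FROM inventory i
    JOIN stores st  ON i.Store_ID = st.Store_ID
    JOIN products p ON i.Product_ID = p.Product_ID
    GROUP BY st.Store_Name
    ORDER BY st.Store_Name
""").fetchall()

for store_name, value in stock_value:
    print(store_name, value)
```

Dropping `st.Store_Name` from the `SELECT` and the `GROUP BY` gives a single total across all stores.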