Automating ETL with Stored Procedures in PostgreSQL

Project Overview

Data management is crucial for businesses aiming to leverage insights for decision-making. This project focuses on designing and implementing an automated ETL (Extract, Transform, Load) process using a stored procedure, dwh.generate_sales(), in PostgreSQL.

Objectives

  • Automate the ETL process to efficiently extract, transform, and load sales data.

  • Ensure data integrity by properly structuring and managing transactional, staging, and warehouse tables.

  • Optimize data flow across different schemas for effective reporting and analysis.

Scope

The ETL process is structured as follows:

  1. Extract: Move data from the public.sales_transaction table to the staging table stg.stg_sales_transaction.

  2. Transform: Organize and load cleaned data into dimension tables (dim_store, dim_product, dim_time, dim_sales_name).

  3. Load: Populate the fact_sales_transaction table and create aggregated reports in data marts (dm_sales_by_store, dm_sales_by_product, dm_sales_by_time, dm_sales_by_salesperson, dm_sales_summary).

ETL Process Breakdown

1. Extract Phase

  • The stored procedure clears the staging table (stg_sales_transaction) to avoid duplicates.

  • Data is extracted from public.sales_transaction and inserted into stg.stg_sales_transaction.
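The extract step can be sketched as follows. This is a minimal illustration; the real statements live in the project's SQL scripts, and copying with `SELECT *` assumes the staging table mirrors the source table's columns:

```sql
-- Clear the staging table so a rerun does not duplicate rows
TRUNCATE TABLE stg.stg_sales_transaction;

-- Copy the raw transactions into staging
INSERT INTO stg.stg_sales_transaction
SELECT *
FROM public.sales_transaction;
```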

2. Transform Phase

  • Each dimension table is populated with unique and necessary data:

    • dim_product: Extracts and inserts product-related details.

    • dim_store: Loads store information.

    • dim_time: Converts and structures timestamps into year, month, and day.

    • dim_sales_name: Organizes salesperson details.

  • Data cleansing and deduplication are handled with TRUNCATE followed by INSERT INTO ... SELECT DISTINCT, so each dimension is rebuilt from scratch on every run.
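As an illustration of the TRUNCATE plus SELECT DISTINCT pattern, populating dim_time might look like the sketch below. The column names (transaction_date in staging; date_actual, year, month, day in the dimension) are assumptions for the example, not taken from the project scripts:

```sql
-- Rebuild dim_time from staging, keeping one row per calendar date
TRUNCATE TABLE dwh.dim_time;

INSERT INTO dwh.dim_time (date_actual, year, month, day)
SELECT DISTINCT
       transaction_date::date,
       EXTRACT(YEAR  FROM transaction_date)::int,
       EXTRACT(MONTH FROM transaction_date)::int,
       EXTRACT(DAY   FROM transaction_date)::int
FROM stg.stg_sales_transaction;
```

The same pattern applies to the other dimensions, with DISTINCT guaranteeing one row per product, store, or salesperson.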

3. Load Phase

  • The fact_sales_transaction table is populated with sales transactions by integrating data from the staging table and dimension tables.

  • Aggregated data marts (dm_sales_*) provide business intelligence insights by grouping data based on store, product, time, and salesperson.
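A sketch of the load phase is shown below: the fact table is filled by joining staging rows to the dimensions to resolve surrogate keys, and each data mart is then aggregated from the fact table. All column names and the join conditions here are illustrative assumptions, since the actual definitions are in 3.dwh-dim-fact.sql and 4.dm.sql:

```sql
-- Resolve surrogate keys by joining staging rows to the dimensions
INSERT INTO dwh.fact_sales_transaction
       (product_id, store_id, time_id, sales_id, quantity, total_amount)
SELECT p.product_id,
       s.store_id,
       t.time_id,
       n.sales_id,
       stg.quantity,
       stg.total_amount
FROM stg.stg_sales_transaction stg
JOIN dwh.dim_product    p ON p.product_name = stg.product_name
JOIN dwh.dim_store      s ON s.store_name   = stg.store_name
JOIN dwh.dim_time       t ON t.date_actual  = stg.transaction_date::date
JOIN dwh.dim_sales_name n ON n.sales_name   = stg.sales_name;

-- Example data mart: sales aggregated per store
TRUNCATE TABLE dm_sales_by_store;
INSERT INTO dm_sales_by_store (store_name, total_quantity, total_amount)
SELECT s.store_name, SUM(f.quantity), SUM(f.total_amount)
FROM dwh.fact_sales_transaction f
JOIN dwh.dim_store s ON s.store_id = f.store_id
GROUP BY s.store_name;
```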

Project Structure

ETL-Data-Warehouse-Generate-Sales/
├── SQL Query/
│ ├── 1.sales_transaction.sql — Create and populate the transactional database table.
│ ├── 2.stg.sql — Create and populate staging tables.
│ ├── 3.dwh-dim-fact.sql — Create and populate dimension and fact tables.
│ ├── 4.dm.sql — Create and populate data marts.
│ ├── 5.Store Procedure.sql — Implement the stored procedure `generate_sales`.
│ ├── 6.update_data.sql — Add test data and validate ETL updates.
├── ETL Process with Stored Procedure.pdf — Presentation explaining the ETL process.
├── README.md

Execution Steps

Step 1: Create and Populate the Transactional Table

Run 1.sales_transaction.sql to initialize the public.sales_transaction table.

Step 2: Create Staging Tables

Execute 2.stg.sql to generate the stg_sales_transaction table for temporary data storage.

Step 3: Create Data Warehouse Tables

Run 3.dwh-dim-fact.sql to define and populate dimension tables (dim_store, dim_product, etc.) and the fact table (fact_sales_transaction).

Step 4: Create Data Marts

Execute 4.dm.sql to construct business-oriented data marts.

Step 5: Execute the Stored Procedure

Run 5.Store Procedure.sql to automate the ETL workflow using dwh.generate_sales().
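The shape of the procedure and the call that drives the workflow look roughly like this (a skeleton only; the full transform and load statements come from scripts 3 and 4):

```sql
CREATE OR REPLACE PROCEDURE dwh.generate_sales()
LANGUAGE plpgsql
AS $$
BEGIN
    -- Extract: refresh the staging table
    TRUNCATE TABLE stg.stg_sales_transaction;
    INSERT INTO stg.stg_sales_transaction
    SELECT * FROM public.sales_transaction;

    -- Transform: rebuild the dimension tables
    -- (dim_product, dim_store, dim_time, dim_sales_name)

    -- Load: repopulate the fact table and the dm_sales_* data marts
END;
$$;

-- Run the whole ETL pipeline in one call
CALL dwh.generate_sales();
```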

Step 6: Test with New Data

Use 6.update_data.sql to insert new sales transactions and rerun the ETL process to verify consistency across all schemas.
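One simple consistency check after rerunning the procedure is to compare row counts end to end. This query is an illustrative sketch, not part of the project scripts, and it assumes the fact table keeps one row per source transaction:

```sql
-- Source, staging, and fact row counts should all agree
SELECT (SELECT COUNT(*) FROM public.sales_transaction)   AS source_rows,
       (SELECT COUNT(*) FROM stg.stg_sales_transaction)  AS staged_rows,
       (SELECT COUNT(*) FROM dwh.fact_sales_transaction) AS fact_rows;
```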

Key Benefits

  1. Performance Optimization: Running the ETL server-side in a stored procedure eliminates client-server round trips, and PL/pgSQL caches statement plans across calls.

  2. Data Consistency & Integrity: Centralized logic ensures uniform transformation rules.

  3. Scalability: Handles large datasets efficiently for enterprise applications.

  4. Reduced Manual Work: Automating ETL minimizes human intervention and errors.

  5. Improved Data Quality: Ensures accurate, structured, and clean data for analysis.


 

Conclusion

This project successfully demonstrates the implementation of a robust ETL process using PostgreSQL stored procedures. The generate_sales() procedure efficiently extracts, transforms, and loads data across multiple schemas, ensuring reliable and optimized data management. The final structured datasets and data marts enable insightful analytics, supporting business intelligence decisions.
