Project Overview
This project focuses on testing Apache Airflow DAGs for Extract, Transform, and Load (ETL) operations between MySQL and PostgreSQL. The pipeline handles five tables through dynamically generated tasks with automated scheduling: it extracts data from MySQL, processes it, and loads it into PostgreSQL.
Key Features:
✅ Dynamic DAG Generation for multiple tables.
✅ Automated Scheduling – Runs every 2 hours between 9 AM and 9 PM on Fridays in week 1 and week 3 of each month.
✅ MySQL to PostgreSQL Data Migration – Ensures smooth data transfer.
✅ Error Handling – Detects missing tables in MySQL.
Project Workflow
Data Extraction:
- The pipeline connects to MySQL and fetches data from five tables:
  - table_customer
  - table_customer_address
  - table_loyalitas
  - table_merchant
  - table_order
- If a table does not exist, it logs an error.
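The extraction step with its missing-table check could be sketched as follows. This is a minimal sketch, assuming SQLAlchemy and pandas; the function name and connection URI are illustrative, not the project's actual code.

```python
import logging

import pandas as pd
from sqlalchemy import inspect

log = logging.getLogger(__name__)


def extract_table(engine, table_name):
    """Fetch one table as a DataFrame; log an error and return None if it is missing."""
    if table_name not in inspect(engine).get_table_names():
        log.error("Table %s does not exist in MySQL", table_name)
        return None
    return pd.read_sql(f"SELECT * FROM {table_name}", engine)


# Hypothetical usage with a MySQL connection URI:
# engine = create_engine("mysql+pymysql://user:pass@host/source_db")
# df = extract_table(engine, "table_customer")
```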
Data Loading:
- The extracted data is loaded into PostgreSQL, replacing existing records.
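A minimal loading sketch, assuming pandas' `to_sql` with `if_exists="replace"` to overwrite existing records; the function name and connection URI are illustrative assumptions.

```python
import pandas as pd


def load_table(df, engine, table_name):
    """Write df to the target database, replacing any existing table of the same name."""
    df.to_sql(table_name, engine, if_exists="replace", index=False)
    return len(df)


# Hypothetical usage with a PostgreSQL connection URI:
# engine = create_engine("postgresql+psycopg2://user:pass@host/target_db")
# load_table(df, engine, "table_customer")
```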
Dynamic DAG Execution:
- The workflow is fully parameterized to handle multiple tables within a single DAG.
- Each table has its own Airflow task for extraction and loading.
Scheduling:
- DAGs are scheduled to run every 2 hours within a specific time window.
- The execution occurs on Fridays in the 1st and 3rd weeks of the month at the 15th minute of the hour.
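Note that plain cron cannot express "Fridays in weeks 1 and 3" directly: when both the day-of-month and day-of-week fields are restricted, standard cron fires when *either* matches. A common workaround, sketched here as an assumption about the implementation, is to schedule every Friday (`15 9-21/2 * * 5`) and gate each run with a week-of-month check:

```python
from datetime import datetime


def in_run_window(dt: datetime) -> bool:
    """True on Fridays in the 1st or 3rd week, at minute 15 of 9, 11, ..., 21 o'clock."""
    is_friday = dt.weekday() == 4           # Monday=0 ... Friday=4
    week_1_or_3 = dt.day <= 7 or 15 <= dt.day <= 21
    in_hours = dt.hour in range(9, 22, 2)   # 9, 11, 13, 15, 17, 19, 21
    return is_friday and week_1_or_3 and in_hours and dt.minute == 15
```

Day 1-7 always contains the month's first Friday and day 15-21 its third, which is why the two ranges stand in for "week 1" and "week 3".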
Airflow DAG Structure
- start_task → Triggers the DAG execution.
- extract_load_<table> → Extracts data from MySQL and loads it into PostgreSQL.
- end_task → Marks the completion of the process.
Graph View (DAG Execution):
✅ Extract 1 – table_customer
✅ Extract 2 – table_customer_address
✅ Extract 3 – table_loyalitas
✅ Extract 4 – table_merchant
✅ Extract 5 – table_order
Code, Queries & Documentation
🔗 Find the complete code, query logic, and documentation on my GitHub:

