Automated ETL Workflow with Airflow – MySQL to PostgreSQL

Project Overview

This project builds and tests Apache Airflow DAGs for Extract, Transform, and Load (ETL) operations between MySQL and PostgreSQL. The pipeline handles five tables through a single dynamically generated DAG with automated scheduling: it extracts data from MySQL, processes it, and loads it into PostgreSQL.

Key Features:

Dynamic DAG Generation – A single parameterized DAG covers all five tables.
Automated Scheduling – Runs every 2 hours between 9 AM and 9 PM on Fridays in the 1st and 3rd weeks of each month.
MySQL to PostgreSQL Data Migration – Transfers each table and replaces the existing records in the target.
Error Handling – Logs an error when a source table is missing from MySQL.

Project Workflow

  1. Data Extraction:

    • The pipeline connects to MySQL and fetches data from five tables:
      • table_customer
      • table_customer_address
      • table_loyalitas
      • table_merchant
      • table_order
    • If a table does not exist, it logs an error.
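The extract step with missing-table handling can be sketched as follows. In the real pipeline this would go through a MySQL connection (e.g. Airflow's `MySqlHook`); the sketch below uses stdlib `sqlite3` as a stand-in so it runs anywhere, and `extract_table` is a hypothetical helper name:

```python
import logging
import sqlite3

TABLES = [
    "table_customer",
    "table_customer_address",
    "table_loyalitas",
    "table_merchant",
    "table_order",
]

def extract_table(conn, table):
    """Fetch all rows from one source table; log an error and skip if it is missing."""
    try:
        cursor = conn.execute(f"SELECT * FROM {table}")
        return cursor.fetchall()
    except sqlite3.OperationalError:  # raised when the table does not exist
        logging.error("Source table %s does not exist, skipping", table)
        return None

# Demo with an in-memory database that contains only one of the five tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_customer (id INTEGER, name TEXT)")
conn.execute("INSERT INTO table_customer VALUES (1, 'alice')")

rows = extract_table(conn, "table_customer")     # → [(1, 'alice')]
missing = extract_table(conn, "table_merchant")  # → None, error logged
```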

  2. Data Loading:

    • The extracted data is loaded into PostgreSQL, replacing existing records.
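The replace-existing-records load can be sketched as a delete-then-insert inside one transaction, so the target is never left half-loaded (again with stdlib `sqlite3` standing in for PostgreSQL; `load_table` is an illustrative name):

```python
import sqlite3

def load_table(conn, table, rows, columns):
    """Full-refresh load: delete existing rows, then insert the new batch."""
    placeholders = ", ".join("?" for _ in columns)
    with conn:  # one transaction: the replace is all-or-nothing
        conn.execute(f"DELETE FROM {table}")
        conn.executemany(
            f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})",
            rows,
        )

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE table_customer (id INTEGER, name TEXT)")
target.execute("INSERT INTO table_customer VALUES (99, 'stale')")

load_table(target, "table_customer", [(1, "alice"), (2, "bob")], ["id", "name"])
result = target.execute("SELECT * FROM table_customer ORDER BY id").fetchall()
# → [(1, 'alice'), (2, 'bob')]  (the stale row is gone)
```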

  3. Dynamic DAG Execution:

    • The workflow is fully parameterized to handle multiple tables within a single DAG.
    • Each table has its own Airflow task for extraction and loading.
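The per-table fan-out can be sketched in plain Python; in the actual DAG file, the same loop would create one `PythonOperator` per table with `task_id=f"extract_load_{table}"`:

```python
from functools import partial

TABLES = [
    "table_customer",
    "table_customer_address",
    "table_loyalitas",
    "table_merchant",
    "table_order",
]

def extract_load(table):
    # Placeholder for the real extract-from-MySQL / load-to-PostgreSQL step.
    return f"migrated {table}"

# One task per table, generated in a loop: the same pattern an Airflow DAG
# file uses with PythonOperator(task_id=..., python_callable=...).
tasks = {f"extract_load_{t}": partial(extract_load, t) for t in TABLES}

# e.g. tasks["extract_load_table_order"]() → "migrated table_order"
```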

  4. Scheduling:

    • DAGs are scheduled to run every 2 hours within a specific time window.
    • The execution occurs on Fridays in the 1st and 3rd weeks of the month at the 15th minute of the hour.
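A cron expression such as `15 9-21/2 * * 5` covers "minute 15, every 2 hours from 9 AM to 9 PM, Fridays", but cron alone cannot express "weeks 1 and 3". A common workaround is a guard task (e.g. Airflow's `ShortCircuitOperator`) that checks the day of the month. Assuming "week 1" means days 1–7 and "week 3" means days 15–21 (my interpretation, not stated in the source), the check might look like:

```python
from datetime import date

def run_today(d: date) -> bool:
    """True on Fridays that fall in the 1st or 3rd week of the month."""
    # weekday() == 4 is Friday; (day - 1) // 7 is the zero-based week index.
    return d.weekday() == 4 and ((d.day - 1) // 7) in (0, 2)

# March 2024: Fridays fall on the 1st, 8th, 15th, 22nd, and 29th.
checks = [run_today(date(2024, 3, day)) for day in (1, 8, 15, 22)]
# → [True, False, True, False]
```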

Airflow DAG Structure

  • start_task → Triggers the DAG execution.
  • extract_load_<table> → Extracts data from MySQL and loads it into PostgreSQL.
  • end_task → Marks the completion of the process.
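The structure above can be sketched as a dependency graph; in the real DAG file this is the `start_task >> extract_load_<table> >> end_task` wiring, shown here as a plain adjacency list:

```python
TABLES = ["table_customer", "table_customer_address", "table_loyalitas",
          "table_merchant", "table_order"]

# Adjacency list mirroring start_task >> each extract_load task >> end_task.
deps = {"start_task": [f"extract_load_{t}" for t in TABLES]}
for t in TABLES:
    deps[f"extract_load_{t}"] = ["end_task"]
deps["end_task"] = []

# Every extract/load task fans out from start_task and joins at end_task.
```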

Graph View (DAG Execution):

  • Extract 1: table_customer
  • Extract 2: table_customer_address
  • Extract 3: table_loyalitas
  • Extract 4: table_merchant
  • Extract 5: table_order

Code, Queries & Documentation

🔗 Find the complete code, query logic, and documentation on my GitHub:
