Streaming Data Pipeline

This data engineering project collects streaming data with Apache Kafka, performs real-time ETL with Apache Spark Streaming, and validates the resulting data. It consists of three parts: a producer that emits purchase events, a streaming job that aggregates purchases by day, and a PySpark step that outputs cumulative purchase totals.
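To make the aggregation steps concrete, here is a minimal plain-Python sketch of the logic only, with no Kafka or Spark involved. It is not the project's code: the event fields (`user_id`, `amount`, `timestamp`) and the function names are assumptions chosen for illustration, and in the actual pipeline the producer would publish to a Kafka topic and the sums would be computed by the streaming job.

```python
import json
import random
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def make_purchase_event(rng, ts):
    """Build one hypothetical purchase event as a JSON string,
    the kind of message a producer might publish."""
    return json.dumps({
        "user_id": rng.randint(1, 5),
        "amount": round(rng.uniform(1.0, 100.0), 2),
        "timestamp": ts.isoformat(),
    })


def aggregate_daily(messages):
    """Sum purchase amounts per calendar day, keyed by the event timestamp's date."""
    daily = defaultdict(float)
    for raw in messages:
        event = json.loads(raw)
        day = datetime.fromisoformat(event["timestamp"]).date().isoformat()
        daily[day] += event["amount"]
    return dict(daily)


def cumulative_totals(daily):
    """Turn per-day totals into a running total, ordered by day."""
    running = 0.0
    totals = []
    for day in sorted(daily):
        running += daily[day]
        totals.append((day, round(running, 2)))
    return totals


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs produce the same events
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    # Twelve events spaced six hours apart, spanning three days.
    messages = [make_purchase_event(rng, start + timedelta(hours=6 * i))
                for i in range(12)]
    for day, total in cumulative_totals(aggregate_daily(messages)):
        print(day, total)
```

Keeping the daily sum and the running total as separate steps mirrors the description above: the first corresponds to the daily aggregation, the second to the cumulative output.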

Code, Queries & Documentation

🔗 Find the complete code, query logic, and documentation on my GitHub:
