This data engineering project collects streaming data with Apache Kafka, performs real-time ETL with Apache Spark Streaming, and validates the data. It includes a purchase event producer, a streaming job that aggregates daily purchases, and a PySpark step that outputs cumulative purchase totals.
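The aggregation logic described above can be sketched in plain Python, without a Kafka broker or a Spark cluster, to show what the pipeline is expected to compute. This is only an illustrative sketch, not the project's actual code: the event fields (`user_id`, `amount`, `timestamp`) and the function names are assumptions, and in the real project the producer would send these bytes to a Kafka topic while the grouping and running total would be done by the PySpark streaming job.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def make_purchase_event(user_id, amount, ts):
    """Serialize one purchase event as JSON bytes (hypothetical schema).

    A Kafka producer would send these bytes as the message value.
    """
    event = {"user_id": user_id, "amount": amount, "timestamp": ts.isoformat()}
    return json.dumps(event).encode("utf-8")


def aggregate_daily(messages):
    """Sum purchase amounts per calendar day, keyed by ISO date string."""
    totals = defaultdict(float)
    for raw in messages:
        event = json.loads(raw.decode("utf-8"))
        day = datetime.fromisoformat(event["timestamp"]).date().isoformat()
        totals[day] += event["amount"]
    # Sort by date so the cumulative step can walk the days in order.
    return dict(sorted(totals.items()))


def cumulative_totals(daily):
    """Return (day, daily_total, running_total) rows in date order."""
    running = 0.0
    rows = []
    for day, total in daily.items():
        running += total
        rows.append((day, total, running))
    return rows


if __name__ == "__main__":
    # Example events; the values are made up for illustration.
    messages = [
        make_purchase_event("u1", 10.0, datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)),
        make_purchase_event("u2", 5.5, datetime(2024, 1, 1, 18, 30, tzinfo=timezone.utc)),
        make_purchase_event("u1", 4.5, datetime(2024, 1, 2, 8, 15, tzinfo=timezone.utc)),
    ]
    for row in cumulative_totals(aggregate_daily(messages)):
        print(row)
```

Keeping the serialization and aggregation as small pure functions also gives a simple way to validate the streaming output: the same sample events can be fed through both this sketch and the Spark job, and the totals compared.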
Code, Queries & Documentation
🔗 Find the complete code, query logic, and documentation on my GitHub:

