
Data Engineering in Python using Airflow

Master data engineering to automate workflows and transform raw data into valuable insights.

  • Schedule

    28 – 30 August 2024

    18.30 – 21.30 (WIB)

  • Online-Interactive Learning

    Via Zoom

  • Investment

    Rp. 1.500.000

Course Summary

Data engineering is crucial for addressing the challenges of modern data-driven environments. As organizations generate massive amounts of data from various sources, efficient data management and processing become critical. Data engineering transforms raw data into valuable insights through robust data pipelines that handle extraction, transformation, and loading (ETL), ensuring that data is clean and integrated across disparate sources.
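To make the extract, transform, and load steps concrete, here is a minimal sketch. It uses only the Python standard library so that it stays self-contained; the workshop itself uses pandas for this kind of cleansing. The sample data, the `orders` table, and the function names are hypothetical and chosen purely for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: stray whitespace, a missing amount, and a duplicate row.
RAW_CSV = """order_id,customer,amount
1, Alice ,120.50
2,Bob,
3,carol,75
3,carol,75
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names, drop rows without an amount, remove duplicate ids."""
    seen = set()
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        order_id = int(row["order_id"])
        if order_id in seen:
            continue  # skip duplicate order ids
        seen.add(order_id)
        clean.append((order_id, row["customer"].strip().title(), float(amount)))
    return clean


def load(records, conn):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline():
    """Run extract -> transform -> load against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(RAW_CSV)), conn)
    return conn.execute(
        "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
    ).fetchall()


if __name__ == "__main__":
    print(run_pipeline())
```

Keeping each stage in its own function means each one can later become a separate scheduled task without changing its logic.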

Automating data workflows is another significant problem that data engineering solves. Manual data processing is time-consuming and error-prone; tools like Apache Airflow automate complex data workflows, reducing human intervention and minimizing errors. By defining Directed Acyclic Graphs (DAGs) in Airflow, data engineers can automate the scheduling and execution of tasks, ensuring timely data processing and delivery. This automation improves operational efficiency and lets organizations respond quickly to changing data requirements and business needs, especially in data analytics.
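The idea behind a Directed Acyclic Graph can be sketched without Airflow itself. The example below is not Airflow's API; it uses Python's standard `graphlib` module (Python 3.9+) to show how task dependencies determine a valid execution order and why cycles are rejected. The task names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical tasks: each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def execution_order(deps):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


def is_acyclic(deps):
    """Return True if the dependency graph contains no cycle."""
    try:
        execution_order(deps)
    except CycleError:
        return False
    return True


if __name__ == "__main__":
    print(execution_order(dependencies))
    # Two tasks that wait on each other can never start, so this is not a DAG.
    print(is_acyclic({"a": {"b"}, "b": {"a"}}))
```

In Airflow, the same kind of ordering is commonly declared between tasks with the `>>` operator, and the scheduler then runs each task once its upstream tasks have finished.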

Learning Outcomes

Upon completion of this workshop, you will be able to:

  • Work with Python and pandas for data cleansing and manipulation processes.
  • Understand the basic concepts of ETL (Extract, Transform, and Load) in data engineering.
  • Automate and schedule ETL processes in Python using Airflow.
  • Monitor tasks in the Airflow webserver.

Syllabus

  • Working with Conda Environments
  • Introduction to Python for Data Science
  • Data Manipulation and Processing with Python pandas
  • Introduction to Apache Airflow
  • Directed Acyclic Graphs (DAGs)
  • Tasks and Their Dependencies
  • Docker Airflow Project
  • Creating a DAG Script to Automate an ETL Data Pipeline
  • Running and Scheduling a DAG
  • Monitoring Tasks in the Airflow Webserver

  • Other Implementations of Airflow for Data Analytics

STUDENT TESTIMONIALS

This testimonial video was recorded after our previous Online Data Science Series: Time Series Analysis for Business Forecasting.

LEARN FROM ANYWHERE

Our learning format is online and interactive, so the experience feels much like being present in a physical classroom. You can join the class with your Zoom account on the scheduled dates.

  • LEARN AT YOUR OWN PACE

    Zoom recordings, course books (PDF and HTML files), practice datasets, reference notes, and working files are accessible through our Learning Management System account.

  • PROVE YOUR MASTERY

    Show current and prospective employers your mastery of data engineering with a signed certificate of completion.

  • CONNECT WITH LIKE-MINDED PEOPLE

    Be a part of our data-passionate community with 5000+ members and 1000+ alumni.

FOR ABSOLUTE BEGINNERS

Workshops in this series are tailored to casual programmers and non-programmers who are taking their first steps into data science. They assume no prior knowledge or academic background, and attendees will be introduced to the beautiful art of writing R / Python code to produce data visualizations and build machine learning models. Each workshop has a gentle learning curve designed with non-technical professionals and academics in mind.

Yes, you can still attend, as this is a beginner-friendly workshop.

Our system will send you an email containing a link and details to join a Google Classroom.

Online learning will be conducted via Zoom.us. The link to join the Zoom class will be announced via Google Classroom.

Learning materials can be obtained via Google Classroom.

Yes, you will receive a certificate of completion.

YOUR INSTRUCTOR

Irfan Chairur Rachman

Irfan Chairur Rachman is a Data Science Instructor at Algoritma Data Science School with a background in informatics engineering. His expertise in automation, data engineering, data analysis, and machine learning led him to create a variety of training courses, including introduction to machine learning and large language models for public classes, data visualization for KPU, PySpark for DBS, and a scorecard for BSI. His main interests are research and creating tools and materials in the field of data science.