Data Engineering Tutorials
Step-by-step walkthroughs with live environments — read, run, learn.
All Tutorials
AIRFLOW TUTORIAL: Apache Airflow for Beginners
Users are complaining about slow file access, and disk utilization is high. We need to reduce the I/O activity of the top offenders using I/O priorities…
AIRFLOW: What Airflow is, why orchestration, install & first look
Apache Airflow is an open-source platform for orchestrating data pipelines. Learn what it is, why orchestration matters, and familiarize yourself with the UI on a real VM.
AIRFLOW: Write & run your first DAG on Airflow
You know what Airflow is and what its main concepts are. Now make it do something. Write a real DAG with tasks, drop it into Airflow, and watch it run to completion.
AIRFLOW: Tasks, operators & dependencies in Airflow
Go beyond a single DAG. Learn how operators work, how to set dependencies between tasks, and build a multi-task pipeline with real task ordering.
AIRFLOW: Scheduling, intervals & catchup in Airflow
Learn how to schedule DAGs, understand data intervals, and control catchup and backfill behavior in Airflow.
AIRFLOW: XComs and passing data between tasks in Airflow
Tasks in a DAG run in isolation, and XComs let them share data. Learn how to pass values between real tasks and inspect them in the UI.
AIRFLOW: Sensors & external triggers in Airflow
Sometimes a pipeline has to wait for a file to arrive, an API to respond, or another DAG to finish. Sensors are how Airflow waits.
AIRFLOW: The TaskFlow API in Airflow
TaskFlow lets you write DAGs as plain Python functions. Rewrite a real DAG with TaskFlow and see the difference.
AIRFLOW: Connections, hooks and providers in Airflow
DAGs need to talk to real systems such as databases, APIs, and S3. Connections store the credentials, hooks let you use them in Python, and providers ship pre-built integrations for the most common systems. Connect a DAG to a real Postgres database on the VM.
AIRFLOW: Dynamic DAGs and task mapping in Airflow
Sometimes you don't know how many tasks you'll need until runtime. Task mapping lets Airflow generate those tasks dynamically. You'll fan out real tasks over a list and watch Airflow spawn them.