Tasks, operators & dependencies in Airflow
Go beyond a single DAG. Learn how operators work, how to set dependencies between tasks, and build a multi-task pipeline with real task ordering.
What we're doing
You'll learn what operators are, how dependencies work, and build a multi-task DAG that runs tasks in parallel. Watch the video first, then follow along here.
Step 1: Three things you need to know
Before writing any code, three concepts:
Operators — an operator defines what a task does. Every task in a DAG is built from an operator. PythonOperator runs a Python function. BashOperator runs a shell command.
Dependencies — dependencies tell Airflow what order to run tasks in. t1 >> t2 means t1 runs first, then t2. t1 >> [t2, t3] means t2 and t3 run in parallel after t1.
Task — a single unit of work inside a DAG. Each task uses an operator and has a unique task_id that shows up in the graph view.
Operator = what to do. Dependency = when to do it. Task = one operator instance placed in the DAG, with its own task_id.
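To make the >> syntax less magical, here is a minimal sketch of how such an operator can be built in plain Python. ToyTask is a hypothetical class written only for this illustration; it is not Airflow's implementation, and it only records which task_ids come next.

```python
# Toy model of the >> syntax. This is NOT Airflow's implementation;
# ToyTask is a hypothetical class used only to illustrate the idea.
class ToyTask:
    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []  # task_ids that run after this task

    def __rshift__(self, other):
        # Handles `task >> other`, where other is a task or a list of tasks.
        targets = other if isinstance(other, list) else [other]
        for target in targets:
            self.downstream.append(target.task_id)
        return other  # returning the right side lets chains continue

    def __rrshift__(self, other):
        # Handles `[a, b] >> task`: a plain list has no >> of its own,
        # so Python falls back to the right operand's __rrshift__.
        for source in other:
            source >> self
        return self


t1, t2, t3, t4 = (ToyTask(name) for name in ("t1", "t2", "t3", "t4"))
t1 >> [t2, t3] >> t4

print(t1.downstream)  # ['t2', 't3']
print(t2.downstream)  # ['t4']
print(t3.downstream)  # ['t4']
```

Each >> returns its right-hand side, which is why a chain such as t1 >> [t2, t3] >> t4 can be read left to right.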
Step 2: Create the DAG file
Click VS Code in the environment panel. Once it opens, you'll see the dags folder in the file explorer on the left. Right-click it and choose New File. Name the file pipeline.py.
Step 3: Add the imports
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.operators.bash import BashOperator
from airflow.operators.empty import EmptyOperator
from datetime import datetime
- DAG: the object that defines your pipeline
- PythonOperator: runs a Python function
- BashOperator: runs a shell command
- EmptyOperator: does nothing, used as a start or end marker
- datetime: used to set the start date of the DAG
Step 4: Define the functions
def validate_data():
    print("Validating incoming data...")

def transform_data():
    print("Transforming data...")

def send_report():
    print("Sending report...")
The print statements are placeholders so you can see something in the logs when the tasks run.
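Because these are ordinary Python functions, they can be called on their own before Airflow is involved. The sketch below repeats validate_data from this step and runs it through run_and_capture, a hypothetical helper (not part of Airflow) that collects whatever the function prints.

```python
import contextlib
import io


def validate_data():
    print("Validating incoming data...")


def run_and_capture(func):
    # Hypothetical helper: run a callable and return what it printed.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func()
    return buffer.getvalue()


output = run_and_capture(validate_data)
print(repr(output))
```

The same text is what you should later find in the task's log once the function runs inside a PythonOperator.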
Step 5: Define the DAG
with DAG(
    dag_id="pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
- dag_id: the name that appears in the Airflow UI
- start_date: when this DAG became active
- schedule="@daily": run once a day automatically
- catchup=False: won't backfill all the missed runs since the start date
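To get a feel for what catchup=False saves you from, the arithmetic below estimates how many runs a daily schedule would have to backfill. It is a simplified model, not the scheduler's actual logic: it assumes one run per completed daily interval, and enabled_at is a made-up date standing in for the moment the DAG is first switched on.

```python
from datetime import datetime, timedelta

start_date = datetime(2024, 1, 1)
# Hypothetical moment at which the DAG is first switched on.
enabled_at = datetime(2024, 1, 11)

# Simplified model: one run per completed daily interval since start_date.
interval = timedelta(days=1)
missed_runs = (enabled_at - start_date) // interval

print(missed_runs)  # 10
```

With a start date far in the past, that number grows quickly, which is why tutorials like this one usually turn catchup off.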
Step 6: Create the tasks
    # Indented so the tasks belong to the `with DAG(...)` block from Step 5.
    start = EmptyOperator(task_id="start")
    validate = PythonOperator(
        task_id="validate",
        python_callable=validate_data
    )
    check_logs = BashOperator(
        task_id="check_logs",
        bash_command="echo 'Checking system logs...'"
    )
    transform = PythonOperator(
        task_id="transform",
        python_callable=transform_data
    )
    report = PythonOperator(
        task_id="report",
        python_callable=send_report
    )
    end = EmptyOperator(task_id="end")
- start and end: EmptyOperators used as markers to make the graph cleaner
- validate: a PythonOperator wrapping the validate_data function
- check_logs: a BashOperator running a shell command directly, no Python function needed
- transform and report: PythonOperators wrapping their respective functions
Step 7: Set the dependencies and save the file
start >> [validate, check_logs] >> transform >> report >> end
- start: fires first
- [validate, check_logs]: both run in parallel after start; they're independent, so there's no point running them one after the other
- transform: waits for both validate and check_logs to finish before starting
- report: runs after transform
- end: final marker
Save the file using Ctrl+S.
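The ordering described in this step can be checked without Airflow. The sketch below rebuilds the same dependency graph as a plain dictionary and uses Python's standard graphlib module (Python 3.9+) to group tasks into "waves" that could run at the same time. It models the ordering only; it says nothing about how Airflow's scheduler or executor actually dispatches work.

```python
from graphlib import TopologicalSorter

# Each key lists the tasks it waits for (its upstream tasks).
upstream = {
    "start": set(),
    "validate": {"start"},
    "check_logs": {"start"},
    "transform": {"validate", "check_logs"},
    "report": {"transform"},
    "end": {"report"},
}

sorter = TopologicalSorter(upstream)
sorter.prepare()

waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # tasks whose upstreams are all done
    waves.append(ready)
    sorter.done(*ready)

for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

The second wave contains both validate and check_logs, which matches the two tasks you should see running side by side in the Graph view.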
Step 8: Find it in the UI, trigger and watch it run
Open the Airflow UI from the environment panel. Go to the DAGs page and look for pipeline in the list.
Click the play button on the right of pipeline to trigger it manually. Click into the DAG and open the Graph view. Watch the tasks change color:
- White — waiting
- Yellow — running
- Green — success
- Red — failed
Notice how validate and check_logs turn yellow at the same time — that's the parallel execution. Once both are green, transform starts. Then report, then end.
Once all tasks are green, click any task → Logs to see its output.
After hibernation
If the VM hibernates, reconnect and run in the VS Code terminal:
cd ~/airflow
docker compose up -d
What's next
Start Airflow