Scheduling, intervals & catchup in Airflow
Learn how to schedule DAGs, understand data intervals, and control catchup and backfill behavior in Airflow.
What we're doing
You know how to write and run a DAG. Now learn how to make it run automatically — on a schedule, at the right time, with full control over what happens when it misses a run. Get to know how scheduling works, what data intervals are, and how catchup and backfill behave. Then you'll write a scheduled DAG and control it from the UI.
Step 1: How scheduling works
Every DAG has a schedule that tells Airflow when to trigger a run. You define it with the schedule parameter in the DAG block.
Airflow supports two ways to define a schedule:
Preset shortcuts
schedule="@hourly" # at the start of every hour
schedule="@daily" # every day at midnight
schedule="@weekly" # every Sunday at midnight
schedule="@monthly" # at midnight on the first day of every month
schedule=None # never runs automatically, manual trigger only
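Each preset is shorthand for a cron expression. The mapping below is a sketch of the cron equivalents for the presets listed above; the expand_schedule helper is hypothetical and only illustrates the idea, it is not part of Airflow's API.

```python
from typing import Optional

# Cron equivalents of the schedule presets (fields: minute hour day-of-month month day-of-week)
PRESETS = {
    "@hourly": "0 * * * *",   # minute 0 of every hour
    "@daily": "0 0 * * *",    # midnight every day
    "@weekly": "0 0 * * 0",   # midnight every Sunday (day of week 0)
    "@monthly": "0 0 1 * *",  # midnight on the 1st of each month
}


def expand_schedule(schedule: Optional[str]) -> Optional[str]:
    """Hypothetical helper: return the cron form of a schedule, or None for manual-only DAGs."""
    if schedule is None:
        return None  # schedule=None: the DAG only runs when triggered manually
    # Anything that is not a known preset is assumed to already be a cron expression
    return PRESETS.get(schedule, schedule)


print(expand_schedule("@weekly"))  # 0 0 * * 0
```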
Cron expressions
A cron expression gives you full control over the schedule. It has five fields, in this order:
- minute (0-59)
- hour (0-23)
- day of month (1-31)
- month (1-12)
- day of week (0-6, Sunday=0)
Some examples:
schedule="0 9 * * *" # every day at 9am
schedule="0 9 * * 1" # every Monday at 9am
schedule="0 */6 * * *" # every 6 hours
schedule="30 8 1 * *" # first day of every month at 8:30am
* means "every". 0 9 * * * reads as: at minute 0, hour 9, every day of the month, every month, every day of the week. A step such as */6 in the hour field means "every 6th hour" (0, 6, 12, 18).
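To make the field order concrete, here is a minimal, stdlib-only sketch of a cron matcher. It is not what Airflow uses internally (Airflow relies on a dedicated cron library), and the helper names cron_matches and next_runs are hypothetical. It only understands *, */n, single numbers and comma lists; ranges such as 1-5 and names such as MON are not supported.

```python
from datetime import datetime, timedelta


def _field_matches(field: str, value: int, minimum: int) -> bool:
    """Check one cron field; `minimum` is the field's lowest value (0 or 1), used for */n steps."""
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if (value - minimum) % int(part[2:]) == 0:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, dt: datetime) -> bool:
    """True if `dt` matches the five-field cron expression `expr`."""
    minute, hour, dom, month, dow = expr.split()  # minute, hour, day of month, month, day of week
    cron_dow = (dt.weekday() + 1) % 7  # Python counts Monday=0; cron counts Sunday=0
    dom_ok = _field_matches(dom, dt.day, 1)
    dow_ok = _field_matches(dow, cron_dow, 0)
    # Classic cron rule: if both day fields are restricted, matching either one is enough
    if not dom.startswith("*") and not dow.startswith("*"):
        day_ok = dom_ok or dow_ok
    else:
        day_ok = dom_ok and dow_ok
    return (
        _field_matches(minute, dt.minute, 0)
        and _field_matches(hour, dt.hour, 0)
        and _field_matches(month, dt.month, 1)
        and day_ok
    )


def next_runs(expr: str, after: datetime, count: int, max_minutes: int = 366 * 24 * 60) -> list:
    """Collect up to `count` matching times strictly after `after`, checking one minute at a time."""
    t = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    runs = []
    for _ in range(max_minutes):  # bounded so an expression that never matches cannot loop forever
        if cron_matches(expr, t):
            runs.append(t)
            if len(runs) == count:
                break
        t += timedelta(minutes=1)
    return runs


for run in next_runs("0 */6 * * *", datetime(2024, 1, 1, 7, 0), 3):
    print(run)
# 2024-01-01 12:00:00
# 2024-01-01 18:00:00
# 2024-01-02 00:00:00
```

Stepping minute by minute is slow but easy to follow; a real cron library computes the next match directly.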
Step 2: Data intervals
A data interval is the window of time that the DAG run is processing.
For a daily DAG scheduled at midnight:
- The run that fires at 2024-01-02 00:00 processes data from 2024-01-01 00:00 to 2024-01-02 00:00
data_interval_start = 2024-01-01 00:00
data_interval_end = 2024-01-02 00:00
This is important: Airflow triggers a run at the end of its interval, not the beginning. A daily DAG with start_date=2024-01-01 does not fire on January 1st. Its first run fires at midnight at the start of January 2nd and processes January 1st's data.
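A small sketch can model this timing for an @daily DAG: each run covers one day and fires once that day has ended. The daily_runs helper is hypothetical and assumes a start_date at midnight with the default end-of-interval behavior described above.

```python
from datetime import datetime, timedelta


def daily_runs(start_date: datetime, count: int) -> list:
    """Hypothetical model: the first `count` runs of an @daily DAG as (fires_at, interval_start, interval_end)."""
    runs = []
    interval_start = start_date
    for _ in range(count):
        interval_end = interval_start + timedelta(days=1)
        # A run fires only once its interval has ended, so fires_at equals interval_end
        runs.append((interval_end, interval_start, interval_end))
        interval_start = interval_end
    return runs


for fires_at, start, end in daily_runs(datetime(2024, 1, 1), 2):
    print(f"fires {fires_at:%Y-%m-%d %H:%M}: processes {start:%Y-%m-%d %H:%M} to {end:%Y-%m-%d %H:%M}")
# fires 2024-01-02 00:00: processes 2024-01-01 00:00 to 2024-01-02 00:00
# fires 2024-01-03 00:00: processes 2024-01-02 00:00 to 2024-01-03 00:00
```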
You can access the interval in your tasks:
def extract(**context):
    start = context["data_interval_start"]
    end = context["data_interval_end"]
    print(f"Processing data from {start} to {end}")
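A common use of the interval is to select only the records that belong to this run. The sketch below assumes each source row carries a created_at timestamp; the EVENTS list and the select_interval helper are hypothetical. It filters with a half-open range (start inclusive, end exclusive) so a record stamped exactly at midnight lands in exactly one daily run. Airflow hands your task timezone-aware values, so the sample timestamps are timezone-aware (UTC) too; comparing aware and naive datetimes raises a TypeError in Python.

```python
from datetime import datetime, timezone

UTC = timezone.utc

# Hypothetical source rows; in a real pipeline these would come from a database or an API
EVENTS = [
    {"id": 1, "created_at": datetime(2023, 12, 31, 23, 59, tzinfo=UTC)},
    {"id": 2, "created_at": datetime(2024, 1, 1, 0, 0, tzinfo=UTC)},
    {"id": 3, "created_at": datetime(2024, 1, 1, 18, 30, tzinfo=UTC)},
    {"id": 4, "created_at": datetime(2024, 1, 2, 0, 0, tzinfo=UTC)},
]


def select_interval(rows, start, end):
    """Keep rows with start <= created_at < end, so adjacent runs never share a row."""
    return [r for r in rows if start <= r["created_at"] < end]


def extract(**context):
    rows = select_interval(EVENTS, context["data_interval_start"], context["data_interval_end"])
    print(f"Found {len(rows)} rows")
    return rows


rows = extract(
    data_interval_start=datetime(2024, 1, 1, tzinfo=UTC),
    data_interval_end=datetime(2024, 1, 2, tzinfo=UTC),
)
print([r["id"] for r in rows])  # [2, 3]
```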
Step 3: Catchup and backfill
Catchup
When you create a DAG with a start_date in the past and catchup=True, Airflow will automatically run all the missed intervals between the start date and now.
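from datetime import datetime
from airflow import DAG

with DAG(
    dag_id="my_dag",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=True  # Airflow will create a run for every missed interval since start_date
) as dag:
    ...  # define tasks here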
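Most of the time you want catchup=False, especially while you're developing or when past data doesn't matter. Set it explicitly: in Airflow 2.x the default comes from the catchup_by_default setting, which is True.
catchup=False # skip missed intervals; only the most recent one is scheduled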
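To see why catchup=True can flood the scheduler, here is a simplified model of which daily runs get created. The runs_to_schedule helper is hypothetical: it assumes an @daily schedule with a midnight-aligned start_date and ignores every other setting that affects scheduling. Each run is identified by the start of its interval.

```python
from datetime import datetime, timedelta


def runs_to_schedule(start_date: datetime, now: datetime, catchup: bool) -> list:
    """Hypothetical model: interval starts of the daily runs the scheduler would create."""
    completed = []
    interval_start = start_date
    # Only intervals that have fully ended are eligible to run
    while interval_start + timedelta(days=1) <= now:
        completed.append(interval_start)
        interval_start += timedelta(days=1)
    if catchup:
        return completed  # every missed interval since start_date
    return completed[-1:]  # catchup=False: only the most recent interval


start = datetime(2024, 1, 1)
now = datetime(2025, 1, 1)
print(len(runs_to_schedule(start, now, catchup=True)))   # 366 (2024 is a leap year)
print(len(runs_to_schedule(start, now, catchup=False)))  # 1
```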
Backfill
Backfill is the manual version of catchup. You trigger it from the command line for a specific date range:
airflow dags backfill \
--start-date 2024-01-01 \
--end-date 2024-01-31 \
my_dag
Be careful with catchup=True: with a start_date a year in the past, a daily DAG will queue hundreds of runs as soon as it is unpaused.
Step 4: Create the DAG file
Click VS Code in the environment panel. Right-click the dags folder and create a new file called scheduled_dag.py.
Step 5: Write the scheduled DAG
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

def process_data(**context):
    start = context["data_interval_start"]
    end = context["data_interval_end"]
    print(f"Processing data from {start} to {end}")

with DAG(
    dag_id="scheduled_dag",
    start_date=datetime(2024, 1, 1),
    schedule="0 9 * * *",
    catchup=False
) as dag:
    process = PythonOperator(
        task_id="process",
        python_callable=process_data
    )
- schedule="0 9 * * *": runs every day at 9am (UTC, unless your deployment configures a different default timezone)
- catchup=False: skips the missed days since start_date and only runs going forward
- **context: Airflow passes a context dictionary to every task function. It contains information about the current run, including the data interval
- context["data_interval_start"] and context["data_interval_end"]: the start and end of the data window this run is processing
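Because process_data is a plain Python function, you can call it directly with a hand-built dictionary to check its logic without Airflow. In this sketch fake_context is hypothetical and contains only the two keys the function reads; the real context Airflow passes holds many more. The function also returns its message here, purely so the result is easy to check.

```python
from datetime import datetime, timezone


def process_data(**context):
    start = context["data_interval_start"]
    end = context["data_interval_end"]
    message = f"Processing data from {start} to {end}"
    print(message)
    return message  # returned only to make the sketch easy to test


# Hypothetical stand-in for the context of the run that fires on 2024-01-02 at 09:00 UTC
fake_context = {
    "data_interval_start": datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc),
    "data_interval_end": datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc),
}
process_data(**fake_context)
# Processing data from 2024-01-01 09:00:00+00:00 to 2024-01-02 09:00:00+00:00
```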
Save with Ctrl+S.
Step 6: Find it in the UI and inspect the schedule
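Open the Airflow UI from the environment panel. Go to the DAGs page and find scheduled_dag. You'll see the schedule 0 9 * * * and the next scheduled run. New DAGs are usually paused when they first appear; switch the toggle next to the DAG name on so the scheduler starts creating runs.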
Click into the DAG and open the Details tab. Here you can see:
- The schedule interval
- The next run time
- The last run time
- The data interval of the last run
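The next run time and its data interval follow directly from the cron expression. Here is a sketch for the daily "0 9 * * *" schedule; the next_daily_run helper is hypothetical and assumes all times are in UTC. One caveat: in Airflow 2 the run shown as "next" may be labeled with its logical date, which is the data interval start, so it can read as a day earlier than the moment the run actually fires.

```python
from datetime import datetime, timedelta


def next_daily_run(now: datetime, hour: int = 9):
    """Hypothetical helper: (fires_at, interval_start, interval_end) for a daily "0 <hour> * * *" schedule."""
    fires_at = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if fires_at <= now:
        fires_at += timedelta(days=1)  # today's slot has already passed
    # Each run covers the 24 hours that end when it fires
    return fires_at, fires_at - timedelta(days=1), fires_at


fires_at, start, end = next_daily_run(datetime(2024, 3, 10, 14, 0))
print(f"next run fires {fires_at}, covering {start} to {end}")
# next run fires 2024-03-11 09:00:00, covering 2024-03-10 09:00:00 to 2024-03-11 09:00:00
```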
After hibernation
If the VM hibernates, reconnect and run in the VS Code terminal:
cd ~/airflow
docker compose up -d