Airflow XCom Exclusive
Every time a task returns a value, Airflow pushes it to XCom under the default key return_value.
At its simplest, an XCom is a key-value pair identified by a key, a task_id, and a dag_id. By default, when a task returns a value, Airflow automatically serializes that value and writes it to the xcom table in the metadata database (e.g., PostgreSQL or MySQL).
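To make the identification scheme concrete, here is a minimal sketch in plain Python. It is a toy model, not Airflow's implementation: the class name ToyXComTable and the choice of JSON for serialization are assumptions made for illustration, and the real xcom table carries additional columns (for example, to scope entries to a DAG run).

```python
import json
from typing import Any, Dict, Optional, Tuple

# (dag_id, task_id, key) identifies one entry, as described above.
XComRowId = Tuple[str, str, str]


class ToyXComTable:
    """Hypothetical in-memory stand-in for the xcom table (illustration only)."""

    def __init__(self) -> None:
        self._rows: Dict[XComRowId, str] = {}

    def push(self, dag_id: str, task_id: str, value: Any,
             key: str = "return_value") -> None:
        # Values are serialized before being stored, as a DB column requires.
        self._rows[(dag_id, task_id, key)] = json.dumps(value)

    def pull(self, dag_id: str, task_id: str,
             key: str = "return_value") -> Optional[Any]:
        # Returns None when no matching entry exists.
        raw = self._rows.get((dag_id, task_id, key))
        return None if raw is None else json.loads(raw)


table = ToyXComTable()
table.push("etl", "extract", {"rows": 42})   # a task "returning" a value
print(table.pull("etl", "extract"))          # {'rows': 42}
```

The default key argument mirrors the return_value convention: a task that simply returns something never has to name a key.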
Use the XComObjectStorageBackend to store larger data exclusively in S3 or GCS while keeping only a reference in the metadata DB.
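The "reference in the metadata DB" idea can be sketched without any cloud dependency. The code below is an assumption-laden illustration, not the XComObjectStorageBackend source: the threshold value, the objstore:// prefix, and the dictionary standing in for a bucket are all hypothetical.

```python
import json
import uuid
from typing import Any, Dict

THRESHOLD_BYTES = 1024        # assumed cutoff, chosen only for illustration
REF_PREFIX = "objstore://"    # hypothetical marker for "stored externally"

# Stand-in for an S3/GCS bucket: object key -> raw bytes.
object_store: Dict[str, bytes] = {}


def serialize_for_db(value: Any) -> str:
    """Return what would be written to the metadata DB for this value."""
    payload = json.dumps(value).encode("utf-8")
    if len(payload) <= THRESHOLD_BYTES:
        return payload.decode("utf-8")       # small: keep inline in the DB
    object_key = f"xcom/{uuid.uuid4()}.json"
    object_store[object_key] = payload       # large: offload to the bucket
    return REF_PREFIX + object_key           # the DB keeps only the reference


def deserialize_from_db(stored: str) -> Any:
    """Resolve a DB value, following the reference if there is one."""
    if stored.startswith(REF_PREFIX):
        payload = object_store[stored[len(REF_PREFIX):]]
        return json.loads(payload)
    return json.loads(stored)


small_ref = serialize_for_db({"run": "ok"})
big_ref = serialize_for_db(list(range(5000)))
```

Downstream tasks call the same pull path either way; only the deserialization step knows whether the value lived in the database or in the bucket.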
XComs are meant to store orchestration state and metadata, not heavy datasets.
The Core Mechanics: How XComs Work Under the Hood
Airflow Variables are global and static, while XComs are a dynamic way to share data between tasks.
Any PythonOperator whose callable returns a value automatically pushes that value to XCom with the key return_value.
If you use a custom cloud backend, set an Object Lifecycle Management policy on your S3/GCS bucket to automatically delete XCom files after 14 or 30 days to control cloud storage costs.
5. Summary Cheat Sheet
Feature | Standard XComs | Exclusive Custom Backend XComs
Storage Location | Airflow metadata DB (e.g. airflow.db) | External cloud storage (S3, GCS, Azure)
Size Limit | Database-dependent and small in practice (~64KB is commonly cited for MySQL blobs) | Virtually unlimited (gigabyte scale)
Performance Impact | High risk of DB bloat and UI sluggishness | Little impact on DB transactional performance (only a reference is stored)
Best Used For | Metadata, operational flags, small string IDs | Pandas DataFrames, large JSON strings, heavy logs
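For the lifecycle tip above, a possible S3 version is sketched here. The xcom/ prefix, the rule ID, and the helper names are assumptions. The boto3 call is wrapped in a function and imported lazily, so the policy can be built and inspected without AWS credentials.

```python
from typing import Any, Dict


def build_xcom_expiry_policy(prefix: str, days: int) -> Dict[str, Any]:
    """Build an S3 lifecycle configuration that expires objects under prefix."""
    return {
        "Rules": [
            {
                "ID": f"expire-xcom-after-{days}-days",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


def apply_xcom_expiry_policy(bucket: str, prefix: str = "xcom/",
                             days: int = 14) -> None:
    """Apply the policy. Note: this replaces the bucket's existing lifecycle rules."""
    import boto3  # lazy import: only needed when actually calling AWS

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=build_xcom_expiry_policy(prefix, days),
    )


policy = build_xcom_expiry_policy("xcom/", 14)
```

Scoping the rule to a prefix keeps it from deleting unrelated objects if the bucket is shared. GCS offers an equivalent lifecycle mechanism with its own configuration format.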
XCom, short for "cross-communication," is a feature in Airflow that allows tasks to share data with each other. It's a way for tasks to exchange messages, enabling more complex workflows and improving the overall flexibility of your data pipelines. With XCom, you can pass data from one task to another, making it easier to build dynamic and adaptive workflows.
# Pushing XCom (implicitly via return)
def push_task(**context):
    return "some_value"
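The receiving side might look like the sketch below. It assumes the upstream task's task_id is push_task and that the callable receives the task instance as context["ti"]. FakeTaskInstance is a hypothetical stub, included only so the two callables can be dry-run outside a scheduler.

```python
from typing import Any, Dict, Optional, Tuple


def push_task(**context: Any) -> str:
    # Returning a value pushes it to XCom under the key "return_value".
    return "some_value"


def pull_task(**context: Any) -> Optional[str]:
    # Pull the upstream task's return value via the task instance.
    ti = context["ti"]
    return ti.xcom_pull(task_ids="push_task", key="return_value")


class FakeTaskInstance:
    """Hypothetical stub that mimics only the xcom_pull call used above."""

    def __init__(self, stored: Dict[Tuple[str, str], Any]) -> None:
        self._stored = stored

    def xcom_pull(self, task_ids: str, key: str = "return_value") -> Any:
        return self._stored.get((task_ids, key))


# Dry run: simulate what the scheduler would have stored after push_task ran.
stored = {("push_task", "return_value"): push_task()}
result = pull_task(ti=FakeTaskInstance(stored))
print(result)  # some_value
```

In a real DAG both functions would be passed as python_callable to PythonOperator tasks, with pull_task set downstream of push_task so the value exists before it is read.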