Apache Airflow is a platform for authoring, scheduling, and monitoring workflows or directed acyclic graphs (DAGs). It allows users to programmatically author DAGs in Python without needing to bundle many XML files. The UI provides a tree view to see DAG runs over time and Gantt charts to see performance trends. Airflow is useful for ETL pipelines, machine learning workflows, and general job scheduling. It handles task dependencies and failures, monitors performance, and enforces service level agreements. Behind the scenes, the scheduler distributes tasks from the metadata database to Celery workers via RabbitMQ.
Related topics: