Defining data pipeline workflows using Apache Airflow
Apache Airflow is a workflow automation and scheduling system that can be used to author and manage data pipelines. Workflows are defined programmatically in Python as directed acyclic graphs (DAGs) of tasks. At Idealista we use it daily for our data ingestion pipelines. We'll take a thorough look at managing dependencies, handling retries and alerting, among other topics, as well as Airflow's drawbacks.
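To make the "DAG of tasks" idea concrete, here is a minimal, stdlib-only sketch of what a scheduler does with such a graph. It runs tasks in dependency order and retries failing ones. This is not Airflow's API. It assumes Python 3.9+ (for `graphlib`), and `run_dag`, `upstream` and the task functions are hypothetical names. In a real Airflow DAG file, you would declare dependencies between operators with `>>` (e.g. `extract >> transform >> load`) and set retries through the `retries` task argument.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_dag(
    tasks: Dict[str, Callable[[], None]],
    upstream: Dict[str, Set[str]],
    retries: int = 2,
) -> List[str]:
    """Hypothetical toy scheduler: run each task after its upstream tasks,
    retrying a failing task up to `retries` extra times."""
    # Map every task to the set of tasks it depends on (empty if none).
    graph = {name: upstream.get(name, set()) for name in tasks}
    completed: List[str] = []
    # static_order() yields tasks so that dependencies come first;
    # it raises graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(retries + 1):
            try:
                tasks[name]()
                break
            except Exception:
                if attempt == retries:
                    # Out of retries: a real scheduler would mark the task
                    # as failed and send an alert instead of just raising.
                    raise
        completed.append(name)
    return completed


# Usage: extract -> transform -> load, where transform fails once.
attempts = {"transform": 0}


def extract() -> None:
    print("extract")


def flaky_transform() -> None:
    attempts["transform"] += 1
    if attempts["transform"] == 1:
        raise RuntimeError("transient failure")  # simulated flaky step
    print("transform")


def load() -> None:
    print("load")


order = run_dag(
    tasks={"extract": extract, "transform": flaky_transform, "load": load},
    upstream={"transform": {"extract"}, "load": {"transform"}},
)
print(order)  # ['extract', 'transform', 'load']
```

The transient failure in `flaky_transform` is absorbed by the retry loop, so `load` still runs once its upstream task eventually succeeds. This ordering-plus-retry behaviour is the core of what Airflow manages for you, alongside scheduling and alerting.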
A software engineer based in Spain who loves Python, Django, web scraping and complex data pipelines.