Updated June 2026

Data Engineering Automation: Ansible, Apache Airflow, and Snowflake

Class Duration

28 hours of live training delivered over 5-10 half-days or 4-5 full days.

Student Prerequisites

  • Practical experience with Python
  • Basic Linux command line skills
  • Familiarity with SQL fundamentals
  • Basic understanding of cloud computing concepts

Target Audience

This course is designed for data engineers, DevOps professionals, and Python developers who want to build automated, production-ready data pipelines. Ideal participants have foundational Python and SQL skills and want to add infrastructure automation, workflow orchestration, and cloud data warehousing to their toolkit. Whether you are a DevOps engineer moving into data platforms, a data analyst stepping into engineering, or a developer tasked with building end-to-end pipelines, this course gives you hands-on practice provisioning infrastructure as code, scheduling and monitoring complex workflows, and delivering data to stakeholders through Snowflake. Teams who want to go deeper on the orchestration layer can continue with Apache Airflow Programming: Developing, Configuring, and Automating Workflows.

Description

This course combines three essential technologies for modern data engineering. Ansible provides infrastructure automation and configuration management, ensuring consistent and repeatable environment setup. Apache Airflow enables workflow orchestration, scheduling, and monitoring of complex data pipelines. Snowflake serves as the cloud-native data warehouse destination, scaling storage and compute independently to support analytics workloads. Participants learn each technology individually before integrating all three into cohesive, production-ready data engineering solutions.

Learning Outcomes

  • Provision and configure data engineering infrastructure using Ansible playbooks, roles, and templates, and manage inventories of hosts and groups, including dynamic inventory for cloud environments.
  • Secure sensitive credentials, such as database connection details and API keys, using Ansible Vault.
  • Install and configure Apache Airflow with appropriate executors and database backends.
  • Design and implement DAGs using the Operator API, the TaskFlow API, and dynamic task mapping.
  • Monitor and troubleshoot workflows using task logs, the Airflow UI, and alerting mechanisms.
  • Explain Snowflake's architecture, including the separation of storage and compute, and apply best practices for virtual warehouse sizing and cost management.
  • Load data into Snowflake using stages, the COPY INTO command, and Snowpipe for continuous ingestion.
  • Query and transform semi-structured data using Snowflake SQL and window functions.
  • Implement role-based access control to secure Snowflake data assets.
  • Build end-to-end ELT pipelines that extract data from source systems, load it into Snowflake, and orchestrate each step with Airflow.
  • Apply production best practices, including idempotent design, error handling, testing strategies, and performance optimization.

Training Materials

Comprehensive courseware is distributed online at the start of class. All students receive a downloadable MP4 recording of the training.

Software Requirements

Students need a free personal GitHub account to access the courseware. They also need permission to install Python, Visual Studio Code, Python packages, and VS Code extensions on their computers. Hands-on exercises use a Snowflake trial account. If students are unable to configure a local environment, a cloud-based environment can be provided.

Training Topics

Introduction to Ansible

  • Overview and Architecture
  • Benefits of Automation
  • Declarative, Push-Based Configuration
  • YAML Syntax Review
  • Command Line Tools

Ansible Setup and Configuration

  • Installing Ansible
  • Configuring Ansible
  • Understanding Configuration Files

Inventory Management

  • Defining Hosts and Groups
  • Working with Inventory Files
  • Dynamic Inventory

Ansible Playbooks

  • Basics of Playbook Writing
  • Variables and Facts
  • Tasks and Handlers
  • Using Modules in Tasks
  • Commonly Used Modules

Working with Variables and Templates

  • Defining and Using Variables
  • Gathering Facts about Hosts
  • Jinja2 Templates in Ansible
  • Template Modules and Variables

Ansible Roles and Best Practices

  • Creating and Using Roles
  • Directory Structure
  • Ansible Galaxy
  • Writing Reusable Code

What is Apache Airflow?

  • Distributed Task Automation
  • Compared to Cron Jobs and Celery
  • Scalability and Reliability
  • Directed Acyclic Graphs (DAGs)
  • Workflows as Code

Airflow Architecture

  • Anatomy of a DAG
  • Operators and Tasks
  • Variables and XComs
  • Providers and Connections
  • Assets and Asset-Based Scheduling
  • Schedulers and Pools

Installation and Configuration

  • Python Virtual Environment
  • Install Airflow with Constraints
  • Standalone Mode
  • API Server, Scheduler, and DAG Processor
  • Database Backend Options

Developing DAGs

  • The airflow.sdk Authoring Interface
  • Operator API
  • TaskFlow API (@dag and @task)
  • Dynamic Task Mapping
  • Templating with Jinja2
  • Task Dependencies

Airflow Configuration

  • Configuration File Options
  • Executor Types (Local, Celery, Kubernetes, Edge)
  • Log Levels and Structure
  • Environment Variables

Monitoring and Debugging

  • The Airflow UI
  • Reviewing Task Logs
  • Notifications and Alerts
  • Troubleshooting Common Issues

Introduction to Snowflake

  • Cloud Data Warehouse Architecture
  • Separation of Storage and Compute
  • Virtual Warehouses
  • Snowflake Editions and Pricing
  • Web Interface Overview

Snowflake Objects and Data Modeling

  • Databases, Schemas, and Tables
  • Views and Materialized Views
  • Data Types and Constraints
  • Clustering Keys
  • Time Travel and Fail-safe

Loading Data into Snowflake

  • Stages (Internal and External)
  • COPY INTO Command
  • File Formats (CSV, JSON, Parquet)
  • Snowpipe for Continuous Loading
  • Data Transformation on Load

Snowflake SQL and Querying

  • Snowflake SQL Extensions
  • Semi-Structured Data (VARIANT)
  • Window Functions
  • Query Optimization
  • Query History and Profiling

Security and Access Control

  • Role-Based Access Control (RBAC)
  • Users, Roles, and Privileges
  • Data Sharing
  • Row-Level Security with Row Access Policies
  • Network Policies

Snowflake Best Practices

  • Warehouse Sizing and Scaling
  • Cost Management
  • Performance Tuning
  • Data Lifecycle Management

Airflow-Snowflake Integration

  • Snowflake Provider for Airflow
  • SQLExecuteQueryOperator (replaces the deprecated SnowflakeOperator)
  • SnowflakeSqlApiOperator
  • Connection Configuration
  • Building ELT DAGs

End-to-End Pipeline Development

  • Extract from Source Systems
  • Transform with Snowflake SQL
  • Orchestrate with Airflow
  • Error Handling and Retries
  • Idempotent Pipeline Design

Using Ansible for Data Infrastructure

  • Provisioning Airflow Environments
  • Managing Snowflake Users and Roles
  • Configuration Management
  • Secrets Management with Ansible Vault

Advanced Topics

  • Dynamic DAGs and Task Mapping
  • Custom Operators and Hooks
  • Snowflake Tasks and Streams
  • Change Data Capture Patterns
  • Data Quality Checks

Production Considerations

  • CI/CD for Data Pipelines
  • Testing Strategies
  • Monitoring and Alerting
  • Documentation Standards
  • Performance Optimization