Data Engineering with Azure Synapse Analytics

5 days

UASA

5 days

Upcoming Sessions

Date:

Format:

Price:

Location:

Book now

Date:

Format:

Price:

Location:

Book now

Date:

Format:

Price:

Location:

Book now

Date:

Format:

Price:

Book now

Show fewer Show more

Interested in a private company training? Request it here.

Not ready to book yet? Request an offer here.

The Modern Data Warehouse

The cloud requires to reconsider some of the choices made for on-premisses data handling. This module introduces the concept of a data lake and the data lakehouse. It also introduces the different services in Azure that can be used for data processing and compares them to the traditional on-premisses data stack. Finally, it provides a brief intro in Azure and the use of the Azure portal.

From traditional to modern data warehouse
Comparing data warehouse with data lake
Lambda architecture
Overview of Big Data related Azure services
Getting started with the Azure Portal
LAB: Navigating the Azure Portal

Storing Data in Azure

This module discusses the different types of storage available in Azure Storage as well as data lake storage. Also, some of the tools to load and manage files in Azure storage and Data lake storage are covered.

Introduction Azure Storage
Comparison of Azure Blob Storage vs Azure Data Lake Storage Gen2
Create storage account and containers
Entra ID identification versus key based authentication
Folders and folder level permissions in Data Lake Gen 2
Working with the Azure Storage Explorer and AzCopy
LAB: Uploading data into Azure Storage

Overview of Azure Synapse Analytics

Synapse Analytics is the cornerstone service for the data engineer. It encompasses pipelines to copy data, Spark and SQL to transform and query data, Data Explorer for near real-time analysis and data exploration and Power BI for reporting. This module provides a brief introduction into this service.

The different components of Synapse Analytics
Provisioning a Synapse Analytics workspace
GitHub setup
Navigating the Synapse Analytics Studio
LAB: Provision an Azure Synapse Analytics workspace

Introduction to Azure Synapse Analytics Pipelines

When the data is stored on-premises, you typically use ETL tools such as SQL Server Integration Services for loading and transforming data. But what if the data is stored in the Azure cloud? Then you can use pipelines in Azure Synapse Analytics. This service is nearly identical to Azure Data Factory, the cloud based ETL service. First, we need to get used to the terminology, then we can start creating the proper objects in the portal.

Introducing Azure Synapse Analytics Pipelines
Pipeline terminology
Creating Pipelines, Linked Services and Datasets
Copying data with the Copy Data wizard
LAB: Migrating data with Data Factory Wizard

Authoring Pipelines in Azure Synapse Analytics

This module dives deeper into the process of building an Azure Synapse pipeline. The most common activities are illustrated. The module also focusses on how to work with expressions, variables and parameters to make the pipelines more dynamic.

Adding activities to the pipeline
Working with Expressions
Variables and Parameters
Debugging a pipeline
LAB: Authoring and debugging an ASA pipeline

Authoring Data Flows in Azure Synapse Analytics

With Data flows you can visually design data transformations without the need to learn yet another tool or language (such as Databricks or Spark).

From ELT to ETL
Creating Data flows
Sizing the Spark cluster
Running and Profiling Data flows
LAB: Transforming data using Data Flows

Managing Synapse Pipelines

Pipelines need an integration runtime to control where the code executes. This module provides an overview of the 3 types of Integration Runtimes: Azure, self-hosted runtimes and SSIS. It also discusses the different type of Triggers that exist and how they can be used to schedule pipelines.

Integration Runtime Overview
The Azure Integration Runtime
The Self-Hosted Integration Runtime
Scheduling Pipelines using Triggers
Monitoring pipeline executions

Azure SQL Database

An easy way to create a business intelligence solution in the cloud is by taking SQL Server -- familiar to many Microsoft BI developers -- and run it in the cloud. Backup and high availability happen automatically, and we can use nearly all the skills and tools we used on a local SQL Server on this cloud based solution as well.

Provisioning an Azure SQL Database
Migrating an on-premisses Data Warehouse to Azure SQL Database
Ingesting Azure Blob Storage data
Working with Columnstore Indexes
LAB: Using Azure SQL Databases

Using the Serverless SQL pool in Azure Synapse Analytics

Once data has been loaded into the data lake, the next step is to cleanse the data, pre-aggregate the data and perform other steps to make the data accessible to reporting and analytical tools. Dependant on the transformations required and the skills of the data engineer, the SQL dialect common to the Microsoft data stack (T-SQL) could play an important role. This module first introduces the scenarios where the move from an Azure SQL Database into Synapse databases could be useful, introduces briefly the two different types of SQL databases, and then focusses more deeply on the Synapse Analytics Serverless databases.

When Azure SQL Databases reach their limits
Provisioned versus Serverless Synapse Analytics databases
Creating and accessing Serverless databases
Using OPENROWSET for data access
Creating external tables
LAB: Querying data via Azure Synapse Analytics Serverless databases

Using Azure Synapse Analytics Provisioned SQL Pools

Since Serverless SQL Pools don't store data in a proprietary format, they lack features such as indexes, update statements etc. This is where Provisioned SQL Pools in Azure Synapse Analytics (formerly known as Azure Data Warehouse) can come to the rescue.

Architecture of Provisioned SQL Pools
Loading data via PolyBase
Create Table as Select (CTAS)
Setting up table distributions: Round Robin, Hash and Replicate
Indexing options
LAB: Loading and querying data in Provisioned SQL Pools

Getting started with Apache Spark

Although SQL is a very powerful language to access and manipulate data, it has its limitations. Complex data wrangling, advanced statistics or machine learning are ill-suited tasks for SQL. For this purpose Apache Spark is better suited. It's a divide-and-conquer framework for data access, transformation and querying which relies on programming languages such as Python and Scala.

Introduction to Apache Spark
Spark Cluster setup and configuration
Getting started with Notebooks
LAB: Getting started with Spark and Notebooks

Accessing data in Synapse Analytics Spark

Apache Spark doesn't have a proprietary data storage option, but consumes and produces regular files stored in Azure Storage. This module covers how to access and manipulate data stored in the Synapse Analytics data lake or other Azure storage locations from Synapse Analytics Spark.

The Spark Session and Context objects
Connecting to Azure Blob Storage and Azure Data Lake Storage Gen2
Connecting to relation databases through JDBC or ODBC
Loading and saving data in Spark using DataFrames
An introduction to transforming data using PySpark
Using Spark SQL to query and transform data
LAB: Processing data on a Spark cluster

Introducing Delta Lake

Delta Lake is an optimized storage layer that provides the foundation for storing data and tables in a Lakehouse Platform. Delta lake is an open source platform that extends Parquet files with ACID transactions and metadata handling. This chapter provides an introduction to Delta Lake and how it can be used to create a Lakehouse architecture.

Introducing Delta Lake
Creating Delta Tables
Modifying data in a Delta Table
Time Travel with Delta Tables
Creating a Lake Database/Lakehouse in Azure Synapse Analytics
Querying a Lake Database/Lakehouse using the Serverless SQL Pool

Azure Data Explorer

In between large volumes of historical, long-lived data stored in a data lake, and streams of short living events processed with Azure Stream Analytics, lives the challenge of working with large volumes of semi-structured telemetry and log data, where the analysis can have a longer latency that with event processing, but requires more historical information than what event processing technology can handle. For this kind of data processing Azure Data Explorer is the ideal tool

Data Explorer architecture
Ingesting data in Data Explorer
Querying and visualizing data with the Kusto query language
Accessing Data Explorer from Data Factory and Power BI

The role of the Power BI Service

The Power BI Service (or the Analysis Services engine directly) plays an important role in the modern data warehouse solution. This module describes briefly the Power BI Service architecture and how it integrates with Azure Synapse Analytics.

Overview of Power BI
Power BI/Analysis Services in the modern data warehouse architecture
Link Synapse Analytics with Power BI

Handling large volumes of data requires different skills: One must master storage options, tools to upload data performant, handling failed uploads, and convert data in a format appropriate for reporting and analysis. In the Microsoft Azure stack, Synapse Analytics is the cornerstone service for the data engineer. It encompasses pipelines to copy data, Spark and SQL to transform and query data, Data Explorer for near real-time analysis and data exploration, and Power BI for reporting.

This training teaches how to use Synapse Analytics to design, build and maintain a modern data lake architecture. The training also includes a few other Azure services which come in handy when working with Synapse Analytics, such as Azure Data Vault for handling authentication and Azure SQL Database for dealing with smaller datasets.

This course focusses on developers and administrators who are considering migrating existing data solutions to the Microsoft Azure cloud or start designing new data oriented solutions in the Azure cloud. Some familiarity with relational database systems such as SQL Server is handy. Prior knowledge of Azure is not required.

Developer and IT Training

Data Engineering with Azure Synapse Analytics

UASA

5 days

Upcoming Sessions

The Modern Data Warehouse

Storing Data in Azure

Overview of Azure Synapse Analytics

Introduction to Azure Synapse Analytics Pipelines

Authoring Pipelines in Azure Synapse Analytics

Authoring Data Flows in Azure Synapse Analytics

Managing Synapse Pipelines

Azure SQL Database

Using the Serverless SQL pool in Azure Synapse Analytics

Using Azure Synapse Analytics Provisioned SQL Pools

Getting started with Apache Spark

Accessing data in Synapse Analytics Spark

Introducing Delta Lake

Azure Data Explorer

The role of the Power BI Service

Contact Us

Say Hi