Microsoft Azure Big Data Engineering

5 days
uadehmi
Organized by Hexagon MI

The modern data warehouse

The cloud requires us to reconsider some of the choices made for on-premises data handling. This module introduces the Azure services that can be used for data processing and compares them to the traditional on-premises data stack. It also provides a brief introduction to Azure and the Azure portal.

  • From traditional to modern data warehouse
  • Lambda architecture
  • Overview of Big Data related Azure services
  • Getting started with the Azure Portal
  • LAB: Navigating the Azure Portal
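
The Lambda architecture named in this module can be illustrated with a minimal Python sketch. It is not tied to any Azure service: the event data, the page-view use case and the function names (`build_batch_view`, `update_speed_view`, `serve`) are assumptions chosen for illustration. A batch layer periodically recomputes a view over all stored events, a speed layer keeps an incremental view of events that arrived since the last batch run, and a serving layer merges both at query time.

```python
from collections import Counter

def build_batch_view(stored_events):
    """Batch layer: recompute page-view counts from the full event history."""
    return Counter(event["page"] for event in stored_events)

def update_speed_view(speed_view, event):
    """Speed layer: incrementally count an event not yet covered by a batch run."""
    speed_view[event["page"]] += 1
    return speed_view

def serve(batch_view, speed_view, page):
    """Serving layer: merge the batch view and the speed view at query time."""
    return batch_view[page] + speed_view[page]

# Hypothetical events already landed in storage and processed by the last batch run.
stored_events = [{"page": "home"}, {"page": "pricing"}, {"page": "home"}]
batch_view = build_batch_view(stored_events)

# Hypothetical live events that arrived after the batch run.
speed_view = Counter()
for live_event in [{"page": "home"}, {"page": "contact"}]:
    update_speed_view(speed_view, live_event)

home_views = serve(batch_view, speed_view, "home")
contact_views = serve(batch_view, speed_view, "contact")
print(home_views, contact_views)
```

In a real solution the batch view would typically be rebuilt on a schedule and the speed view reset after each rebuild; the sketch leaves that out to keep the three roles visible.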

Storing data in Azure

This module discusses the different types of storage available in Azure Storage, as well as Azure Data Lake Storage. It also covers some of the tools for loading and managing files in both.

  • Introduction to Azure Blob Storage
  • Compare Azure Data Lake Storage Gen 2 with traditional blob storage
  • Tools for uploading data
  • Storage Explorer, AzCopy, PolyBase
  • LAB: Uploading data into Azure Storage

Introducing Azure Data Factory

When data is stored and analysed on-premises, you typically use ETL tools such as SQL Server Integration Services. But what if the data is stored in the Azure cloud? Then you can use Azure Data Factory, the cloud-based ETL service. First we need to get used to the terminology; then we can start creating the proper objects in the portal.

  • Data Factory V2 terminology
  • Setting up a Data Factory with Git support
  • Exploring the Data Factory portal
  • Creating Linked Services and Datasets
  • Copying data with the Data Factory wizard
  • LAB: Migrating data with Data Factory Wizard

Authoring pipelines in Azure Data Factory

This module dives into the process of building a Data Factory pipeline from scratch and illustrates the most common activities. It also focuses on how to use variables and parameters to make pipelines more dynamic.

  • Adding activities to the pipeline
  • Working with Expressions
  • Variables and Parameters
  • Debugging a pipeline
  • LAB: Authoring and debugging an ADF pipeline

Creating Data Flows in Data Factory

With Data Flows, data can be transformed without the need to learn another tool (such as Databricks or Spark). Both Mapping Data Flows and Wrangling Data Flows are covered.

  • From ELT to ETL
  • Creating Data Factory (Mapping) Data flows
  • Exploring Wrangling Data Flows
  • LAB: Transforming data with a Data flow

Data Factory Integration Runtimes

Data Factory needs integration runtimes to control where the code executes. This module walks you through the 3 types of Integration Runtimes: Azure, SSIS and self-hosted runtimes.

  • Integration runtime overview
  • Controlling the Azure Integration Runtime
  • Setting up self-hosted Integration Runtimes
  • Lift and shift SSIS packages in Data Factory

Deploying and monitoring Data Factory pipelines

Once development has finished, the pipelines need to be deployed and scheduled for execution. Monitoring the deployed pipelines for failures, errors or performance issues is another crucial topic discussed in this module.

  • Adding triggers to pipelines
  • Deploying pipelines
  • Monitoring pipeline executions
  • Restarting failed pipelines
  • LAB: Monitoring pipeline runs

Azure Synapse Analytics

Azure Synapse Analytics is a suite of services for loading, storing and querying large volumes of data. It lets both Spark and SQL users interact with the data.

  • Overview of Azure Synapse Analytics
  • Provisioning an Azure Synapse Analytics Workspace
  • Getting started with Azure Synapse Studio
  • Ingesting data
  • Working with on-demand SQL Pools
  • Using notebooks on Spark Pools
  • LAB: Setting up an Azure Synapse Analytics account

Azure Synapse Analytics Provisioned SQL Pools (Azure SQL Data Warehouse)

Azure SQL Databases are limited in compute power because they run on a single machine, and their size is limited to the terabyte range. Provisioned SQL Pools in Azure Synapse Analytics (formerly known as Azure SQL Data Warehouse) target analytical workloads on data volumes hundreds of times larger than what Azure SQL Databases can handle. At the same time, we can keep using the familiar T-SQL query language and connect traditional applications such as Excel and Management Studio to the service. Storage and compute can be scaled independently.

  • Architecture of Provisioned SQL Pools
  • Loading data via PolyBase
  • CTAS and CETAS
  • Setting up table distributions
  • Indexing
  • Partitioning
  • Performance monitoring and tuning
  • LAB: Loading and querying data in Provisioned SQL Pools
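
Why the choice of distribution column matters can be shown with a small Python sketch. A provisioned (dedicated) SQL pool spreads table rows over 60 distributions; for a hash-distributed table, the value of the distribution column decides where each row lands. The sketch below is an illustration only: the MD5-based `distribution_for` function is a stand-in and not the hash function the service uses, and the order table with its `order_id` and `country` columns is hypothetical.

```python
import hashlib
from collections import Counter

DISTRIBUTION_COUNT = 60  # a provisioned (dedicated) SQL pool uses 60 distributions

def distribution_for(key):
    """Map a distribution-column value to a distribution number.

    MD5 is a stand-in chosen for this sketch, not the service's real hash function.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % DISTRIBUTION_COUNT

def rows_per_distribution(rows, column):
    """Count how many rows land in each distribution for a given column."""
    return Counter(distribution_for(row[column]) for row in rows)

# Hypothetical fact table: 6,000 orders, unique order ids, only 3 countries.
rows = [{"order_id": i, "country": ("BE", "NL", "FR")[i % 3]} for i in range(6000)]

by_order_id = rows_per_distribution(rows, "order_id")
by_country = rows_per_distribution(rows, "country")

print("distributions used with order_id:", len(by_order_id))
print("distributions used with country:", len(by_country))
print("largest distribution with country:", max(by_country.values()))
```

A column with many distinct values spreads the rows over many distributions, while a low-cardinality column such as `country` concentrates them in at most three, leaving most of the compute idle during a query.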

Getting started with Azure Databricks

Azure Databricks allows us to use the power of Spark without the configuration hassle of Hadoop clusters. Using popular languages such as Python, SQL and R, data can be loaded, visualized, transformed and analyzed via interactive notebooks.

  • Introduction to Azure Databricks
  • Cluster setup
  • Databricks Notebooks
  • Collaborative features in Databricks
  • LAB: Configuring an Azure Databricks account

Accessing data in Azure Databricks

There are many ways to access data in Azure Databricks, ranging from uploading small files via the portal, through ad hoc connections, to mounting Azure Blob storage or data lakes. The files can also be treated as tables, which makes them easy to query. This module also pays attention to dealing with malformed input data.

  • Uploading data
  • Connecting to Azure Storage and Data Warehouse
  • Mounting Azure Blob storage
  • Accessing data in an Azure Data Lake Gen 2
  • Dealing with malformed data
  • Processing Spark Dataframes in Python
  • Using Spark SQL
  • Working with Delta Lake
  • LAB: Processing data on an Azure Databricks cluster
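
The idea behind dealing with malformed input can be sketched in plain Python, without a Spark cluster. Spark's CSV and JSON readers offer a comparable choice through their `mode` option (`PERMISSIVE`, `DROPMALFORMED`, `FAILFAST`); the function below only mimics that behaviour and is not Spark code. The `parse_rows` helper, the two-column `id,amount` layout and the sample data are assumptions made for illustration.

```python
import csv
import io

def parse_rows(text, mode="permissive"):
    """Parse 'id,amount' lines and handle malformed records according to mode.

    permissive:    keep good rows, collect bad lines separately
    dropmalformed: keep good rows, silently discard bad lines
    failfast:      raise on the first bad line
    """
    good, bad = [], []
    for fields in csv.reader(io.StringIO(text)):
        try:
            if len(fields) != 2:
                raise ValueError("expected 2 fields")
            good.append({"id": int(fields[0]), "amount": float(fields[1])})
        except ValueError:
            if mode == "failfast":
                raise
            if mode == "permissive":
                bad.append(",".join(fields))
    return good, bad

# Hypothetical input: line 2 has a non-numeric amount, line 3 misses a field.
raw = "1,9.50\n2,not-a-number\n3\n4,12.00\n"
good, bad = parse_rows(raw)
print(good)
print(bad)
```

Keeping the rejected lines, as the permissive mode does here, makes it possible to inspect and reprocess them later instead of losing them silently.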

Deploying an Azure Databricks solution

Once the Databricks solution has been tested, it needs to be scheduled for execution. This can be done either with jobs in Azure Databricks or via Data Factory. In the latter case you need to be able to pass variables from Data Factory into Databricks. Azure Databricks widgets make this possible.

  • Azure Databricks jobs
  • Working with Databricks Widgets
  • Calling Databricks Notebooks from within Azure Data Factory pipelines
  • LAB: Widgets in Azure Databricks

Machine learning introduction

Before machine learning (ML) can be applied, its key concepts need to be discussed.

  • Supervised versus unsupervised learning
  • Machine learning methodology
  • Data preparation
  • Classification, regression and clustering
  • Model evaluation
  • Cognitive services
  • Automated ML in Azure ML Services
  • Working with the Azure ML Designer

Azure Machine Learning Services

Machine learning on a local machine with a small dataset is one thing; running it on larger datasets or with more CPU-hungry techniques can become a challenge. Another problem is deploying your model: how can we easily call the resulting model from within other applications? Azure Machine Learning Services helps answer these questions.

  • Azure ML service overview
  • Creating an ML service workspace
  • Setting up computes and datastores
  • Creating and querying experiments
  • Deploying and using models
  • Creating and registering images
  • Deploying images as web services
  • LAB: Building ML models in Azure Machine Learning

In this training, the modern data warehouse approach to handling any volume of both cloud-based and on-premises data is explained in detail. First, students see how to set up an Azure Data Lake and ingest data with Azure Data Factory. Then they learn how to cleanse the data and prepare it for analysis with Azure Synapse Analytics and Azure Databricks. The Lambda architecture, with a focus on both the batch layer and the speed layer where live events are processed, is discussed as well. By the end, participants have hands-on experience with the most common Azure services for loading, storing and processing data in the cloud.

This course is aimed at developers and administrators who are considering migrating existing data solutions to the Microsoft Azure cloud. Some familiarity with relational database systems such as SQL Server is helpful. Prior knowledge of Azure is not required.

Contact Us
  • Address:
    U2U nv/sa
    Z.1. Researchpark 110
    1731 Zellik (Brussels)
    BELGIUM
  • Phone: +32 2 466 00 16
  • Email: info@u2u.be
  • Monday - Friday: 9:00 - 17:00
    Saturday - Sunday: Closed
© 2025 U2U All rights reserved.