The cloud requires us to reconsider some of the choices made for on-premises data handling. This module introduces the concepts of the data lake and the data lakehouse. It also introduces the different Azure services that can be used for data processing and compares them to the traditional on-premises data stack. Finally, it provides a brief introduction to Azure and the use of the Azure portal.
This module discusses the different types of storage available in Azure Storage as well as Data Lake Storage. It also covers some of the tools to load and manage files in Azure Storage and Data Lake Storage.
When data is stored and analyzed on-premises, you typically use ETL tools such as SQL Server Integration Services. But what if the data is stored in the Azure cloud? Then you can use Azure Data Factory, the cloud-based ETL service. First we need to get used to the terminology; then we can start creating the proper objects in the portal.
This module dives into the process of building a Data Factory pipeline from scratch. The most common activities are illustrated. The module also focusses on how to work with variables and parameters to make the pipelines more dynamic.
With Data flows, data can be transformed without the need to learn another tool (such as Databricks or Spark). Both Data flows and the Power Query activity are covered.
Pipelines need an integration runtime to control where the code executes. This module provides an overview of the three types of integration runtimes: Azure, self-hosted and Azure-SSIS. It also discusses the different types of triggers that exist and how they can be used to schedule pipelines.
An easy way to create a business intelligence solution in the cloud is to take SQL Server -- familiar to many Microsoft BI developers -- and run it as a managed cloud service. Backups and high availability happen automatically, and we can use nearly all the skills and tools we used on a local SQL Server on this cloud-based solution as well.
Once data has been loaded into the data lake, the next step is to cleanse it, pre-aggregate it and perform other steps to make it accessible to reporting and analytical tools. Depending on the transformations required and the skills of the data engineer, the SQL dialect common to the Microsoft data stack (T-SQL) can play an important role. This module first introduces the scenarios where moving from an Azure SQL Database to Synapse databases can be useful, briefly introduces the two different types of SQL databases, and then focusses more deeply on Synapse Analytics serverless SQL pools.
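To give a flavour of what querying the lake through a serverless SQL pool looks like, here is a minimal sketch in Python. The workspace name, storage account, file path and login are placeholders, and the exact connection details depend on your environment; the key element is the OPENROWSET query that reads Parquet files directly from the data lake.

```python
# Minimal sketch: querying data lake files through a Synapse serverless SQL pool.
# All names (workspace, storage account, path, login) are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myworkspace-ondemand.sql.azuresynapse.net;"  # serverless endpoint
    "DATABASE=master;"
    "UID=sqladminuser;PWD=<password>;Encrypt=yes"
)

# OPENROWSET lets the serverless pool read Parquet files in place,
# without loading them into a database first.
query = """
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/raw/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS rows;
"""

cursor = conn.cursor()
for row in cursor.execute(query):
    print(row)
```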
Since Serverless SQL Pools don't store data in a proprietary format, they lack features such as indexes, UPDATE statements, etc. This is where Provisioned SQL Pools in Azure Synapse Analytics (formerly known as Azure SQL Data Warehouse) come to the rescue.
Although SQL is a very powerful language for accessing and manipulating data, it has its limitations. Complex data wrangling, advanced statistics and machine learning are tasks ill-suited to SQL; Apache Spark is a better fit. Spark is a divide-and-conquer framework for data access, transformation and querying that relies on programming languages such as Python and Scala. It can be used in Synapse Analytics as well as in Azure Databricks, a popular service that also integrates with Synapse Analytics pipelines.
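As a small taste of what this looks like, below is a minimal PySpark sketch of the kind of transformation that quickly becomes awkward in pure SQL pipelines. The sample data and column names are made up for illustration; in Synapse Spark pools and Databricks a SparkSession is already available as `spark`.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Reuses the session that Synapse Spark or Databricks already provides.
spark = SparkSession.builder.getOrCreate()

# Hypothetical sales data; in practice this would come from the data lake.
sales = spark.createDataFrame(
    [("2024-01-05", "Bikes", 250.0),
     ("2024-01-05", "Helmets", 40.0),
     ("2024-01-06", "Bikes", 300.0)],
    ["order_date", "category", "amount"],
)

# Derive a proper date column and aggregate per day and category.
daily_totals = (
    sales.withColumn("order_date", F.to_date("order_date"))
         .groupBy("order_date", "category")
         .agg(F.sum("amount").alias("total_amount"))
)
daily_totals.show()
```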
Apache Spark doesn't have a proprietary data storage option; it consumes and produces regular files stored in Azure Storage. This module covers how to access and manipulate data stored in the Synapse Analytics data lake or other Azure Storage locations from Synapse Analytics Spark or Databricks.
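A minimal sketch of what this looks like from PySpark, assuming an Azure Data Lake Storage Gen2 account (the account, container and folder names below are placeholders) and that authentication -- for example the workspace managed identity -- has already been configured:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read raw Parquet files straight from the data lake (placeholder paths).
raw = spark.read.parquet(
    "abfss://raw@mydatalake.dfs.core.windows.net/sales/2024/*.parquet"
)

# Write a cleansed copy back to a curated zone in the same lake.
(raw.dropDuplicates()
    .write.mode("overwrite")
    .parquet("abfss://curated@mydatalake.dfs.core.windows.net/sales/"))
```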
Delta Lake is an optimized storage layer that provides the foundation for storing data and tables in a Lakehouse Platform. It is an open-source storage framework that extends Parquet files with ACID transactions and metadata handling. This chapter provides an introduction to Delta Lake and how it can be used to create a lakehouse architecture.
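The sketch below shows, assuming the Delta libraries are available (Synapse Spark pools and Databricks ship with them), how a DataFrame can be stored as a Delta table, updated in place and read back at an earlier version. The path and data are placeholders.

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Placeholder data and data lake path.
customers = spark.createDataFrame(
    [(1, "Contoso", "BE", "Unknown"), (2, "Fabrikam", "US", "AMER")],
    ["customer_id", "name", "country", "region"],
)
delta_path = "abfss://curated@mydatalake.dfs.core.windows.net/delta/customers"

# Writing as Delta produces Parquet files plus a transaction log.
customers.write.format("delta").mode("overwrite").save(delta_path)

# ACID update in place, something plain Parquet files cannot offer.
DeltaTable.forPath(spark, delta_path).update(
    condition="country = 'BE'",
    set={"region": "'EMEA'"},
)

# Time travel: read the table as it was before the update.
before_update = spark.read.format("delta").option("versionAsOf", 0).load(delta_path)
before_update.show()
```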
Between the large volumes of historical, long-lived data stored in a data lake and the streams of short-lived events processed with Azure Stream Analytics lies the challenge of working with large volumes of semi-structured telemetry and log data, where the analysis can tolerate a longer latency than event processing, but requires more historical information than event-processing technology can handle. For this kind of data processing, Azure Data Explorer is the ideal tool.
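As an illustration of the kind of query Azure Data Explorer is built for, here is a minimal sketch using the azure-kusto-data Python package. The cluster URI, database and table are placeholders, and Azure CLI authentication is just one of the available sign-in options; the KQL query summarizes telemetry events per hour over the last day.

```python
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

# Placeholder cluster URI and database; sign-in reuses the local Azure CLI login.
cluster = "https://mycluster.westeurope.kusto.windows.net"
kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(cluster)
client = KustoClient(kcsb)

# KQL: count events per hour over the last day in a hypothetical telemetry table.
kql = """
TelemetryEvents
| where Timestamp > ago(1d)
| summarize EventCount = count() by bin(Timestamp, 1h)
| order by Timestamp asc
"""

response = client.execute("telemetry_db", kql)
for row in response.primary_results[0]:
    print(row["Timestamp"], row["EventCount"])
```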
The Power BI Service (or the Analysis Services engine directly) plays an important role in the modern data warehouse solution. This module briefly describes the Power BI Service architecture and how it integrates with Azure Synapse Analytics.
In this training, the modern data warehouse approach to handling any volume of both cloud-based and on-premises data is explained in detail. First, students see how to set up an Azure Data Lake and ingest data with Azure Data Factory. Then students learn how to cleanse the data and prepare it for analysis with Azure Synapse Analytics and Azure Databricks. The Lambda architecture (with focus on both batch data and a speed layer where live events are processed) is discussed as well, and the speed layer is illustrated with Azure Data Explorer. In the end, participants have hands-on experience with the most common Azure services to load, store and process data in the cloud.
This course focusses on developers and administrators who are considering migrating existing data solutions to the Microsoft Azure cloud. Some familiarity with relational database systems such as SQL Server is handy. Prior knowledge of Azure is not required.