Data Engineering with Azure Databricks

UADB
4 days


The Modern Data Warehouse

The cloud requires us to reconsider some of the choices made for on-premises data handling. This module introduces the different Azure services that can be used for data processing and compares them to the traditional on-premises data stack. It also provides a brief introduction to Azure and the use of the Azure portal. A small sketch of the medallion idea follows the module outline below.

  • From Data Warehouse to Data Lake
  • Building a Data Lakehouse with Delta Lake
  • The Databricks medallion architecture
  • Big Data storage formats
  • LAB: Navigating the Azure Portal
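
To make the medallion idea concrete, here is a minimal sketch (all names are illustrative, and the sample dataset path assumes the built-in Databricks datasets are available): raw data lands unchanged in a bronze table and is then refined into a silver table, both stored as Delta tables.

    # Bronze: ingest raw events as-is (sample dataset path is an assumption)
    raw = spark.read.json("/databricks-datasets/structured-streaming/events")
    raw.write.format("delta").mode("overwrite").saveAsTable("events_bronze")

    # Silver: a cleaned, de-duplicated version of the bronze table
    (spark.table("events_bronze")
        .dropDuplicates()
        .write.format("delta").mode("overwrite").saveAsTable("events_silver"))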

Getting Started with Azure Databricks

Azure Databricks lets us use the power of Apache Spark without the hassle of manually creating and configuring Spark clusters. In this chapter you will learn how to set up an Azure Databricks environment and work with Databricks workspaces; a first notebook cell is sketched after the outline below.

  • Introducing Apache Spark
  • Setup Azure Databricks using the Azure Portal
  • Manage Azure Databricks Workspaces
  • Create a Databricks Cluster
  • Cluster access mode and policies
  • Creating and running your first Notebook
  • Introduction to the Unity Catalog
  • LAB: Getting started with Azure Databricks
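
A first notebook cell could look like the sketch below; spark is the SparkSession that Databricks creates automatically in every notebook, and display is the notebook's built-in renderer.

    # 'spark' is pre-created by Databricks in every notebook
    print(spark.version)            # Spark version of the attached cluster

    # Build a tiny DataFrame and render it with the notebook's display()
    df = spark.range(5).toDF("id")
    display(df)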

Using Notebooks in Azure Databricks

Using popular languages such as Python, SQL, and R, data can be loaded, visualized, transformed, and analyzed via interactive notebooks; a small example follows the outline below.

  • The Databricks File System (DBFS)
  • Working with Notebooks in Databricks
  • Magic commands
  • Databricks Utilities
  • Schedule Notebooks
  • Databricks Repos
  • The Databricks Assistant
  • Databricks widgets
  • LAB: Using Notebooks in Azure Databricks
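
A minimal sketch of two of the utilities covered here: widgets to parameterize a notebook, and the dbutils.fs file system utilities (the listed path is an example).

    # Widgets turn a notebook into a parameterized, schedulable job
    dbutils.widgets.text("country", "BE", "Country code")
    country = dbutils.widgets.get("country")
    print(f"Running for country: {country}")

    # File system utilities: list the files under an example path
    for f in dbutils.fs.ls("/databricks-datasets"):
        print(f.name)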

Storing data in Azure

This module discusses the different types of storage available in Azure Storage and how to configure them for big data analytics. It also covers some of the tools for loading and managing files in Azure Storage; a sketch of key-based access follows the outline below.

  • Introduction to Azure Blob Storage and Azure Data Lake Storage Gen2
  • Working with the Azure Storage Explorer and AzCopy
  • Accessing Azure Storage using Access Keys, SAS Tokens and Service Principals
  • Cluster Scoped Authentication
  • LAB: Uploading data into Azure Storage
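
As an illustration of key-based access, the sketch below puts a storage account key (fetched from a secret scope) in the Spark configuration; the account, container, secret scope, and key names are placeholders.

    # Placeholders: replace the account, secret scope and key names
    account = "<storage-account>"
    key = dbutils.secrets.get(scope="my-scope", key="storage-account-key")

    # Authenticate to ADLS Gen2 with the storage account access key
    spark.conf.set(f"fs.azure.account.key.{account}.dfs.core.windows.net", key)

    # The account is now reachable over the abfss:// protocol
    display(dbutils.fs.ls(f"abfss://raw@{account}.dfs.core.windows.net/"))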

Accessing data in Azure Databricks

There are many ways to access data in Azure Databricks, ranging from uploading small files via the portal, over ad-hoc connections, to mounting Azure Storage accounts or data lakes. Files can also be exposed as tables, providing easy access; a minimal DataFrame example follows the outline below.

  • Introduction to Spark DataFrames
  • Reading and writing data using Spark DataFrames
  • Mounting Azure Blob and Data Lake Gen2 Storage
  • Cleaning and Transforming data using the Spark DataFrame API
  • Databases and Tables in Azure Databricks
  • Managed vs Unmanaged Tables in the Hive metastore
  • Tables in the Unity Catalog
  • Scheduling Databricks Jobs
  • LAB: Working with Data in Azure Databricks
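
A minimal DataFrame round trip, assuming a CSV file was uploaded to the container from the previous module (the path and column names are made up for the example).

    from pyspark.sql.functions import col

    # Read a CSV file into a DataFrame (path and options are example values)
    sales = (spark.read
        .option("header", True)
        .option("inferSchema", True)
        .csv("abfss://raw@<storage-account>.dfs.core.windows.net/sales.csv"))

    # Clean and transform with the DataFrame API
    cleaned = (sales
        .dropna(subset=["customer_id"])
        .withColumn("amount", col("amount").cast("double"))
        .filter(col("amount") > 0))

    # Persist as a managed table so it can later be queried with SQL
    cleaned.write.mode("overwrite").saveAsTable("sales_clean")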

Building a Lakehouse using Azure Databricks

Delta Lake is an optimized storage layer that provides the foundation for storing data and tables in a Databricks lakehouse. Learn how to create, query, and optimize Delta tables; a short sketch of time travel and OPTIMIZE follows the outline below.

  • Implementing a Delta Lake
  • Working with Delta Tables
  • Managing Schema change
  • Version and Optimize Delta Tables
  • Data skipping and Z-order
  • Delta Tables and Change Data Feeds
  • Delta Tables and the Unity Catalog
  • Securing Tables in the Unity Catalog
  • LAB: Building a Lakehouse using Delta Tables
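
A short sketch of the Delta features listed above, reusing the hypothetical sales_clean table from the previous module: every write creates a new version, older versions stay queryable, and OPTIMIZE compacts the underlying files.

    # An update creates a new version of the Delta table
    spark.sql("UPDATE sales_clean SET amount = amount * 1.21")

    # Time travel: read the table as it was before the update
    previous = spark.sql("SELECT * FROM sales_clean VERSION AS OF 0")

    # One row per version: who changed what, and when
    display(spark.sql("DESCRIBE HISTORY sales_clean"))

    # Compact small files and co-locate rows on a frequently filtered column
    spark.sql("OPTIMIZE sales_clean ZORDER BY (customer_id)")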

Delta Live Tables and Data Pipelines

You can use Databricks for near real-time data ingestion and processing. Most incremental and streaming workloads on Databricks are powered by Structured Streaming, including Delta Live Tables and Auto Loader. The main focus of this chapter is on how to incrementally load data into a lakehouse; a minimal DLT pipeline is sketched after the outline below.

  • Structured Streaming
  • Working with Auto Loader
  • Introduction to Delta Live Tables (DLT)
  • Using Python or SQL with DLT
  • Ingesting data using DLT
  • DLT and data quality
  • Configure DLT pipelines
  • LAB: Building a Delta Live Table pipeline
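
A minimal two-table DLT pipeline in Python, combining Auto Loader ingestion with a data quality expectation; the source path and column names are assumptions for the example.

    import dlt

    @dlt.table(comment="Raw orders, ingested incrementally with Auto Loader")
    def orders_bronze():
        return (spark.readStream
            .format("cloudFiles")                    # Auto Loader
            .option("cloudFiles.format", "json")
            .load("/Volumes/main/landing/orders"))   # example source path

    # Rows violating the expectation are dropped and reported by DLT
    @dlt.table(comment="Validated orders")
    @dlt.expect_or_drop("positive_amount", "amount > 0")
    def orders_silver():
        return dlt.read_stream("orders_bronze").select("order_id", "amount")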

Data Warehousing and Analysis with Databricks SQL

The lakehouse architecture and Databricks SQL warehouses bring cloud data warehousing capabilities to your data lakes. A SQL warehouse is a compute resource that lets you run SQL commands on objects within Databricks SQL. Learn about the available warehouse types and how to query them; a sketch of querying a warehouse from Python follows the outline below.

  • What are SQL Warehouses?
  • Writing queries using the SQL Editor
  • Working with Tables and Views
  • Ingesting Data
  • Visualizing data
  • LAB: Using SQL Warehouses
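
SQL warehouses can also be queried from outside the workspace. Below is a minimal sketch using the databricks-sql-connector package; the hostname, HTTP path, and token are placeholders copied from a warehouse's connection details.

    # pip install databricks-sql-connector
    from databricks import sql

    # Placeholder connection details from the warehouse's connection details tab
    with sql.connect(server_hostname="adb-<workspace-id>.azuredatabricks.net",
                     http_path="/sql/1.0/warehouses/<warehouse-id>",
                     access_token="<personal-access-token>") as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT current_catalog(), current_schema()")
            print(cur.fetchone())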

Databricks and Power BI

Microsoft Power BI is a business analytics tool that provides interactive visualizations with self-service business intelligence capabilities, enabling end users to create reports and dashboards. You can connect Power BI Desktop to your Databricks clusters and Databricks SQL warehouses.

  • Power BI Introduction
  • Connect Power BI Desktop to Databricks using Partner Connect
  • Connect Power BI Desktop to Databricks manually
  • LAB: Connecting Power BI to Databricks

Databricks is a data analytics platform powered by Apache Spark for data engineering, data science, and machine learning. This training teaches you how to use Azure Databricks to design and build a data lakehouse architecture.

No prior knowledge of Azure Databricks is required.
