Python for Data Engineers: From Syntax to Solutions

3 days
UPDE

Getting Started with Python

Python is a high-level, interpreted, interactive and object-oriented scripting language. This chapter introduces the history of Python and shows how to install Python and run your first lines of Python code. Many editors are available for writing Python code, but this course focuses on Visual Studio Code.

  • Introduction to Python
  • Installing Python
  • Executing Python Code from the Command Shell
  • Python and Visual Studio Code
  • Working with packages in Python
  • Working with Virtual Environments in Python
  • Interactive development in Jupyter notebooks
  • LAB: Installing Python and executing code

Basic Language Constructs in Python

To build code that remains readable and maintainable, it is important to be able to break code up into reusable components such as functions and classes.

  • Introduction to writing Python code
  • Declaring and Using Variables
  • Data Types in Python
  • Working with Lists, Tuples, Sequences and Dictionaries
  • Basic Programming Constructs in Python
  • Declaring and executing Functions
  • LAB: Writing basic Python code
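
As an illustration of the topics above, here is a minimal sketch that combines a list, dictionaries and a reusable function. The function name `count_by_status` and the sample `orders` data are hypothetical and chosen only for this example; they are not taken from the course material.

```python
from collections import Counter


def count_by_status(records):
    """Return how many records exist for each value of the 'status' key."""
    return dict(Counter(record["status"] for record in records))


# Hypothetical sample data: a list of dictionaries.
orders = [
    {"id": 1, "status": "shipped"},
    {"id": 2, "status": "pending"},
    {"id": 3, "status": "shipped"},
]

status_counts = count_by_status(orders)
print(status_counts)
```

Wrapping the counting logic in a function keeps it reusable: the same call works for any list of dictionaries that carry a `status` key.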

Working with Classes and Objects

Python classes provide all the standard features of object-oriented programming. Classes can inherit from other base classes, have constructors for the initialization of objects...

  • Introduction to Object-Oriented Programming
  • Defining and instantiating Classes in Python
  • Working with Constructors
  • Instance and Class Variables
  • Inheritance in Python
  • Working with Access Modifiers
  • LAB: Working with classes and objects
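
The following sketch shows how these topics might fit together in one small example: a constructor, an instance variable, a class variable, inheritance and the underscore naming convention that Python uses in place of enforced access modifiers. The class names `DataSource` and `CsvSource` are hypothetical and serve only as an illustration.

```python
class DataSource:
    # Class variable: shared by all instances of the class.
    instances_created = 0

    def __init__(self, name):
        # The constructor initializes the instance variables.
        self.name = name
        # A leading underscore marks an attribute as internal by convention;
        # Python does not enforce this.
        self._rows = []
        DataSource.instances_created += 1

    def describe(self):
        return f"{self.name} ({len(self._rows)} rows)"


class CsvSource(DataSource):
    """Inherits from DataSource and extends its constructor."""

    def __init__(self, name, delimiter=","):
        super().__init__(name)
        self.delimiter = delimiter

    def describe(self):
        # Reuse the base class behaviour and add CSV-specific detail.
        return f"CSV source {super().describe()}, delimiter {self.delimiter!r}"


source = CsvSource("sales", delimiter=";")
description = source.describe()
print(description)
```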

Using and Creating Modules

Modules in Python are reusable code libraries, and Python ships with a large number of built-in modules. Learn how to create and import modules.

  • Introduction to Modules
  • Importing Modules
  • Creating Modules
  • LAB: Using and creating Modules
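
A module is simply a `.py` file that other code can import. The sketch below creates a tiny module and imports it alongside a standard library module. The module name `text_utils` and its `normalize` function are hypothetical, and writing the file to a temporary folder is only a device to keep the example self-contained; in a real project the file would sit next to your other source files.

```python
import sys
import tempfile
from pathlib import Path

# Write a small hypothetical module, text_utils.py, into a temporary folder.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "text_utils.py").write_text(
    "def normalize(value):\n"
    "    return value.strip().lower()\n"
)

# Make the folder importable, then import the module like any other.
sys.path.insert(0, str(module_dir))
import text_utils
from math import sqrt  # modules shipped with Python are imported the same way

cleaned = text_utils.normalize("  Brussels ")
root = sqrt(16)
print(cleaned, root)
```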

Data Processing and Cleansing using Pandas

  • What is Pandas
  • Introducing Pandas Data Structures
  • Reading data with Pandas
  • Indexing in a DataFrame
  • Creating and deleting columns
  • Filtering and Replacing data
  • Sorting and Ranking data
  • Grouping and aggregating data
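
To give an impression of several of these operations, here is a minimal sketch, assuming the `pandas` package is installed. The `sales` DataFrame and its column names are hypothetical sample data made up for this example.

```python
import pandas as pd

# Hypothetical sample data with one missing amount.
sales = pd.DataFrame(
    {
        "region": ["North", "South", "North", "South"],
        "amount": [100.0, None, 250.0, 80.0],
    }
)

# Replace the missing value, then create a derived column.
sales["amount"] = sales["amount"].fillna(0.0)
sales["is_large"] = sales["amount"] > 90

# Filter rows with a boolean mask.
large_sales = sales[sales["is_large"]]

# Group by region, aggregate, and sort the result.
totals = sales.groupby("region")["amount"].sum().sort_values(ascending=False)
print(totals)
```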

From Python and Pandas to Apache Spark

With Pandas you typically run code on a single machine. This means that as your data volumes grow, you will hit memory and CPU constraints. PySpark is the Python API for Apache Spark and lets you run Python applications on Spark. Apache Spark is an analytical processing engine for large-scale distributed data processing and machine learning applications. In Azure it is available in Azure Synapse Analytics and Azure Databricks.

  • Introducing Apache Spark
  • The SparkSession, SparkContext and SQLContext objects
  • An introduction to Resilient Distributed Datasets (RDD)
  • Convert a Pandas DataFrame to/from a PySpark DataFrame
  • Reading and writing data using DataFrames
  • Working with DataFrames in PySpark
  • Data Cleansing using PySpark
  • Grouping and aggregating data in PySpark
  • Joining DataFrames
  • Using SQL to select and manipulate data

Building a Lakehouse using Delta Lake

  • What Is a Lakehouse?
  • Introduction to Delta Lake
  • Creating tables
  • Partitioning data in tables
  • Reading table data
  • Query older snapshots of a table (Time Travel)
  • Insert, Update, Delete and Merge table data
  • Retrieving table metadata
  • Altering table metadata
  • Configuring Change Data Feed

Python plays a crucial role in data engineering, data science and AI development thanks to its versatility, its extensive libraries such as Pandas and PySpark, and its ability to handle large-scale data processing. This makes it an indispensable tool for extracting insights and building data pipelines. In this course, participants will gain a solid understanding of Python.

They will acquire the necessary skills and knowledge to utilize Python effectively, from basic syntax to implementing real-world solutions. During the course participants will get hands-on experience with Pandas, PySpark, Delta Lake...

This course is targeted at data engineers, data scientists and AI developers with little or no experience with Python.
