Python is a high-level, interpreted, interactive and object-oriented scripting language. This chapter introduces Python, including how to install it and run your first lines of code. Many editors are available for writing Python code, but this course focuses on Visual Studio Code. We'll also cover modern Python tooling, including uv, a fast Python package installer and project manager.
We explore programming in Python by discussing some basic syntax, variables and conditional statements.
Collections allow you to store and organize data efficiently, making it easier to handle. Loops help you repeat actions on these collections.
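As a small illustration of the kind of code this chapter works towards, the sketch below loops over a dictionary of lists and builds a new collection from it. The city names and temperature values are made-up sample data, not course material.

```python
# Hypothetical sample data: temperature readings per city.
readings = {
    "Brussels": [12.5, 14.0, 13.2],
    "Antwerp": [11.8, 13.1],
}

# Loop over the dictionary and compute the average reading per city.
averages = {}
for city, temps in readings.items():
    averages[city] = round(sum(temps) / len(temps), 2)

# A list comprehension is a compact loop that builds a new list.
warm_cities = [city for city, avg in averages.items() if avg > 13.0]

print(averages)
print(warm_cities)
```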
We explore how to structure reusable code, handle unexpected situations and manage resources.
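A minimal sketch of how these three ideas can fit together, using only the standard library: a reusable function, a `try`/`except` block for unexpected input, and `with` statements that release resources automatically. The function names `parse_amount` and `total_from_file` and the file contents are hypothetical, chosen only for illustration.

```python
import tempfile
from pathlib import Path


def parse_amount(text: str) -> float:
    """Convert text to a number, falling back to 0.0 for invalid input."""
    try:
        return float(text)
    except ValueError:
        # Unexpected input: handle it instead of crashing the program.
        return 0.0


def total_from_file(path: Path) -> float:
    """Sum all amounts in a text file, one amount per line."""
    # The with statement closes the file even if an error occurs.
    with path.open(encoding="utf-8") as handle:
        return sum(parse_amount(line.strip()) for line in handle)


# Write a small sample file to a temporary directory and process it.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "amounts.txt"
    sample.write_text("10.5\nnot-a-number\n4.5\n", encoding="utf-8")
    total = total_from_file(sample)

print(total)
```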
Python classes provide all the standard features of object-oriented programming: they can inherit from one or more base classes and leverage modern Python features such as dataclasses and context managers, among others.
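The sketch below shows one way inheritance and dataclasses can be combined. The `Employee` and `Manager` classes and their fields are hypothetical examples, not taken from the course material.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    """Base class; the dataclass decorator generates __init__ and __repr__."""
    name: str
    salary: float  # assumed to be a monthly amount

    def annual_cost(self) -> float:
        return self.salary * 12


@dataclass
class Manager(Employee):
    """Subclass that inherits the fields of Employee and adds one more."""
    bonus: float = 0.0

    def annual_cost(self) -> float:
        # Reuse the base class behaviour and extend it.
        return super().annual_cost() + self.bonus


manager = Manager("Ada", 5000.0, bonus=2000.0)
print(manager)
print(manager.annual_cost())
```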
Modules in Python are reusable code libraries, and Python ships with a large number of built-in modules. We learn how to import them and how to create our own modules.
You do not need to reinvent the wheel when coding in Python. Its Standard Library offers a rich collection of built-in modules that simplify common tasks, while external libraries provide specialized tools for modern development.
Pydantic is a powerful Python library that uses Python type annotations for data validation and settings management. It provides runtime type checking and automatic data conversion, making it essential for building robust data pipelines and APIs. This chapter covers how to define data models, validate complex data structures, and handle validation errors effectively.
Testing is a critical aspect of software development that ensures code reliability and maintainability. Python provides excellent testing frameworks, with pytest being the most popular choice for its simplicity and powerful features. This chapter covers writing effective unit tests, mocking dependencies, and implementing test-driven development practices for data engineering and app development.
Pandas is a Python library that makes loading and transforming data a lot easier. As long as all your data fits in memory, Pandas is your friend.
Data visualization is a critical skill for data scientists and engineers to communicate insights and identify patterns or anomalies in data. This chapter explores the Python visualization ecosystem, starting from basic plotting in Matplotlib and Pandas, moving to the sophisticated statistical aesthetics of Seaborn, and concluding with interactive, web-ready visualizations using Plotly.
Data lakes allow storing large data volumes in their original format, but Pandas doesn’t scale well to such volumes. Apache Spark enables distributed processing, and PySpark brings it to Python (available in Microsoft Fabric, Azure Synapse Analytics and Databricks).
Python is a key technology in data engineering, data science and AI development thanks to its versatility and powerful ecosystem, including libraries such as Pandas for in-memory data processing and PySpark for large-scale distributed processing.
In this course, you will build a solid foundation in Python and learn how to apply it in real-world data scenarios. You will progress from core language concepts to practical implementation, including data processing, validation and visualization.
Through hands-on exercises, you will gain experience with tools such as Pydantic, Pandas, Seaborn, and PySpark, enabling you to efficiently work with data and build robust data-driven solutions.
This course is targeted at data engineers, data scientists and AI developers with little or no experience with Python. Familiarity with programming in general is helpful.