The times given in the conference program of TDWI München 2023 are in Central European Time (CET).
This presentation examines how data products are created through ETL pipelines and the value they provide through insights. We'll focus on the advantages of test-driven development in ETL and on the key roles Azure DevOps and Databricks play in the pipeline. We'll also cover the implementation of a Git-based CI/CD workflow that automates the build and checks the quality of the generated data products. By the end of the presentation, attendees will have a solid understanding of the benefits of test-driven development in ETL pipelines.
Target Audience: Data Engineer, Data Scientist
Prerequisites: Basic knowledge of ETL processes
Level: Basic
Extended Abstract:
In this presentation, we will discuss the process of generating data products through ETL pipelines to derive valuable insights. The use of test-driven development (TDD) will be a central theme, as we explore how this approach can help to ensure the accuracy and reliability of the data products generated. We will focus on the use of Azure DevOps and Databricks as key components in the pipeline, and how these technologies can be leveraged to provide efficient and streamlined data processing.
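To make the TDD idea concrete, here is a minimal sketch of a transform step together with the test that specifies it. It is not taken from the talk: the function `clean_orders`, the field names, and the sample records are hypothetical, and plain Python stands in for whatever Databricks code the real pipeline uses.

```python
from datetime import date


def clean_orders(rows):
    """Hypothetical transform step: drop rows without an order id, cast types."""
    cleaned = []
    for row in rows:
        if not row.get("order_id"):
            continue  # reject records that cannot be keyed
        cleaned.append(
            {
                "order_id": str(row["order_id"]).strip(),
                "amount_eur": round(float(row["amount"]), 2),
                "order_date": date.fromisoformat(row["order_date"]),
            }
        )
    return cleaned


def test_clean_orders_drops_unkeyed_rows_and_casts_types():
    # In TDD this test is written first and fails until clean_orders exists.
    raw = [
        {"order_id": " A1 ", "amount": "19.999", "order_date": "2023-06-20"},
        {"order_id": None, "amount": "5", "order_date": "2023-06-21"},
    ]
    assert clean_orders(raw) == [
        {"order_id": "A1", "amount_eur": 20.0, "order_date": date(2023, 6, 20)}
    ]


# Run the test directly; a runner such as pytest would discover it by name.
test_clean_orders_drops_unkeyed_rows_and_casts_types()
```

Keeping the transform a pure function of its input rows is what makes it testable without a cluster; the same test could then run as one step of an automated build.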
Additionally, we will explore the implementation of a Continuous Integration and Continuous Deployment (CI/CD) workflow using Git, which automates the build, test, and deployment steps, allowing for rapid iterations and reduced downtime. Git also provides a reliable mechanism for version control and collaboration between team members.
One of the key highlights of the presentation will be the use of a test data generator powered by artificial intelligence. This tool will be used to generate realistic test data for use in the TDD process, ensuring that the pipeline code is thoroughly tested and ready for production deployment. This will be an important factor in ensuring the accuracy and reliability of the data products generated.
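The abstract does not describe how the AI-powered generator works, so the following is only a simple stand-in that shows where generated test data fits in. It uses a seeded pseudo-random generator rather than any AI model; `generate_orders`, its parameters, and the record layout are all hypothetical.

```python
import random
from datetime import date, timedelta


def generate_orders(n, seed=0, invalid_ratio=0.2):
    """Hypothetical generator: n raw order records, some deliberately invalid."""
    rng = random.Random(seed)  # fixed seed keeps test runs reproducible
    start = date(2023, 1, 1)
    rows = []
    for i in range(n):
        row = {
            "order_id": f"ORD-{i:05d}",
            "amount": f"{rng.uniform(1, 500):.2f}",
            "order_date": (start + timedelta(days=rng.randrange(365))).isoformat(),
        }
        if rng.random() < invalid_ratio:
            row["order_id"] = None  # inject a defect the pipeline must handle
        rows.append(row)
    return rows


sample = generate_orders(5, seed=42)
print(len(sample), "records generated")
```

Seeding the generator means a failing test can be reproduced exactly, and the `invalid_ratio` knob lets a test suite control how many defective records the pipeline code has to reject.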
By the end of the presentation, attendees will have a comprehensive understanding of the benefits of TDD in ETL pipelines and how it can be used to produce high-quality data products that provide valuable insights. They will also gain an understanding of how Azure DevOps, Databricks, and Git can be used in conjunction with TDD to create a streamlined and efficient data processing workflow.
Jannik Wiessler is a Data Scientist and Data Engineer who has spent four years working for Daimler Truck AG. With an academic background in engineering, he also teaches computer science and programming in Python and C at DHBW Stuttgart. His experience and expertise in data analytics, machine learning, and programming have made a significant impact on the organization's success. Jannik is passionate about utilizing technology to drive innovation and efficiency, and he has a strong desire to share his knowledge with the next generation of engineers and data scientists.