Apache Beam: a smart DataOps and MLOps solution
Data Analytics, Data Science, Data Engineering
Edinburgh
Data processing and model deployment are central to a company's data health. Efficient data processing delivers ready-to-use data, unlocks new business scenarios and, ultimately, turns data into business value.
Done correctly, data processing enables easy data versioning and governance, supports feature store creation, and serves as a powerful tool for real-time deployment. In addition, there are multiple possible MLOps strategies for model deployment.
Many companies seek a real-time model deployment strategy that can provide immediate answers for customers and allow data to be analysed at the business level. However, each deployment strategy has its own pros and cons, such as infrastructure cost, deployment team effort, and code and latency requirements.
In this talk, Apache Beam will be shown as a possible solution for all these problems.
Apache Beam is a powerful framework designed for data processing and model inference. It offers a unified programming model for defining and executing data processing pipelines in an easy, flexible and scalable way.
Core concepts and simple examples of Beam pipelines will be presented, followed by more complicated scenarios that use advanced techniques such as windowing schemes for feature creation.
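As a rough illustration of what a windowing scheme for feature creation means, the sketch below groups timestamped events into fixed, non-overlapping windows and sums a value per user and window. It is plain Python rather than Beam code, so it only shows the idea and not the framework's API. The event data, the 60-second window size and the function names are all assumptions made for this example.

```python
from collections import defaultdict

WINDOW_SIZE_S = 60  # hypothetical window length in seconds


def window_start(timestamp_s, size_s=WINDOW_SIZE_S):
    """Return the start of the fixed window that contains timestamp_s."""
    return timestamp_s - (timestamp_s % size_s)


def sum_per_window(events, size_s=WINDOW_SIZE_S):
    """Aggregate (user, timestamp_s, amount) events into per-user, per-window totals."""
    totals = defaultdict(float)
    for user, timestamp_s, amount in events:
        totals[(user, window_start(timestamp_s, size_s))] += amount
    return dict(totals)


# Hypothetical purchase events: (user, event time in seconds, amount)
events = [
    ("alice", 5, 10.0),
    ("alice", 42, 2.5),
    ("alice", 61, 4.0),
    ("bob", 59, 1.0),
    ("bob", 120, 3.0),
]

# Keys are (user, window start); values are the summed amounts in that window.
features = sum_per_window(events)
```

In a Beam pipeline the same grouping would typically be expressed with `beam.WindowInto(beam.window.FixedWindows(60))` followed by a per-key combine, with the runner handling distribution and late data.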
We’ll also take a deep dive into deploying and orchestrating pipelines on Google Cloud Dataflow (a fully managed service that supercharges Beam pipelines with unparalleled scalability and performance).
Finally, we will explore how to deploy models on Apache Beam. Given the current hype around LLMs, we will see how to deploy quantized models in Apache Beam, relying on tensor frameworks such as GGML, and how Dataflow enhances the Beam experience with advanced autoscaling, efficient resource utilization and diagnostics.
This talk will be a starting point for businesses that want to get the best out of their data with a simple approach, without relying on too many pieces of infrastructure to achieve data processing and model deployment.
Speaker Bio - Stefano Bosisio - MLOps engineer at Synthesia
Stefano is an MLOps engineer at Synthesia. Every day he works alongside research scientists and engineers to find the pain points in the model journey and simplify their lives. Before Synthesia, Stefano worked for Trustpilot, where he brought many advances to data science and shaped the very first ML platform for the entire business. Stefano has a PhD in computational chemistry from the University of Edinburgh. Besides data, he loves cooking, crochet, playing the piano and enjoying life with his wife and family.