How to Share Terabytes of Live Data with Delta Sharing
Technology
In this technical session, Databricks' Frank Munz will introduce Delta Sharing, a Linux Foundation open source solution for sharing massive amounts of live data in a cheap, secure, and scalable way.
Delta Sharing uses pandas or Apache Spark for the real-time exchange of large data sets, enabling secure data sharing across products for the first time. It leverages modern cloud object stores, such as S3, ADLS, or GCS, to reliably transfer large data sets. An open-sourced reference sharing server is available to get started.
Within the Databricks ecosystem, Unity Catalog implements the open source Delta Sharing protocol, so you can share data across organizations regardless of system boundaries, compute platforms, or cloud providers.
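To illustrate the consumer side of the protocol, here is a minimal sketch using the open source delta-sharing Python connector (`pip install delta-sharing`). The profile file path and the share, schema, and table names are assumptions for illustration; substitute the profile and table coordinates your data provider gives you.

```python
# Sketch: read a Delta Sharing table into pandas with the
# open source delta-sharing connector. Names below are placeholders.

def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile>#<share>.<schema>.<table>' URL the connector expects."""
    return f"{profile_path}#{share}.{schema}.{table}"

if __name__ == "__main__":
    import delta_sharing  # requires: pip install delta-sharing

    # A profile file is a small JSON document with the sharing server
    # endpoint and a bearer token, supplied by the data provider.
    profile = "my-provider.share"  # hypothetical profile file

    url = table_url(profile, "my_share", "default", "my_table")

    # Fetch the live table over the sharing protocol into a DataFrame.
    df = delta_sharing.load_as_pandas(url)
    print(df.head())
```

For Spark-based consumers, the connector offers an analogous `delta_sharing.load_as_spark(url)` call, so the same shared table can be read at cluster scale without the provider copying any data.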
This is a technical session with difficulty level easy/medium (L300).
Speaker Bio
Dr. Frank Munz is a Staff Developer Advocate at Databricks and the published author of three computer science books. Frank built up technical evangelism for Amazon Web Services in Germany, Austria, and Switzerland. He is a certified AWS and GCP Professional Cloud Architect and ML/Data Engineer. Frank fulfilled his dream of speaking at top-notch conferences on every continent (except Antarctica, because it is too cold there). He has presented at conferences such as Devoxx, KubeCon, re:Invent, Voxxed Days, and JavaOne. He holds a PhD in Computer Science from TU Munich.
When Frank is not working, he enjoys travelling in Southeast Asia, skiing in the Dolomites, tapas in Spain, and scuba diving in Australia.