What Do You Get When You Combine a Snake and a Squirrel? - Building a Python Data Pipeline with Apache Flink

Caito Scherr

Any symbiotic relationship between very different creatures has unique challenges, but can result in something even more powerful than the sum of its parts. Combining Python with Apache Flink, particularly for machine learning, has its complications, but can also produce an incredibly fast, portable, scalable, and highly flexible data pipeline.

This talk covers the structure and technical features of a Python-Flink pipeline. It also walks through getting started and, more importantly, how to address the common mistakes and hurdles of building one. This includes which features to use and how to leverage the strengths of each framework for your specific use case. For instance, when would you use regular Python, and when would you want to use PyFlink? Are there cases where you would NOT want to use some of the abstraction or automation tools available for these frameworks?

Attendees will come away with an introduction to working with Apache Flink in Python, plus pragmatic tips and tricks for a smoother, faster, more enjoyable (because this should be fun!) dive into this symbiotic relationship.

This talk is geared toward those who are new to Flink, but it is applicable to anyone with beginner to advanced Python experience.

About Caito Scherr

Caito is a Developer Advocate for Ververica (founded by the original creators of Apache Flink), representing the US region, and is based in Portland, Oregon. Previously, she was a software engineer at a data analytics company, and she loves geeking out about metrics and stream processing. Outside of tech, Caito enjoys woodworking and construction, dance, running, and the appreciation of terrible puns.
