Data By the Bay is the first Data Grid conference matrix with 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
Thursday, May 19 • 10:40am - 11:00am
Predicting Hacker News with Beam and TensorFlow

In the last year, Google has open sourced or publicly released two products, Apache Beam and Cloud Dataflow, that promise to change how we write data pipelines.

Beam is an open source, portable job description framework incubating at the Apache Software Foundation. It unifies batch and stream processing in a single model available in Java, Python and Scala. It supports running on popular execution engines like Spark, Flink and Google Cloud Dataflow, giving users flexibility in where they run and eliminating the need to rewrite pipelines. One of these execution engines, Dataflow, is a cloud-based, fully managed service that (like BigQuery) allows users to just submit code and get results. Google provides autoscaling, straggler avoidance and monitoring.

In this talk we'll explore Beam's event time semantics like windows, sessions, and triggers. Eric will also demonstrate running a single Beam job in both batch and streaming modes, deployed on an on-prem cluster and in the cloud, with no code changes.
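
To give a feel for what event time windowing means, here is a minimal sketch in plain Python. It does not use the Beam SDK, and the names `fixed_windows`, `session_windows`, and the sample events are hypothetical, chosen only for illustration. It assumes each event carries its own timestamp in seconds and shows how events that arrive out of order are still grouped by when they happened rather than when they were seen. Triggers, which control when results are emitted, are not modeled here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape for this sketch: (key, event_time_in_seconds).
Event = Tuple[str, int]


def fixed_windows(events: Iterable[Event], size: int) -> Dict[Tuple[str, int], int]:
    """Count events per key in fixed event-time windows of `size` seconds."""
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for key, ts in events:
        # The window is chosen from the event's own timestamp,
        # not from the order in which the event arrived.
        window_start = ts - (ts % size)
        counts[(key, window_start)] += 1
    return dict(counts)


def session_windows(events: Iterable[Event], gap: int) -> Dict[str, List[List[int]]]:
    """Group each key's event times into sessions.

    Two consecutive events (by event time) share a session when they are
    less than `gap` seconds apart; otherwise a new session starts.
    """
    by_key: Dict[str, List[int]] = defaultdict(list)
    for key, ts in events:
        by_key[key].append(ts)

    sessions: Dict[str, List[List[int]]] = {}
    for key, stamps in by_key.items():
        stamps.sort()  # order by event time, ignoring arrival order
        current = [stamps[0]]
        result: List[List[int]] = []
        for ts in stamps[1:]:
            if ts - current[-1] < gap:
                current.append(ts)
            else:
                result.append(current)
                current = [ts]
        result.append(current)
        sessions[key] = result
    return sessions


# Sample data (made up): note that alice's event at t=65 arrives first.
events = [("alice", 65), ("alice", 5), ("bob", 10), ("alice", 20), ("alice", 200)]

per_minute = fixed_windows(events, size=60)
per_session = session_windows(events, gap=30)
```

With this sample input, `per_minute` holds two events for alice in the window starting at 0, and `per_session["alice"]` splits her activity into three sessions. In a real pipeline the windowing strategy would be declared once and the runner would handle late and out-of-order data.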

Eric Anderson

Product Manager, Google
Work on Google Cloud Dataflow.

Thursday May 19, 2016 10:40am - 11:00am PDT