Data By the Bay is the first Data Grid conference matrix with 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
Friday, May 20 • 10:40am - 11:00am
Real Time Machine Learning Visualization with Spark

Training models on massive datasets, even on Spark, can be a lengthy process, during which the data scientist has no visibility into how the model is shaping up. The only way to monitor progress is to view the status of the Spark jobs, which provides no information about convergence or other statistics of interest. In this talk, we will discuss how to visualize and monitor the training of machine learning models in real time with Spark. With this capability, you can monitor training from one iteration to the next, observe how the model converges, visualize the characteristics of the model in real time, and decide whether to continue training.

In this talk you will learn:
- How machine learning algorithms are monitored by adding callbacks to K-Means and other algorithms.
- The Spark task communication infrastructure that has been built, using Akka to deliver messages from the Spark driver to the job submitter.
- How HTML5 SSE (Server-Sent Events) helps to generate real-time progress visualizations.
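
As a rough illustration of the callback idea described above, here is a minimal sketch in plain Python. It is not the speaker's implementation and does not use Spark or Akka: `kmeans_with_callback`, `on_iteration`, and `to_sse` are hypothetical names chosen for this example. It runs a small K-Means loop, reports per-iteration statistics through a callback, lets the callback stop training early, and formats each report as a Server-Sent Events message (the `event:` / `data:` text framing a browser `EventSource` consumes).

```python
import json
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def kmeans_with_callback(
    points: Sequence[Point],
    k: int,
    max_iter: int,
    on_iteration: Callable[[Dict], Optional[bool]],
    seed: int = 0,
) -> List[Point]:
    """Lloyd's K-Means on 2-D points, calling on_iteration after every pass.

    The callback receives a dict of statistics. Returning False from the
    callback stops training early (hypothetical convention for this sketch).
    """
    rng = random.Random(seed)
    centers = rng.sample(list(points), k)
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach each point to its nearest current center.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        cost = 0.0
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            j = min(range(k), key=dists.__getitem__)
            clusters[j].append(p)
            cost += dists[j]
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl))
            if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        converged = new_centers == centers
        centers = new_centers
        # "cost" is the sum of squared distances measured in the assignment
        # step, i.e. against the centers this iteration started with.
        keep_going = on_iteration(
            {"iteration": iteration, "cost": cost, "centers": centers}
        )
        if converged or keep_going is False:
            break
    return centers


def to_sse(event: str, payload: Dict) -> str:
    """Format one Server-Sent Events message; a blank line ends the message."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"
```

In a setup like the one the abstract describes, the callback would presumably forward each message from the driver toward the job submitter and on to the browser; here it could simply collect the strings, for example `on_iteration=lambda stats: messages.append(to_sse("iteration", stats))`.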

Chester Chen

Director of Engineering, Alpine Data
Chester Chen is the Director of Engineering and hands-on architect at Alpine Data Labs. He manages the analytics platform development and contributes to some of the major developments. He has worked with Scala on and off since Scala 2.7. He is the founder and organizer of the SF Big Analytics Meetup, as well as the main co-organizer of the SF Machine Learning Meetup. Before joining Alpine Data Labs, he played many roles as Technical...

