Data By the Bay is the first Data Grid conference matrix with 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
Monday, May 16 • 2:10pm - 2:50pm
The Lego Model for Machine Learning

80-90% of data science is data cleaning and feature engineering. Yet if we counted data science tools by what they are for, we would find that most innovation happens in data infrastructure and modeling. We want to change that and make data scientists much more productive while also improving the quality of their work. In this talk I will describe the machine learning platform we built on top of Spark to modularize these steps. Modular components are easy to reuse, which simplifies both building models and changing them. The framework simplifies the data preparation and feature building stages by providing a reusable class for each data source, so that subsequent feature generation takes only a few lines of code.
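
As a rough illustration of the modular idea in the abstract, here is a minimal plain-Python sketch. It is not the platform presented in the talk, which runs on Spark and whose API is not shown in this listing. Every name below (`DataSource`, `Feature`, `build_feature_table`, the `orders` sample data) is hypothetical and chosen only for illustration. The sketch assumes each data source is cleaned once inside a reusable class, and that each feature is a small aggregation declared on top of it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


class DataSource:
    """Hypothetical reusable wrapper that cleans one raw source once."""

    def __init__(self, name: str, rows: Iterable[Row], key: str):
        self.name = name
        self.key = key
        # Cleaning lives here, so every feature built on this source reuses it.
        # Assumption: a row without a key value is unusable and is dropped.
        self.rows: List[Row] = [r for r in rows if r.get(key) is not None]

    def group_by_key(self) -> Dict[object, List[Row]]:
        grouped: Dict[object, List[Row]] = {}
        for row in self.rows:
            grouped.setdefault(row[self.key], []).append(row)
        return grouped


@dataclass
class Feature:
    """Hypothetical feature: a named aggregation over one data source."""

    name: str
    source: DataSource
    aggregate: Callable[[List[Row]], float]

    def compute(self) -> Dict[object, float]:
        groups = self.source.group_by_key()
        return {key: self.aggregate(rows) for key, rows in groups.items()}


def build_feature_table(features: List[Feature]) -> Dict[object, Dict[str, float]]:
    """Combine per-feature results into one row of features per key."""
    table: Dict[object, Dict[str, float]] = {}
    for feature in features:
        for key, value in feature.compute().items():
            table.setdefault(key, {})[feature.name] = value
    return table


# Made-up sample data, including one dirty row with a missing key.
orders = DataSource(
    "orders",
    [
        {"user_id": "a", "amount": 10.0},
        {"user_id": "a", "amount": 30.0},
        {"user_id": "b", "amount": 5.0},
        {"user_id": None, "amount": 99.0},
    ],
    key="user_id",
)

# With the source wrapped once, each new feature is a single line.
features = [
    Feature("order_count", orders, lambda rows: float(len(rows))),
    Feature("total_spend", orders, lambda rows: sum(r["amount"] for r in rows)),
]
feature_table = build_feature_table(features)
```

In this sketch, adding another feature means appending one more `Feature(...)` line, while the cleaning logic stays in `DataSource` and is shared by all of them.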

Vitaly Gordon

CEO, Faros AI

Monday May 16, 2016 2:10pm - 2:50pm PDT