This event has ended.
Data By the Bay is the first Data Grid conference matrix, with 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
Monday, May 16 • 2:10pm - 2:50pm
The Lego Model for Machine Learning

80-90% of data science is data cleaning and feature engineering. Yet if we counted data science tools by the task they address, we would find that most innovation happens in data infrastructure and modeling. We want to change that, making data scientists much more productive while also improving the quality of their work. In this talk I will describe the machine learning platform we wrote on top of Spark to modularize these steps. Modularity allows easy reuse of components, which simplifies both building models and changing them. The framework simplifies the data preparation and feature building stages with reusable classes for each data source, making subsequent feature generation a matter of a few lines of code.
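
To make the idea of "reusable classes for each data source" concrete, here is a minimal sketch in plain Python. It is not the platform described in the talk and does not use Spark. The names `DataSource`, `Feature`, and `build`, along with the sample account records, are hypothetical and chosen only for illustration. The sketch assumes that cleaning means dropping records with missing required fields.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


@dataclass(frozen=True)
class Feature:
    """A named, reusable feature: a function from one raw record to a number."""
    name: str
    extract: Callable[[Record], float]


class DataSource:
    """Hypothetical wrapper for one data source; cleaning happens once, here."""

    def __init__(self, rows: Iterable[Record], required: List[str]):
        # Assumed cleaning rule: drop rows that lack any required field.
        self.rows = [
            r for r in rows if all(r.get(k) is not None for k in required)
        ]

    def build(self, features: List[Feature]) -> List[Dict[str, float]]:
        # Apply every feature to every cleaned row.
        return [{f.name: f.extract(r) for f in features} for r in self.rows]


# Made-up sample data; the second row is dropped during cleaning.
accounts = DataSource(
    rows=[
        {"id": 1, "employees": 120, "industry": "tech"},
        {"id": 2, "employees": None, "industry": "retail"},
        {"id": 3, "employees": 8, "industry": "tech"},
    ],
    required=["employees", "industry"],
)

# With the source wrapped, each new feature is a single line.
is_tech = Feature("is_tech", lambda r: 1.0 if r["industry"] == "tech" else 0.0)
is_small = Feature("is_small", lambda r: 1.0 if r["employees"] < 50 else 0.0)

matrix = accounts.build([is_tech, is_small])
print(matrix)
```

Because the cleaning logic lives in the data source class, the same `Feature` objects can be reused across models, and adding a feature does not touch the preparation code.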

Vitaly Gordon

VP, Data Science and Engineering, Salesforce Einstein

Monday May 16, 2016 2:10pm - 2:50pm
