Data By the Bay is the first Data Grid conference: a matrix of six vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
Wednesday, May 18 • 2:10pm - 2:30pm
Building a Graph of all US businesses using Spark technologies


Radius Intelligence (www.radius.com) empowers data science to deliver a unique marketing intelligence platform used by over a hundred US companies. This presentation will explain how Radius is using Spark along with GraphX, MLlib, and Scala to create a comprehensive and accurate index of US businesses from dozens of different sources. In particular, I will address problems related to clustering records together using a graph-based approach and how to resolve the graph into a set of US businesses. I will also discuss some of the models used to clean out noise, rank the best values, and impute missing values, and provide some best practices.
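
To make the graph-based clustering idea concrete, here is a minimal sketch in plain Python rather than Spark. It is not Radius's pipeline: the record fields (name, phone, website), the matching rule (two records are linked when they share a phone or website), and the majority-vote ranking are all assumptions made for illustration. Records are nodes, shared attribute values are edges, and each connected component is treated as one business; a union-find structure stands in for a distributed connected-components computation such as the one GraphX provides.

```python
from collections import Counter, defaultdict


def find(parent, x):
    """Return the root of x's set, shortening the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster_records(records, match_keys=("phone", "website")):
    """Group record ids into connected components.

    Two records are linked when they share a non-missing value for any
    key in match_keys (a hypothetical matching rule for this sketch).
    """
    parent = {rid: rid for rid in records}
    first_seen = {}  # (key, value) -> first record id carrying that value
    for rid, rec in records.items():
        for key in match_keys:
            value = rec.get(key)
            if value is None:
                continue
            token = (key, value)
            if token in first_seen:
                root_a = find(parent, rid)
                root_b = find(parent, first_seen[token])
                if root_a != root_b:
                    parent[root_a] = root_b
            else:
                first_seen[token] = rid
    clusters = defaultdict(list)
    for rid in records:
        clusters[find(parent, rid)].append(rid)
    return sorted(sorted(ids) for ids in clusters.values())


def resolve_cluster(records, ids, fields=("name", "phone", "website")):
    """Pick the most common non-missing value per field within a cluster.

    Majority vote is a deliberately simple stand-in for a ranking model;
    a value missing on one record is filled from the rest of its cluster.
    """
    resolved = {}
    for field in fields:
        values = [records[i].get(field) for i in ids
                  if records[i].get(field) is not None]
        resolved[field] = Counter(values).most_common(1)[0][0] if values else None
    return resolved


# Made-up sample records from imaginary sources.
records = {
    "a": {"name": "Acme Inc", "phone": "555-0100", "website": None},
    "b": {"name": "Acme Inc", "phone": "555-0100", "website": "acme.example"},
    "c": {"name": "ACME", "phone": None, "website": "acme.example"},
    "d": {"name": "Bolt Cafe", "phone": "555-0199", "website": None},
}

clusters = cluster_records(records)
businesses = [resolve_cluster(records, ids) for ids in clusters]
```

In this sample, records a and c never share a value directly, yet they end up in the same cluster because b links to a by phone and to c by website. That transitive linking is what the graph view adds over pairwise matching, and it is also why noisy edges need cleaning first: one bad shared value can merge two unrelated businesses.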


Alexis Roos

Engineering manager, Radius Intelligence
Alexis has over 20 years of software engineering experience, with an emphasis on large-scale data science and engineering and on application infrastructure. Currently an Engineering Manager at Radius Intelligence, Alexis is leading a team of data scientists and data engineers building Radius...

Wednesday May 18, 2016 2:10pm - 2:30pm PDT