Data By the Bay is the first Data Grid conference: a matrix of 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
Wednesday, May 18 • 2:10pm - 2:30pm
Building a Graph of all US businesses using Spark technologies

Radius Intelligence (www.radius.com) uses data science to deliver a unique marketing intelligence platform used by over a hundred US companies. This presentation will explain how Radius uses Spark along with GraphX, MLlib and Scala to create a comprehensive and accurate index of US businesses from dozens of different sources. In particular, I will address problems related to clustering records together using a graph-based approach and resolving the graph into a set of US businesses. I will also discuss some of the models used to clean out noise, rank candidate values to select the best one, and impute missing values, and I will share some best practices.
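The clustering and resolution steps in the abstract can be sketched on a single machine as follows. This is a minimal illustration under stated assumptions, not Radius's pipeline.

- **Graph and clustering.** Records that share a normalized match key are linked by an edge. Each connected component, found here with a union-find structure, is treated as one business. At Spark scale the same step is commonly expressed with GraphX's `Graph.connectedComponents()`.
- **Value ranking.** Within each cluster, every field takes the value with the highest weighted vote across sources. This also fills fields that a given record was missing.
- **Assumptions.** The match keys (phone, name + ZIP), the field names, the source names, and the `source_weight` values are all hypothetical. The weighted vote is a simple stand-in for the ranking and imputation models the talk describes.

```python
from collections import Counter, defaultdict

# Hypothetical record shape: (record_id, source, fields_dict).
# Field names, match keys and source weights are illustrative assumptions.


class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        self.size.setdefault(x, 1)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def normalize_phone(phone):
    digits = "".join(ch for ch in phone if ch.isdigit())
    return digits[-10:] if len(digits) >= 10 else None


def match_keys(fields):
    """Hypothetical match keys: records sharing any key are connected."""
    keys = []
    phone = normalize_phone(fields.get("phone") or "")
    if phone:
        keys.append(("phone", phone))
    name, zip_code = fields.get("name"), fields.get("zip")
    if name and zip_code:
        keys.append(("name_zip", name.strip().lower(), zip_code))
    return keys


def cluster_records(records):
    """Group record ids into connected components of the match graph."""
    uf = UnionFind()
    first_with_key = {}
    for rec_id, _source, fields in records:
        uf.find(rec_id)  # register singletons so every record gets a cluster
        for key in match_keys(fields):
            if key in first_with_key:
                uf.union(first_with_key[key], rec_id)
            else:
                first_with_key[key] = rec_id
    clusters = defaultdict(list)
    for rec_id, _source, _fields in records:
        clusters[uf.find(rec_id)].append(rec_id)
    return list(clusters.values())


def resolve_cluster(members, source_weight):
    """Pick, per field, the non-empty value with the highest summed source weight."""
    scores = defaultdict(Counter)
    for _rec_id, source, fields in members:
        for field, value in fields.items():
            if value:
                scores[field][value] += source_weight.get(source, 1.0)
    return {field: votes.most_common(1)[0][0] for field, votes in scores.items()}


def resolve_all(records, source_weight):
    by_id = {rec[0]: rec for rec in records}
    return [resolve_cluster([by_id[r] for r in cluster], source_weight)
            for cluster in cluster_records(records)]


records = [
    ("r1", "web", {"name": "Acme Corp", "phone": "(415) 555-0100", "zip": "94107"}),
    ("r2", "dir", {"name": "ACME Corporation", "phone": "415-555-0100", "zip": None}),
    ("r3", "dir", {"name": "Acme Corp", "phone": None, "zip": "94107"}),
    ("r4", "web", {"name": "Blue Bottle", "phone": "510-555-0199", "zip": "94612"}),
]
businesses = resolve_all(records, {"web": 1.0, "dir": 0.8})
for business in businesses:
    print(business)
```

In this toy input, r1 and r2 are connected through the shared phone number, and r1 and r3 through the shared name and ZIP. The three records therefore resolve to one business. r2's missing ZIP is filled from the cluster, while r4 stays a separate business. Real pipelines typically replace the exact-key edges with learned similarity scores.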

Speakers
Alexis Roos

Engineering Manager, Radius Intelligence
Alexis has over 20 years of software engineering experience, with an emphasis on large-scale data science and engineering and on application infrastructure. Currently an Engineering Manager at Radius Intelligence, Alexis leads a team of data scientists and data engineers building the Radius business graph, which models over 20 million US businesses and is created from over 7 billion records from dozens of sources using Spark, GraphX, MLlib and Scala...



Wednesday May 18, 2016 2:10pm - 2:30pm
Ada

Attendees (25)