Data By the Bay has ended
Data By the Bay is the first Data Grid conference matrix, with 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
Wednesday, May 18 • 11:40am - 12:20pm
From text to knowledge via ML algorithms - the Quora answer

Q&A sites like Quora aim to grow the world’s knowledge. To do this, they need to get the right questions to the people who can answer them, and also to get existing answers to the people who are interested in them. Accomplishing this requires building a complex ecosystem that takes text as its main data source while also accounting for issues such as content quality, engagement, demand, interests, and reputation. With high-quality data, you can build machine learning solutions that help address all of those requirements. In this talk I will describe some interesting uses of machine learning, ranging from recommendation approaches such as personalized ranking to classifiers built to detect duplicate questions or spam. I will describe some of the modeling and feature engineering approaches that go into building these systems, and share some of the challenges faced when building such a large-scale base of human-generated knowledge. I will use my experience at Quora as the main driving example. Quora is a Q&A site that, despite having over 80 million unique visitors a month, is known for maintaining high-quality knowledge and content in general.

Xavier Amatriain

VP Engineering, Quora

Wednesday May 18, 2016 11:40am - 12:20pm PDT