Data By the Bay is the first Data Grid conference matrix with 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
Q&A sites like Quora aim to grow the world’s knowledge. To do this, they need to get not only the right questions to the right people so they can answer them, but also the existing answers to the people who are interested in them. Accomplishing this requires building a complex ecosystem that takes text as its main data source while also accounting for content quality, engagement, demand, interests, and reputation. With high-quality data, you can build machine learning solutions that help address all of those requirements. In this talk I will describe some interesting uses of machine learning, ranging from recommendation approaches such as personalized ranking to classifiers built to detect duplicate questions or spam. I will describe some of the modeling and feature engineering approaches that go into building these systems. I will also share some of the challenges faced when building such a large-scale base of human-generated knowledge. I will use my experience at Quora as the main driving example. Quora is a Q&A site that, despite having over 80 million unique visitors a month, is known for maintaining high-quality knowledge and content in general.