Data By the Bay is the first Data Grid conference matrix with 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
Tuesday, May 17 • 9:50am - 10:30am
The practice of acquiring good labels


Engineers and researchers use human computation as a mechanism to produce labeled data sets for product development, research, and experimentation. In a data-driven world, good labels are key. To gather useful results, a successful labeling task relies on many different elements, from clear instructions and user interface design to algorithms for quality control. In this talk, I will present a perspective on collecting high-quality labels, with an emphasis on practical implementations and scalability. I will focus on three main topics: programming crowds, debugging tasks with low agreement, and algorithms for quality control. I plan to show many examples and code along the way.
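
As an illustration of the kind of quality control the abstract mentions, here is a minimal sketch that aggregates worker labels by majority vote and flags items with low agreement for debugging. It is not taken from the talk: the function names, the sample judgments, and the 0.6 threshold are all hypothetical, and majority voting is only one simple aggregation strategy among many.

```python
from collections import Counter

def aggregate_labels(judgments):
    """Majority-vote each item and report the share of workers who agreed.

    judgments: dict mapping an item id to the list of labels that
    different workers assigned to it (hypothetical input format).
    Returns a dict mapping item id -> (winning label, agreement ratio).
    """
    results = {}
    for item, labels in judgments.items():
        if not labels:
            continue  # skip items that received no judgments
        counts = Counter(labels)
        top_label, top_count = counts.most_common(1)[0]
        results[item] = (top_label, top_count / len(labels))
    return results

def low_agreement_items(results, threshold=0.6):
    """Return the ids of items whose agreement ratio is below the threshold."""
    return sorted(item for item, (_, agreement) in results.items()
                  if agreement < threshold)

# Hypothetical judgments from three workers per item.
judgments = {
    "q1": ["relevant", "relevant", "relevant"],
    "q2": ["relevant", "not_relevant", "relevant"],
    "q3": ["relevant", "not_relevant", "unsure"],
}

results = aggregate_labels(judgments)
flagged = low_agreement_items(results)
print(flagged)
```

Flagged items are candidates for a closer look: low agreement may point to unclear instructions or a genuinely ambiguous item rather than to careless workers.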


Omar Alonso

Tech Lead, Instacart
Omar is a Tech Lead at Instacart, where he works at the intersection of information retrieval, knowledge graphs, and human computation.

Tuesday May 17, 2016 9:50am - 10:30am PDT