Data By the Bay is the first Data Grid conference, a matrix of six vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
Engineers and researchers use human computation to produce labeled data sets for product development, research, and experimentation. In a data-driven world, good labels are key. A successful labeling task relies on many different elements, from clear instructions and user interface design to algorithms for quality control. In this talk, I will present a perspective on collecting high-quality labels, with an emphasis on practical implementations and scalability. I will focus on three main topics: programming crowds, debugging tasks with low agreement, and algorithms for quality control. I plan to show many examples and code along the way.
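To make the last two topics concrete, here is a minimal sketch of one common quality-control baseline: majority voting over redundant judgments, with a per-item agreement score used to flag tasks worth debugging. The abstract does not say which algorithms the talk covers, so the choice of majority voting, the function names, the sample data, and the 0.7 threshold are all illustrative assumptions.

```python
from collections import Counter, defaultdict


def aggregate_labels(judgments):
    """Majority-vote aggregation (an assumed baseline, not the talk's method).

    judgments: iterable of (item_id, worker_id, label) tuples.
    Returns {item_id: (winning_label, agreement)}, where agreement is the
    fraction of workers on that item who chose the winning label.
    """
    labels_by_item = defaultdict(list)
    for item_id, _worker_id, label in judgments:
        labels_by_item[item_id].append(label)

    results = {}
    for item_id, labels in labels_by_item.items():
        # On a tie, most_common keeps the label that was seen first.
        winning_label, votes = Counter(labels).most_common(1)[0]
        results[item_id] = (winning_label, votes / len(labels))
    return results


def low_agreement_items(results, threshold=0.7):
    """Return item ids whose agreement falls below a hypothetical threshold."""
    return sorted(
        item_id
        for item_id, (_label, agreement) in results.items()
        if agreement < threshold
    )


# Hypothetical judgments: three items, each labeled by two or three workers.
judgments = [
    ("q1", "w1", "relevant"),
    ("q1", "w2", "relevant"),
    ("q1", "w3", "relevant"),
    ("q2", "w1", "relevant"),
    ("q2", "w2", "not_relevant"),
    ("q2", "w3", "not_relevant"),
    ("q3", "w1", "relevant"),
    ("q3", "w2", "not_relevant"),
]

results = aggregate_labels(judgments)
flagged = low_agreement_items(results)
```

Items flagged this way are candidates for a closer look at the instructions or interface, since low agreement often points to an ambiguous task rather than careless workers. Majority voting treats every worker as equally reliable; weighting workers by estimated accuracy is a natural refinement.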