Data By the Bay is the first Data Grid conference matrix with 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
Tuesday, May 17 • 9:05am - 9:40am
Data and Algorithmic Bias in the Web

The Web is the largest public big data repository that humankind has created. In this overwhelming data ocean, we need to be aware of the quality and, in particular, of the biases that exist in this data. On the Web, biases also come from redundancy and spam, as well as from algorithms that we design to improve the user experience. This problem is further exacerbated by biases that are added by these algorithms, especially in the context of search and recommendation systems. They include selection and presentation bias in many forms, interaction bias, etc. We give several examples and their relation to sparsity, novelty, and privacy, stressing the importance of the user context to avoid these biases.

Ricardo Baeza-Yates

Ricardo Baeza-Yates's areas of expertise are information retrieval, web search, and data mining, as well as data science and algorithms in general. He was VP of Research at Yahoo Labs, based in Sunnyvale, California, from August 2014 to March 2016. Before that, he founded and led the Yahoo Labs in Barcelona and Santiago de Chile from 2006 to 2015. Between 2008 and 2012 he also oversaw the Haifa lab, and he started the London lab in 2012. He is...

