Data By the Bay is the first Data Grid conference matrix with 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
Pandora began with The Music Genome Project, the most sophisticated taxonomy of musicological data ever collected and an extremely effective content-based approach to music recommendation. Its foundation is human music cognition: how an expert perceives and describes the complex world of a piece of music.
But what happens when you have a decade of additional data points, generated by more than 250 million registered users who have created 8+ billion personalized radio stations and given 60+ billion thumbs? Unlike traditional recommender systems, such as those of Netflix or Amazon, which need to recommend a single item or a static set, Pandora provides an evolving set of sequential items and needs to react within a few milliseconds when the user is unhappy with the proposed songs. Furthermore, a variety of factors (e.g., musicological, social, geographical, or generational) play a critical role in deciding what music to play to a user, and these factors vary dramatically from listener to listener.
In this talk I will present a dynamic ensemble learning system that combines curatorial data and machine learning models to provide a truly personalized experience. This approach allows us to switch from a lean-back experience (exploitation) to a more exploratory mode for discovering new music tailored specifically to each user's individual tastes. I will also discuss how Pandora, a data-driven company, makes informed decisions about the features that are added to the core product based on the results of extensive online A/B testing.
Following this session, the audience will have an in-depth understanding of how Pandora uses Big Data Science to determine the perfect balance of familiarity, discovery, repetition, and relevance for each individual listener; how we measure and evaluate user satisfaction; and how our online and offline architecture stack plays a critical role in our success.