Name: Real-time, Streaming Advanced Analytics, Approximations, and Recommendations using Apache Spark ML/GraphX, Kafka, Stanford CoreNLP, and Twitter Algebird BONUS: Netflix Recommendations: Then and Now
Start: 2016-05-16T14:10:00-0700
End: 2016-05-16T14:50:00-0700

Data By the Bay is the first Data Grid conference: a matrix of 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.



Agenda

Intro
Live, Interactive Recommendations Demo
  Spark ML, GraphX, Streaming, Kafka, Cassandra, Docker
Types of Similarity
  Euclidean vs. Non-Euclidean Similarity
  User-to-User Similarity
  Content-based, Item-to-Item Similarity (Amazon)
  Collaborative-based, User-to-Item Similarity (Netflix)
  Graph-based, Item-to-Item Similarity Pathway (Spotify)
Similarity Approximations at Scale
  Twitter Algebird
  MinHash and Bucketing
  Locality Sensitive Hashing (LSH)
BONUS: Netflix Recommendation Algorithms: From Ratings to Real-Time
  DVD-Ratings-based $1M Netflix Prize (2009)
  Streaming-based "Trending Now" (2016)
Wrap Up
Q & A

Speakers

Chris Fregly

Solution Architect, AI and machine learning, AWS

Monday May 16, 2016 2:10pm - 2:50pm PDT
Markov

Pipelines

Data By the Bay

