Data By the Bay has ended
Data By the Bay is the first Data Grid conference matrix with 6 vertical application areas  spanned by multiple horizontal data pipelines, platforms, and algorithms.  We are unifying data science and data engineering, showing what really works to run businesses at scale.
Monday, May 16 • 2:10pm - 2:50pm
Real-time, Streaming Advanced Analytics, Approximations, and Recommendations using Apache Spark ML/GraphX, Kafka Stanford CoreNLP, and Twitter Algebird BONUS: Netflix Recommendations: Then and Now

Sign up or log in to save this to your schedule and see who's attending!

Agenda Intro Live, Interactive Recommendations Demo Spark ML, GraphX, Streaming, Kafka, Cassandra, Docker Types of Similarity Euclidean vs. Non-Euclidean Similarity User-to-User Similarity Content-based, Item-to-Item Similarity (Amazon) Collaborative-based, User-to-Item Similarity (Netflix) Graph-based, Item-to-Item Similarity Pathway (Spotify) Similarity Approximations at Scale Twitter Algebird MinHash and Bucketing Locality Sensitive Hashing (LSH) BONUS: Netflix Recommendation Algorithms: From Ratings to Real-Time DVD-Ratings-based $1M Netflix Prize (2009) Streaming-based "Trending Now" (2016) Wrap Up Q & A

avatar for Chris Fregly

Chris Fregly

Founder, PipelineAI
Chris Fregly is Founder and Research Engineer at PipelineIO, a Streaming Machine Learning and Artificial Intelligence Startup based in San Francisco. He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup... Read More →

Monday May 16, 2016 2:10pm - 2:50pm

Attendees (24)