Data By the Bay is the first Data Grid conference matrix with 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
Monday, May 16 • 4:00pm - 4:40pm
Parallel and distributed big joins in H2O


Matt has taken the radix join as implemented in R's data.table and parallelized and distributed it in H2O. He will describe how the algorithm works, provide benchmarks and highlight advantages/disadvantages. H2O is open source on GitHub and is accessible from R and Python using the h2o package on CRAN and PyPI.
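
For readers unfamiliar with the general idea behind a partitioned radix join, the following is a minimal sketch in plain Python. It is not H2O's or data.table's implementation, and the names `partition_by_radix`, `join_bucket`, `radix_join` and the `RADIX_BITS` setting are hypothetical, chosen only for illustration. It assumes keys are non-negative integers that fit in `key_bits` bits. Rows are bucketed by the leading bits of the key, and buckets with the same prefix are joined independently, which is what makes the work easy to split across threads or machines. The thread pool here only illustrates that structure; it stands in for whatever workers a real system would use.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

RADIX_BITS = 4  # hypothetical setting: the top 4 key bits give up to 16 partitions


def partition_by_radix(rows, key_bits=8):
    """Bucket (key, value) rows by the top RADIX_BITS of a key_bits-wide key."""
    shift = key_bits - RADIX_BITS
    buckets = defaultdict(list)
    for key, value in rows:
        buckets[key >> shift].append((key, value))
    return buckets


def join_bucket(left_rows, right_rows):
    """Inner-join two buckets that share the same radix prefix."""
    index = defaultdict(list)
    for key, value in right_rows:
        index[key].append(value)
    return [(key, lv, rv) for key, lv in left_rows for rv in index.get(key, [])]


def radix_join(left, right, key_bits=8):
    """Inner join of two lists of (key, value) rows, one task per shared bucket."""
    left_b = partition_by_radix(left, key_bits)
    right_b = partition_by_radix(right, key_bits)
    # Only prefixes present on both sides can produce matches.
    shared = sorted(left_b.keys() & right_b.keys())
    with ThreadPoolExecutor() as pool:
        parts = pool.map(lambda p: join_bucket(left_b[p], right_b[p]), shared)
        return [row for part in parts for row in part]
```

Calling `radix_join([(1, "a"), (200, "b")], [(200, "x"), (1, "y")])` yields one output row per matching key pair. Because each bucket pair is independent, the same partitioning step could in principle decide which node receives which rows in a distributed setting.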


Matt Dowle

Hacker, H2O.ai
Matt is the main author of R's data.table package, the 2nd most asked about R package on Stack Overflow. He has worked for some of the world’s largest financial organizations: Lehman Brothers, Salomon Brothers, Citigroup, Concordia Advisors and Winton Capital. He is particularly...

Monday May 16, 2016 4:00pm - 4:40pm PDT