Data By the Bay is the first Data Grid conference: a matrix of six vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
Monday, May 16 • 4:00pm - 4:40pm
Parallel and distributed big joins in H2O


Matt has taken the radix join as implemented in R's data.table and parallelized and distributed it in H2O. He will describe how the algorithm works, provide benchmarks and highlight advantages/disadvantages. H2O is open source on GitHub and is accessible from R and Python using the h2o package on CRAN and PyPI.
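To give a rough sense of why radix-based joins parallelize well, here is a minimal single-machine sketch in Python. It is not the data.table or H2O implementation: it swaps in a generic radix-partitioned hash join, where rows are bucketed by the low bits of an integer key so that each bucket pair can be joined independently. The names `partition`, `join_bucket`, `radix_join`, and the `RADIX_BITS` setting are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

RADIX_BITS = 4  # assumption: partition on the low 4 bits of the integer key


def partition(rows, bits=RADIX_BITS):
    """Bucket (key, value) rows by the low `bits` bits of the integer key."""
    mask = (1 << bits) - 1
    buckets = defaultdict(list)
    for key, value in rows:
        buckets[key & mask].append((key, value))
    return buckets


def join_bucket(left_rows, right_rows):
    """Inner-join two buckets that share the same radix."""
    lookup = defaultdict(list)
    for key, value in right_rows:
        lookup[key].append(value)
    return [(key, lv, rv) for key, lv in left_rows for rv in lookup.get(key, [])]


def radix_join(left, right, bits=RADIX_BITS, workers=4):
    """Inner join of two lists of (key, value) rows, one task per shared radix."""
    left_b = partition(left, bits)
    right_b = partition(right, bits)
    # Equal keys always share a radix, so only radixes present on both sides matter.
    shared = sorted(set(left_b) & set(right_b))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda r: join_bucket(left_b[r], right_b[r]), shared)
        # Sort the combined rows so the output order is deterministic.
        return sorted(row for part in parts for row in part)


left = [(1, "a"), (2, "b"), (17, "c"), (3, "d")]
right = [(1, "x"), (17, "y"), (17, "z"), (4, "w")]
result = radix_join(left, right)
```

Because rows with equal keys always land in the same bucket, the buckets can be joined without coordination. In a distributed setting, each bucket pair could in principle be handled by a different node.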

Speakers

Matt Dowle

Hacker, H2O.ai
Matt is the main author of R's data.table package, the second most asked-about R package on Stack Overflow. He has worked for some of the world’s largest financial organizations: Lehman Brothers, Salomon Brothers, Citigroup, Concordia Advisors and Winton Capital. He is particularly pleased that data.table is also used outside finance, for example in genomics, where large, ordered datasets are also studied.


Monday May 16, 2016 4:00pm - 4:40pm
Markov
