Data By the Bay has ended
Data By the Bay is the first Data Grid conference: a matrix of 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.

Xavier Tordoir

Data Fellas, Inc.
Founder

Xavier started his career as a researcher in experimental physics, with a focus on data processing. He later took part in projects in finance, genomics and software development for academic research. During that time, he worked on time series and on the prediction of biological molecular structures and interactions, and applied machine learning methodologies. He developed solutions to manage and process data distributed across data centres.


He has since founded and now works at Data Fellas, a company dedicated to distributed computing and advanced analytics, leveraging Scala, Spark and other distributed technologies like H2O for machine learning.

Abstract: 
Sparkling Water on the Spark Notebook: Interactive Genomes clustering
H2O provides advanced machine learning capabilities that scale to large datasets. Interoperability between H2O and general-purpose large-scale data manipulation frameworks like Apache Spark is essential to help data scientists work efficiently, and this is where Sparkling Water shines. The last piece is the ability to work interactively on data from a single environment, which lets data scientists share their results and code. We present the Spark Notebook working with Sparkling Water to bring the H2O libraries to the Spark environment. We show a genomics data processing use case: Spark and its genomics library ADAM give efficient access to raw data through domain-specific objects, data preparation is done with Spark, and deep learning from H2O is used to compute a model for population stratification within the set of genomes under investigation.
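
As a rough illustration of the kind of processing the abstract describes, the sketch below encodes each genome as a vector of alternate-allele counts per variant and groups the samples into clusters. It is not the code from the talk: plain Python stands in for Spark, ADAM and H2O, a tiny k-means is swapped in for the H2O deep learning model, and the sample names, genotypes and function names are all hypothetical.

```python
# Toy stand-in for the pipeline: genotype encoding followed by clustering.
# Hypothetical data and helper names; no Spark, ADAM or H2O is involved.

def encode(genotype):
    """Map a genotype string such as '0|1' to its alternate-allele count."""
    alleles = genotype.replace("/", "|").split("|")
    return sum(1 for allele in alleles if allele != "0")

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, centroids, iterations=10):
    """Plain Lloyd's algorithm with caller-supplied initial centroids."""
    assignment = []
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        assignment = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

# Made-up genotypes for four samples at four variant sites.
samples = {
    "sample_a": ["0|0", "0|1", "0|0", "0|0"],
    "sample_b": ["0|0", "0|0", "0|0", "0|1"],
    "sample_c": ["1|1", "1|1", "0|1", "1|1"],
    "sample_d": ["1|1", "0|1", "1|1", "1|1"],
}

names = sorted(samples)
vectors = [[encode(g) for g in samples[name]] for name in names]
labels = kmeans(vectors, centroids=[list(vectors[0]), list(vectors[-1])])
clusters = dict(zip(names, labels))
```

In the setting the abstract describes, the encoding step would presumably run as a distributed Spark job over ADAM records and the clustering step would be replaced by the H2O model; the sketch only shows the shape of the data flowing between the two.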