Name: Interactive Machine Learning on Genomics Data with the Spark Notebook
Start: 2016-05-20T11:40:00-0700
End: 2016-05-20T12:00:00-0700

Data By the Bay is the first Data Grid conference matrix with 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.

Processing genomics data efficiently today means working at scale, using advanced machine learning methods, and developing models interactively. The required convergence of technologies is now a reality, and this talk presents it. The stack builds on ADAM, a Spark library for genomics developed at the AMPLab, which provides the data representation and APIs needed to apply distributed computing to genomics data. The development tool is the Spark Notebook, which offers an interactive interface for executing code. Its integration with scalable machine learning libraries and ADAM lets us work interactively on data at scale, from a single environment, with advanced modelling methods. We demonstrate examples of genomics data processing using 1000 Genomes data, going from simple data manipulation to descriptive statistics and on to more complex population stratification with deep learning.

Speakers

Andy Petrella

Cofounder, Data Fellas

Creator of Spark Notebook

Friday May 20, 2016 11:40am - 12:00pm PDT
Ada

Life

Data By the Bay
