Data By the Bay has ended
Data By the Bay is the first Data Grid conference: a matrix with 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
Friday, May 20 • 11:40am - 12:00pm
Interactive Machine Learning on Genomics Data with the Spark Notebook

Processing genomics data efficiently today means working at scale, using advanced machine learning methods, and developing models interactively. The required convergence of technologies is now a reality and is presented here. The stack builds on ADAM, a Spark library for genomics developed at the AMPLab, which provides the data representation and APIs needed to apply distributed computing to genomics data. The development tool is the Spark Notebook, which provides an interactive interface for requesting code execution. Its integration with scalable machine learning libraries and ADAM lets us work interactively on data from a single environment, at scale, with advanced modelling methods. We demonstrate examples of genomics data processing, e.g. on 1000 Genomes data, ranging from simple data manipulation to descriptive statistics and more complex population stratification with deep learning.

Speakers
Andy Petrella

Cofounder, Data Fellas
Creator of Spark Notebook


Friday May 20, 2016 11:40am - 12:00pm PDT
Ada
