Data By the Bay is the first Data Grid conference: a matrix of six vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
Friday, May 20 • 3:00pm - 3:20pm
Distributed Visualization for Genomic Analysis


Current genomics visualization tools are designed for a single-node environment and lack the computational resources to provide interactive speeds. The 1000 Genomes Project provides 1.6 terabytes of variant data and over 14 terabytes of alignment data. However, typical genomic visualizations materialize less than 10 kbp, approximately 3.3e-7% of the genome. Mango is a visualization browser that selectively materializes and organizes genomic data to provide fast in-memory queries. Mango materializes data from persistent storage as the user requests different regions of the genome. This data is efficiently partitioned and organized in memory using interval trees, which enable quick range queries over genomic data.
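
To illustrate the interval-tree range queries mentioned in the abstract, here is a minimal single-machine sketch in Python. It is not Mango's implementation, which the abstract does not detail. The names `Interval`, `Node`, `build`, and `query`, the half-open coordinate convention, and the sample reads are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical record type: (start, end, payload), half-open [start, end).
Interval = Tuple[int, int, str]


@dataclass
class Node:
    interval: Interval
    max_end: int                      # largest end coordinate in this subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(intervals: List[Interval]) -> Optional[Node]:
    """Build a balanced tree keyed on start coordinate."""
    def rec(items: List[Interval]) -> Optional[Node]:
        if not items:
            return None
        mid = len(items) // 2
        node = Node(items[mid], items[mid][1])
        node.left = rec(items[:mid])
        node.right = rec(items[mid + 1:])
        for child in (node.left, node.right):
            if child is not None:
                node.max_end = max(node.max_end, child.max_end)
        return node
    return rec(sorted(intervals))


def query(node: Optional[Node], start: int, end: int) -> List[Interval]:
    """Return all intervals overlapping the half-open region [start, end)."""
    if node is None or node.max_end <= start:
        return []                     # nothing in this subtree reaches the region
    hits = query(node.left, start, end)
    s, e, _ = node.interval
    if s < end and start < e:         # standard half-open overlap test
        hits.append(node.interval)
    if s < end:                       # every start in the right subtree is >= s
        hits.extend(query(node.right, start, end))
    return hits


# Made-up alignment records for illustration only.
reads = [(100, 200, "read1"), (150, 250, "read2"),
         (300, 400, "read3"), (10050, 10150, "read4")]
tree = build(reads)
print([name for _, _, name in query(tree, 180, 310)])
# prints ['read1', 'read2', 'read3']
```

The `max_end` value stored at each node lets a query skip whole subtrees that end before the requested region. This pruning is what keeps range lookups fast when only a small window of the genome is on screen.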

Alyssa Morrow

Student Researcher, University of California-Berkeley
At UC Berkeley, the BDGenomics team is working to build scalable genomics preprocessing and analysis on top of Spark. I am currently working on a distributed genomic visualization tool that allows ad hoc queries over terabytes of genomic data.

Eric Tu

Graduate Student, UC Berkeley AMPLab
I'm a graduate student at UC Berkeley in the AMPLab, working on genomic visualizations built on top of Spark.

Friday May 20, 2016 3:00pm - 3:20pm
