Data By the Bay is the first Data Grid conference matrix with 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
3Scan is an automated histology company. Our core technical offering is a 3D imaging microscope, which generates volumetric image data. Each microscope can create several terabyte-scale datasets per day. We run three in production and will have close to ten by the end of the year. In this talk I will address some of the challenges surrounding data collection and analysis at this scale. Relevant tools include Python, Apache Spark, EC2, and Meteor.