Data By the Bay is the first Data Grid conference: a matrix of six vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
Monday, May 16 • 11:40am - 12:20pm
Characterizing and measuring the performance of Big Data processing platforms


There are many Big Data platforms, architectures, and frameworks already available, and new ones appear all the time. In such an ecosystem, it is difficult to truly measure or characterize the performance of a data processing infrastructure built on these frameworks. We abstract the frameworks into three categories: batch, query, and streaming. In this paper, we identify the characteristics of each category and present the results of running heterogeneous workloads on batch frameworks such as Hadoop, streaming frameworks such as Spark, and query frameworks such as Impala, all on the target cloud-based infrastructure. In our experiments, we observed performance variations caused by the multi-tenant nature of the infrastructure, and we accounted for these temporal conditions by running the experiments at different times.
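The abstract describes repeating runs at different times to account for multi-tenant variation. A minimal sketch of how such repeated measurements might be summarized is shown below, assuming each workload can be wrapped in a Python callable. The names `time_runs`, `summarize`, and `toy_workload` are hypothetical illustrations, not the authors' tooling; in a real study each repetition would be scheduled at a different time of day (for example, by an external scheduler) rather than run back to back.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(workload: Callable[[], None], repetitions: int) -> List[float]:
    """Run `workload` several times and return wall-clock durations in seconds."""
    durations = []
    for _ in range(repetitions):
        start = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Summarize run-to-run variation.

    A high coefficient of variation (stdev / mean) suggests that shared,
    multi-tenant infrastructure is adding noise to the measurements.
    """
    mean = statistics.mean(durations)
    stdev = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return {
        "mean_s": mean,
        "stdev_s": stdev,
        "cv": stdev / mean if mean > 0 else 0.0,
        "min_s": min(durations),
        "max_s": max(durations),
    }


if __name__ == "__main__":
    # Hypothetical stand-in for submitting a batch, query, or streaming job.
    def toy_workload() -> None:
        sum(i * i for i in range(200_000))

    print(summarize(time_runs(toy_workload, repetitions=5)))
```

Reporting the spread (standard deviation, min, max) alongside the mean, rather than a single best run, is one way to make temporal effects on shared infrastructure visible.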

Manish Singh

CTO, Co-founder, MityLytics
Manish is first and foremost a systems professional who loves to get his hands dirty deploying, maintaining, and tuning massively parallel and distributed systems. At MityLytics, he and his partners help customers make the transition to Big Data painlessly. Once customers have deployed, the MityLytics team helps them scale and tune their deployments using MityLytics software. Manish has over 18 years of product development and...

Monday May 16, 2016 11:40am - 12:20pm