Data By the Bay is the first Data Grid conference matrix with 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
The flexibility and simplicity of JSON have made it one of the most common formats for data. Data engines need to be able to load, process, and query JSON and nested data types quickly and efficiently. There are multiple approaches to processing JSON data, each with trade offs. In this session we’ll discuss the reasons and ways that developers want to use flexible schema options and the challenges that creates for processing and querying that data. We’ll dive into the approaches taken by different technologies such as Hive, Drill, BigQuery, Spark, and others, and the performance and complexity trade offs of each. The attendee with leave with an understanding of how to assess which system is best for their use case.