Data By the Bay is the first Data Grid conference matrix with 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
SAMI needs to ingest data, serve it, and make real-time decisions at large scale. In this talk we briefly show:
* How to build scalable queries using Cassandra, Redis, and Elasticsearch.
* How to run high-performance batch jobs using the Apache Parquet columnar storage format.
* The trade-offs between idempotent writes and real-time streaming counters.
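
The trade-off between idempotent writes and streaming counters can be illustrated with a minimal in-memory sketch. It does not use Cassandra or Redis and is not SAMI's implementation; the class names, `replay` helper, and sample events are all hypothetical. It assumes an at-least-once delivery pipeline in which one event is delivered twice.

```python
from collections import defaultdict


class CounterStore:
    """Streaming counter: every delivery increments the count (hypothetical)."""

    def __init__(self):
        self.counts = defaultdict(int)

    def record(self, key, event_id):
        # The event ID is ignored, so a redelivered event is counted again.
        self.counts[key] += 1

    def total(self, key):
        return self.counts[key]


class IdempotentStore:
    """Idempotent write: each event is stored under its own ID (hypothetical)."""

    def __init__(self):
        self.events = defaultdict(set)

    def record(self, key, event_id):
        # Writing the same event ID twice leaves the stored state unchanged.
        self.events[key].add(event_id)

    def total(self, key):
        # The count is derived at read time, which costs more than reading a counter.
        return len(self.events[key])


def replay(store, deliveries):
    """Feed a sequence of (key, event_id) deliveries into a store."""
    for key, event_id in deliveries:
        store.record(key, event_id)
    return store


# Assumed at-least-once delivery: event "e2" arrives twice.
deliveries = [("device-1", "e1"), ("device-1", "e2"), ("device-1", "e2")]

counter = replay(CounterStore(), deliveries)
idempotent = replay(IdempotentStore(), deliveries)

print(counter.total("device-1"))     # 3: the duplicate is double counted
print(idempotent.total("device-1"))  # 2: the duplicate is absorbed
```

The counter is cheap to update and read but overcounts when deliveries are retried. The idempotent store stays correct under retries but keeps every event ID and computes the total on read.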