Data By the Bay has ended
Data By the Bay is the first Data Grid conference matrix with 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.

IoT
Tuesday, May 17


Time series analytics for Big Data and IoT with Kx
Trying to solve the data riddle purely through the lens of architecture is missing a vital point: The unifying factor across all data is a dependency on time. The ability to capture and factor in time is the key to unlocking real cost efficiencies. Whether it’s streaming sensor data, financial market data, chat logs, emails, SMS or the P&L, each piece of data exists and changes in real time, earlier today or further in the past. Unless they are linked together in a way that firms can analyze, there is no way of providing a meaningful overview of the business at any point in time. This talk will demonstrate using live coding how kdb+, a columnar relational time-series database, with a tightly integrated query language called q, can do aggregations and consolidations on billions of streaming, real-time and historical records for complex analytics.

Fintan Quill

Global Head of Sales Engineering, Kx Systems
Fintan Quill is the global head of sales engineering for Kx Systems. An expert in developing database analytic systems, Fintan joined Kx in 2012 after having worked extensively with quantitative teams at a variety of Wall Street investment banks, hedge funds, and trading shops building...

Tuesday May 17, 2016 4:00pm - 4:20pm
Thursday, May 19


Beyond 1M Acres of Drone Imagery
DroneDeploy is making the sky productive and accessible to everyone. Our software platform flies unmanned aircraft all over the world and processes the data they collect. We'll show some of the image processing techniques we use to solve key problems in agriculture, mining, and construction, and give a look at the future problems we are tackling in the drone space.

Nicholas Pilkington

CTO, DroneDeploy

Thursday May 19, 2016 9:50am - 10:30am


Finding Your Audience In The Internet of Things
Understanding the nature of the expressive and diverse audiences of applications can be transformative in the creation of powerful data products. For many industries, user interaction is the most accurate signal in audience segmentation, but the devices in the Internet of Things are often quiet, and natural audience segments can be lost. In analyzing the Automatic driving data, we have learned that there are many kinds of cars, but drivers themselves often act very similarly. In this study we relate driving style to physical models of cars, with the aim of segmenting our audience using unsupervised learning techniques such as non-negative matrix factorization (NMF).

Dhruv Choudhary

Data Scientist, Automatic Labs
Dhruv is a Data Scientist at Automatic Labs where he is currently building data products for the connected car space. Automatic brings to life a wealth of data about driving behavior of users and their interactions with their cars. He also works on building scalable infrastructure...

Thursday May 19, 2016 11:10am - 11:30am


Know the air you are breathing
This talk will demonstrate how to use a publicly available dataset of air quality sensor readings, clean and query the data, and visualize it so that the government and the public can take appropriate action. I will use publicly available data from epa.gov and the U.S. Department of State Mission China air quality monitoring program. The set consists of data from devices that are sending measurements for San Francisco and Beijing. The air quality measurements are enriched with weather data from weather.gov to give them additional context. We will then visualize the enriched data and see how it relates to air quality. The technology stack comprises Mesos, Zookeeper, Marathon, Docker, Riak TS, Kafka, Spark, and Zeppelin.

Seema Jethani

Director of Product Management, Basho Technologies
Hello! I currently lead Product Management at Basho Technologies for Basho's flagship products Riak KV and Riak TS, distributed NoSQL databases. Prior to joining Basho, I held Product Management and Strategy positions at Dell, Enstratius and IBM. I hold an MBA degree from Duke University’s...

Thursday May 19, 2016 11:10am - 11:30am


MacroBase: Analytic Monitoring for the Internet of Things
An increasing proportion of data today is generated by automated processes, sensors, and systems---collectively, the Internet of Things (IoT). A core challenge in IoT and an increasingly popular value proposition of many IoT applications in domains including industrial diagnostics, predictive maintenance, and urban observability is in identifying and highlighting unusual and surprising data (e.g., poor driving behavior, equipment failures, gunshots). We call this task---which is often statistical in nature and time-sensitive---analytic monitoring. To facilitate rapid development and scalable deployment of analytic monitoring queries, we have developed MacroBase, a new kind of data analytics engine that provides turn-key analytic monitoring of IoT data streams. MacroBase implements a customizable pipeline of outlier detection, summarization, and ranking operators. To facilitate efficient and accurate operation, MacroBase implements several cross-layer optimizations across robust estimation, pattern mining, and sketching procedures. As a result, MacroBase can analyze several million events per second on a single server. MacroBase has already uncovered several unexpected behaviors (and corresponding bugs) in production in a medium-scale IoT deployment.

Peter Bailis

Professor, Stanford University
Peter Bailis is an assistant professor of computer science at Stanford University. Peter's research in the Future Data Systems group (http://futuredata.stanford.edu/) focuses on the design and implementation of next-generation data-intensive systems. His work spans large-scale data...

Thursday May 19, 2016 11:10am - 11:30am


Building realtime efficient queries and batch jobs in an IOT platform
SAMI needs to ingest, serve, and make real-time decisions at large scale. In this talk we briefly show:
* How to build scalable queries using Cassandra, Redis, and Elasticsearch.
* High-performance batch jobs using the Apache Parquet columnar storage format.
* Trade-offs between idempotent writes and real-time streaming counters.

Dinesh Narayanan

Staff Engineer, Samsung SSIC
Dinesh Narayanan is a Staff Engineer at Samsung SSIC. He is passionate about building low-latency distributed applications and functional programming.

Thursday May 19, 2016 1:10pm - 1:30pm


IoT security using data analysis
The interconnected world is upon us, and so are the hackers! Security risks are multiplying, with even the most basic sensor devices now becoming network-aware. Traditional security solutions often fail to protect such diverse Internet-of-Things (IoT) infrastructure. In this talk, we will present our solution, which analyzes large amounts of IoT data and uses behavior analytics in the cloud to detect anomalies. The patent-pending solution from ZingBox fingerprints the behavior of IoT devices and generates a detailed ‘behavior profile’ via machine learning. ZingBox's solution is designed to protect IoT devices without any footprint on the devices themselves.

Dr. May Wang

Co-founder & CTO, ZingBox
Dr. May Wang is Co-founder and CTO of ZingBox, an Internet of Things (IoT) security company in Silicon Valley, well funded by CEOs of leading public security companies, partners of top VC PE firms, and Stanford professors. May is also a Venture Partner of SAIF (a $4B PE firm), an...

Thursday May 19, 2016 1:10pm - 1:30pm


Detecting Anomalies in Streaming Data – Real-time Algorithms for Real-world Applications
There’s no question that we are seeing an increase in the availability of streaming, time-series data. Largely driven by the rise of the Internet of Things (IoT) and connected real-time data sources, we now have an enormous number of applications with sensors that produce important data that changes over time. This data presents a challenge and an opportunity for businesses across every industry. How do they handle the onslaught of streaming data? How can they exploit it to make decisions in real time? One way is to detect, in real time, when something unusual occurs. Early anomaly detection in streaming data has significant implications, yet can be very difficult to execute. It requires detectors to process data in real time, not in batches, and to learn while simultaneously making predictions. In this talk, we’ll look at algorithms designed for such data and analyze the components that lead to optimal performance. We’ll also discuss a new benchmark with a labeled, real-world data set, designed to provide a controlled and repeatable environment of open-source tools to test and measure anomaly detection algorithms on streaming data. How do we score in a way that rewards algorithms that detect all anomalies as soon as possible, trigger no false alarms, work with real-world time-series data across a variety of domains, and automatically adapt to changing statistics?

Subutai Ahmad

VP Research, Numenta
Numenta has a broad, long-term research agenda: we want to advance our understanding of cortical principles, and build systems based on those principles. We are currently focusing our research on new application areas, developing a theory of neurons, sequence memory in cortex, sensorimotor...

Thursday May 19, 2016 2:10pm - 2:50pm


Automating Data Science for the IOT
The big challenge with machine data is the weak signal-to-noise ratio. The patterns are tiny, numerous, and spread over a large amount of data. They also change frequently. All this variability makes it extremely challenging to dig out insights and signals with the traditional approach of generating models manually. Finding the patterns on a continuous basis requires applying machine automation and machine intelligence. In this talk we will demonstrate how we have applied this to solve problems of predictive maintenance for Fortune 500 companies.

Ruban Phukan

Chief Product and Analytics Officer, DataRPM
Ruban is a serial entrepreneur and technologist with rich and diverse experience in data science, product, technology and business. As a data scientist at Yahoo, Ruban’s role involved data mining and analyzing several big data sets of Yahoo and coming up with strategic business...

Thursday May 19, 2016 3:00pm - 3:20pm