Data By the Bay has ended
Data By the Bay is the first Data Grid conference matrix with 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.

Democracy
Wednesday, May 18


Let's Build Civic Tech to Scale Democracy for a Better Future
I'm planning to outline some of the major challenges facing our political system (especially as they pertain to citizen engagement, empowerment, and faith in the process), suggest some ways that civic applications (largely made possible by open data) are beginning to address these challenges (somewhat of a quick survey of the field), and then close with a call to action.

Matt Mahan

CEO, Brigade
Matt Mahan is CEO and cofounder of Brigade. He previously served as CEO of Causes, the world’s largest online campaigning platform. Matt grew up in Watsonville, CA, is a Teach for America alum and former Harvard student body president.

Wednesday May 18, 2016 9:00am - 9:40am


Design for Local Government: Data, Services, and Transparency
This talk by Steve Pepple & Morgan Keys will cover how the design team at OpenGov helps local governments better use their data, collaborate on public services, and communicate with citizens.

We'll discuss how we work to understand our users and design products for a group of people who have been underserved for too long. Through user research we’ve found smart and dedicated public servants doing astonishing analysis without the latest technology, relying mainly on spreadsheets.

We'll share how we explore and solve the same problems with data science, data visualization, and interaction design.

Designing more efficient and intuitive processes for local governments reduces data silos and error-prone administrative work. It gives local leaders time to be strategic and better communicate their decisions to the public.

Steve Pepple

Product Designer and Developer, OpenGov
Steve Pepple is a Bay Area designer and software developer who works to improve city streets and civic information systems. He is a product designer at OpenGov, where he designs software that improves how governments spend money, make decisions, and communicate with citizens. His...

Wednesday May 18, 2016 11:10am - 11:30am


Unsupervised NLP Classification with Clustering
In the world of local government finances, training data is sparse, and language-based training data is almost non-existent. Furthermore, fiscal language in governments demands substantial domain knowledge to build training data and develop strong intuitions. This makes traditional supervised methods difficult to use successfully, because the training data you generate always lags behind the growth of raw data. To help tackle these challenges in NLP analysis, we'll show techniques based on relationship extraction and clustering for understanding data on domain-heavy topics. We'll explore these techniques on published local government budget PDFs to extract topics and gain insight into the purpose of domain-specific text.

The format of the talk will follow each key point with code examples. First we’ll talk about data challenges in local government and the lack of established knowledge bases around that data. Specifically, we’ll explore the unknown-number-of-classes problem and how unsupervised algorithms can yield insights. Then we’ll focus on the families of clustering algorithms available and how they allow you to focus on edge associations rather than holistic state spaces. Following that, we’ll explore some useful techniques for optimizing computation and how missing or skipped data points can be linked by association. Finally, we’ll combine the pieces we’ve shown to perform topic extraction and understanding from public financial budgets.
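As a rough illustration of the "edge associations" idea in this abstract, the sketch below links short text items whose word overlap passes a threshold and treats the connected components of that graph as clusters, so the number of clusters is never specified up front. It is not the speaker's method: the Jaccard measure, the threshold value, the function names, and the sample line items are all assumptions made for the example.

```python
from itertools import combinations


def tokens(text):
    """Lowercase word set for one line item (a deliberately crude tokenizer)."""
    return frozenset(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_edges(texts, threshold=0.3):
    """Group texts by linking pairs whose similarity meets the threshold.

    Clusters are the connected components of the resulting graph, so the
    number of clusters is discovered rather than supplied.
    """
    parent = list(range(len(texts)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    token_sets = [tokens(t) for t in texts]
    for i, j in combinations(range(len(texts)), 2):
        if jaccard(token_sets[i], token_sets[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two components

    groups = {}
    for i, text in enumerate(texts):
        groups.setdefault(find(i), []).append(text)
    # Order clusters by the position of their first member in the input.
    return sorted(groups.values(), key=lambda g: texts.index(g[0]))


# Made-up budget line items, for illustration only.
line_items = [
    "police department overtime pay",
    "police department vehicle maintenance",
    "fire department overtime pay",
    "library books and periodicals",
    "library building maintenance",
    "storm drain repair",
]
clusters = cluster_by_edges(line_items)
for group in clusters:
    print(group)
```

In this toy data the second and third items share only one word with each other, yet they land in the same cluster because both are linked to the first item. That is one simple sense in which points can be joined "by association" rather than by direct similarity.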

Matthew Seal

Data Scientist, OpenGov
I'm an early employee of OpenGov with a passion for data models and data understanding. I've had broad exposure to software development of various types, from front-end code to database architecture to machine learning. I graduated from Stanford University with a BS in Electrical...

Wednesday May 18, 2016 11:40am - 12:20pm


Open Collaboration for Civic Impact
Americans consistently rank government dissatisfaction as the “most important problem facing the U.S.” It has become increasingly clear that a host of different actors–public and private, formal and informal, citizens and communities–will need to work together to successfully address the challenges we face as a society. While civic innovation is a bright spot of hope, the sector is still nascent. Key to catalyzing this new wave of civic solutions will be knowledge sharing and storytelling around success and impact, which is not happening in any standardized, systemic, aggregated way across the sector. We believe that cataloging success (and failure) via an open collaboration project lifecycle will lead to new waves of inspired action directed towards more effective ends.

Lawrence Grodeska

Co-founder & CEO, CivicMakers
Lawrence Grodeska is a maker, communicator and civic geek who uses technology to help civic leaders in the public and private sector engage key audiences. Over 15 years, Lawrence has built programs and products to transform how citizens interact with their communities and governments...

Wednesday May 18, 2016 1:10pm - 1:30pm


Making Predictions Under Lending Regulations
One particular difficulty when working in lending is being subject to a variety of different regulatory requirements. This has many implications when working with data - the requirements often affect the types of models you can build or restrict which predictors you can include when modelling. In this talk we present how the Fair Credit Reporting Act (FCRA) impacted our loan application accept / reject modelling. In particular, the FCRA requires explicit reasons for a declined application, which we satisfied through a thoughtful use of ensemble methods.
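To make the "explicit reasons" requirement concrete, here is a minimal sketch of one way an ensemble can stay explainable: average several simple one-feature scorers, and when the result is a decline, report the components that scored lowest. This is not Earnest's model and not compliance guidance. The feature names, thresholds, cutoff, and the `decide` function are all hypothetical, chosen only for illustration.

```python
# Hypothetical one-feature scorers: each returns a score in [0, 1],
# where higher means more creditworthy. All thresholds are invented.
SCORERS = {
    "debt_to_income": lambda a: 1.0 if a["debt_to_income"] <= 0.35 else 0.0,
    "months_employed": lambda a: min(a["months_employed"] / 24.0, 1.0),
    "missed_payments": lambda a: max(1.0 - 0.5 * a["missed_payments"], 0.0),
}


def decide(applicant, cutoff=0.6, max_reasons=2):
    """Average the component scores; on a decline, report the weakest ones."""
    scores = {name: fn(applicant) for name, fn in SCORERS.items()}
    overall = sum(scores.values()) / len(scores)
    if overall >= cutoff:
        return "accept", []
    # Components that pulled the average down the most become the reasons.
    weakest = sorted(scores, key=scores.get)[:max_reasons]
    reasons = [name for name in weakest if scores[name] < cutoff]
    return "reject", reasons


# Made-up applicant, for illustration only.
applicant = {"debt_to_income": 0.5, "months_employed": 30, "missed_payments": 1}
decision, reasons = decide(applicant)
print(decision, reasons)
```

Because each component depends on a single input, every reason maps directly to something the applicant can be told. An ensemble of opaque models would need an extra attribution step to do the same.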

Yujay Huoh

Data Scientist, Earnest, Inc.
Data Scientist at Earnest. Before Earnest, I worked for a private bank in San Francisco doing credit modelling and enterprise risk. Statistics Ph.D from UC Berkeley. Interested in spatial statistics, numerical programming, and all things Bayesian.

Wednesday May 18, 2016 1:40pm - 2:00pm


Breaking Down Paywalls for Online Health
Approximately one-quarter of people searching for health information online hit a paywall. Medical knowledge is locked up in non-open access scientific research papers which have copyright licenses that prevent free distribution. However, facts cannot be copyrighted* and may pass through paywalls unencumbered by copyright license restrictions. We have developed a framework to enable access to scientific knowledge. Academic readers with access to papers can locally install and run our freely available Fact Extractor software. After a local PDF paper is identified and approved by the user, Fact Extractor identifies and extracts facts from the scientific paper. The software then distributes the extracted facts to our public Wiki-based server http://factpub.org for everyone to access. Client-side processing for fact extraction means no copies of the paper are distributed. Large-scale adoption of this fact-publishing framework will empower accessibility to health and other scientific research.

* Feist Publications, Inc., v. Rural Telephone Service Co., 499 U.S. 340 (1991)

Pauline Ng

Group Leader, Genome Institute of Singapore

Wednesday May 18, 2016 2:10pm - 2:50pm


Gathering around the data table
Open data is not just an end but a means to broadening participation in our ongoing acts of self-government. While certainly important, voting in elections is just one act we take in our collective pursuit of a more perfect union. Data can have a broader impact on the individual acts we take day to day. It can even redefine what it means to participate in a modern democracy. At DataSF, we seek to empower the use of data from the City and County of San Francisco across a spectrum of uses. I’ll discuss ways we are encouraging people to “gather around the data table” to collectively understand and address some of our most pressing challenges. And I’ll highlight projects and initiatives that demonstrate this, including:
  • How we’re putting our data users and data publishers at the center of our program to make sure we’re driving the program around actual needs
  • How we participate in our local San Francisco civic hacking group nearly every week
  • How we raised a digital barn called the Housing Data Hub to provide more context and insights on housing data in San Francisco
  • How we’re beginning to work with existing community institutions to make use of the City’s many open data assets

Jason Lally

Open Data Program Manager, Mayor's Office, San Francisco
Jason Lally is the Open Data Program Manager, working with the City’s Chief Data Officer, Joy Bonaguro, to help operationalize the City’s data strategy. Jason comes to the City by way of a Mayor’s Innovation Fellowship that wrapped up in August 2014. Before that, he worked at...

Wednesday May 18, 2016 3:00pm - 3:20pm


Quantifying Democracy and Freedom with Human Rights and Fertility Metrics
Having revisited the known phenomenon of negative correlation between girls' education and fertility analyzed by Jeffrey Sachs in "The End of Poverty", we proceed to take into account such quantitative ratings of democracy and freedom as the Democracy Index compiled by the Economist Intelligence Unit and Freedom in the World scores published annually by Freedom House. Are all democracies doomed to act as behavioral sinks, in John B. Calhoun's terminology? We will approach the problem by looking at the structure of fundamental and derived human rights. The structure will incorporate the right to have children and the right not to have children, neither of which the founding fathers needed to address directly, given a very different historical situation. The former right is explicitly restricted in China nowadays, while the latter is under attack elsewhere. Preliminary conclusions will be drawn and future research directions proposed.

Dmitri Gusev

Associate Professor, Purdue University
Dmitri A. Gusev is an Associate Professor of Computer and Information Technology (CIT). His primary research interests include imaging, game development, visualization, and computational linguistics. Dmitri A. Gusev received his Ph.D. in Computer Science from Indiana University in...

Wednesday May 18, 2016 3:20pm - 3:40pm