Data By the Bay: Full Schedule

Data By the Bay is the first Data Grid conference matrix with 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.

9:50am PDT

Hot Legal Issues

Overview of recent legal developments affecting Big Data, including the Federal Trade Commission's recent report on Big Data.

Speakers

Francoise Gilbert

Partner, Greenberg Traurig Attorneys at Law

Françoise Gilbert is a partner at Greenberg Traurig, and practices in the firm’s Silicon Valley office, located in East Palo Alto, California, where she advises public companies, emerging technology businesses and non-profit organizations, on the entire spectrum of domestic and...

Wednesday May 18, 2016 9:50am - 10:30am PDT
Gardner

Law

10:10am PDT

Data From Click To Settlement: A Full-Funnel UX Approach

For young law firms in particular, the challenge of finding and retaining new clients is won by responding to inbound calls and intakes more quickly than the competition. However, an effective lawyer cannot spend their entire day on mobile messaging. Furthermore, the high cost and diversity of channels available for reaching the online audience continue to add complexity.

We present our system for systematically acquiring new clients, which focuses on integrating data across the "full funnel" User Experience. Using precision-based advertising and 24/7 mobile messaging APIs, we show how we bridge online and offline interactions to create a near-frictionless and ethically sound connection between law firms and potential new clients. We show how these techniques may be generalized to small and mid-sized businesses looking to take advantage of data to fuel their growth.

Speakers

Michael Terry

CTO, Lawfty

I've spent the last 8 years as a data engineer and CTO, bringing data pipelines to numerous verticals. At Raytheon Space & Airborne Systems, I developed a radar data and signal processing pipeline for the military. At my first startup SeeTheScene.TV, we built a streaming video system...

Wednesday May 18, 2016 10:10am - 10:30am PDT
Ada

Law

11:10am PDT

Creating Value by Turning Law Into Data

The law has traditionally considered itself “art not science”, but what value can be created by rethinking that art/science divide, and what would law as part science look like? Industries like sports, politics, and journalism provide recent and powerful comparisons, with “Moneyball” and Obama’s data-driven 2012 election campaign being the most famous examples of transformation. In the law, questions that can be answered with better data abound: How has a judge ruled on a defined type of case before? What is the likely outcome of filing a specific motion? How successful has a lawyer or firm been in certain kinds of cases, or before certain judges? This presentation will focus on the technology that Ravel Law uses to extract and classify such information, and on how it has built a business that harnesses the results.

Speakers

Daniel Lewis

General Manager, LexisNexis | CEO Ravel Law, LexisNexis

Wednesday May 18, 2016 11:10am - 11:30am PDT
Ada

Law

11:40am PDT

Hidden in plain sight: Using law to summarize the law

Summarization remains one of the most conspicuously unsolved problems in NLP today. There is plenty of active research, and the best available systems are reasonably capable of capturing the information contained in a body of text, but the output is often clumsy compared to what a human might write. More importantly for those of us working in legal informatics, the law is a notoriously conservative profession. Lawyers will justifiably be more comfortable relying on summaries written by fellow members of the bar. Fortunately, judges provide high-quality, detailed summaries of the cases they cite to in their opinions, and a rich supply of this data exists throughout the body of historical case law. We present a technique for enriching the display of judicial opinions with high-quality summary data extracted from subsequent opinions, using a variety of state-of-the-art open-source software tools. FOSS tools used include ANTLR, for recognition of deterministic sequences using formal grammars, and Apache UIMA, for construction of multi-layered indexes of recognized entities, such that their various juxtapositions can be used for further inference.

Speakers

Richard Downe

VP of Data Science, Casetext

I enjoy working on interesting problems, and have tried to work on a wide variety thereof. These have included FPGA design at IBM's Watson labs in Yorktown Heights, NY, research into the progression of coronary artery disease (focusing on image segmentation and ML prediction of disease...

presentation pdf

Wednesday May 18, 2016 11:40am - 12:20pm PDT
Ada

Law

1:10pm PDT

Case Law and ML on Spark

Apache Spark powers Ravel’s case-law-dissecting backend. This talk will cover the motivation for migrating to Spark, the benefits, the pain points, and our experience running it on a legal corpus. Over a year ago, Ravel Law’s batch processing moved from Apache Pig to Apache Spark. Spark eliminated many Pig- and Hadoop-related pain points and has enabled rapid development of our case law processing pipeline, which now includes running many NER, clustering and machine learning models, building search indexes, and constructing the legal citation graph with case law, statutes and judges as nodes. Spark helps accelerate Ravel Law’s processing, development and integration of machine learning systems. Learn how we use Spark to prep our data and some of the building blocks we use to create our legal research and analytics products.

Speakers

Jeremy Corbett

Senior Lead Backend Engineer, Ravel Law

Wednesday May 18, 2016 1:10pm - 1:30pm PDT
Ada

Law

1:40pm PDT

Litigation Mitigation: Using NLP to Minimize Errors in Contracts

While work in natural language processing in the legal domain has primarily focused on case law and e-discovery, the domain of contracts and agreements has received comparatively little attention. Our work focuses on the potential for errors and oversights in such documents, a common problem with consequences ranging from professional embarrassment to, in the worst case, litigation. The American Bar Association estimates that administrative and substantive errors account for some 75% of all legal malpractice claims; our own small-scale analysis suggests that some one in five civil cases involves disputes over ambiguous contract language. We present our work applying computational linguistics techniques to the automatic detection of such errors, focusing particularly on inconsistencies, ambiguities and style issues. Our evolving pipeline for contract analysis builds on augmenting corpora of contracts with manual annotations, including judgments of ambiguity.

Speakers

Shipra Dingare

Lead Engineer, Lit IQ

Gurinder Sangha

Founder and CEO, Lit IQ

I am the founder of Lit IQ, which is using advances in computational linguistics to help lawyers minimize litigation risk. I also teach at the University of Pennsylvania Law School and serve as a Fellow at the Stanford Center for Legal Informatics. Prior to Lit IQ, I founded Intelligize...

Wednesday May 18, 2016 1:40pm - 2:00pm PDT
Gardner

Law

1:40pm PDT

Privacy Issues in Big Data Processing in light of Data Breaches

Big Data processing is dominating several business applications. Businesses are realizing the benefits of Big Data and analyzing large volumes of data to gain insights about their customers. This causes more data to be collected and stored centrally, much of it in the Cloud. This attracts hackers, since they could gain access to large volumes of data about people. Unauthorized access to this type of data can have disastrous consequences for the individuals whose privacy is violated. In this talk we will look at five of the major data breaches of the recent past and at ways to protect people's privacy through several best-practice approaches. Many in the IT field have realized that it would be very difficult to secure all data that is accessible from the cloud, and so better mechanisms should be developed to protect such public data.

Speakers

S. Srinivasan

Associate Dean & Distinguished Professor, Texas Southern University

Professor and researcher in Information Security, Cloud Computing and Big Data Applications

Data by the Bay pdf

Wednesday May 18, 2016 1:40pm - 2:00pm PDT
Markov

Law