Data By the Bay is the first Data Grid conference: a matrix of six vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
Wednesday, May 18 • 1:10pm - 1:30pm
Case Law and ML on Spark


Apache Spark powers Ravel Law’s backend for dissecting case law. This talk covers the motivation for migrating to Spark, the benefits, the pain points, and our experience running it on a legal corpus. Over a year ago, Ravel Law moved its batch processing from Apache Pig to Apache Spark. Spark eliminated many Pig- and Hadoop-related pain points and enabled rapid development of our case law processing pipeline, which now runs many NER, clustering, and machine learning models, builds search indexes, and constructs the legal citation graph with case law, statutes, and judges as nodes. Spark helps accelerate Ravel Law’s processing, development, and integration of machine learning systems. Learn how we use Spark to prep our data, along with some of the building blocks we use to create our legal research and analytics products.
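To make the citation graph idea concrete, here is a minimal sketch in plain Python (not Spark, and not Ravel Law’s actual code). It only illustrates a graph whose nodes are cases, statutes, and judges. The node IDs, the edge labels `cites` and `authored`, and the helper names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical edges: (source node, edge label, target node).
# Node IDs are prefixed with their type; none of this is real data.
EDGES = [
    ("case:A", "cites", "case:B"),
    ("case:A", "cites", "statute:S1"),
    ("case:C", "cites", "case:B"),
    ("judge:J1", "authored", "case:A"),
    ("judge:J1", "authored", "case:C"),
]


def build_graph(edges):
    """Return an adjacency list: node -> list of (label, target)."""
    out_edges = defaultdict(list)
    for src, label, dst in edges:
        out_edges[src].append((label, dst))
    return dict(out_edges)


def citation_counts(edges):
    """Count how many times each node is the target of a 'cites' edge."""
    counts = defaultdict(int)
    for _, label, dst in edges:
        if label == "cites":
            counts[dst] += 1
    return dict(counts)


graph = build_graph(EDGES)
counts = citation_counts(EDGES)
```

In a distributed setting the counting step would typically become a keyed aggregation over the edge list; the in-memory version above is only meant to show the shape of the data.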


Jeremy Corbett

Senior Lead Backend Engineer, Ravel Law

Wednesday May 18, 2016 1:10pm - 1:30pm PDT