Data By the Bay is the first Data Grid conference: a matrix of six vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
Apache Spark powers Ravel’s backend for dissecting case law. This talk covers the motivation for migrating to Spark, the benefits and pain points, and our experience running it on a legal corpus. Over a year ago, Ravel Law moved its batch processing from Apache Pig to Apache Spark. Spark eliminated many Pig- and Hadoop-related pain points and enabled rapid development of our case law processing pipeline, which now runs many NER, clustering, and machine learning models, builds search indexes, and constructs the legal citation graph with case law, statutes, and judges as nodes. Spark helps accelerate Ravel Law’s processing, development, and integration of machine learning systems. Learn how we use Spark to prep our data, and see some of the building blocks we use to create our legal research and analytics products.