Data By the Bay is the first Data Grid conference: a matrix of 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
Wednesday, May 18 • 11:40am - 12:20pm
Hidden in plain sight: Using law to summarize the law


Summarization remains one of the most conspicuously unsolved problems in NLP today. Research is active, and the best available systems are reasonably good at capturing the information contained in a body of text, but their output is often clumsy compared to what a human would write. More importantly for those of us working in legal informatics, law is a notoriously conservative profession: lawyers will justifiably be more comfortable relying on summaries written by fellow members of the bar. Fortunately, judges write high-quality, detailed summaries of the cases they cite in their opinions, and a rich body of such summaries exists throughout historical case law. We present a technique for enriching the display of judicial opinions with high-quality summary data extracted from subsequent opinions, using a variety of state-of-the-art open-source software tools. The free and open-source (FOSS) tools used include ANTLR, for recognizing deterministic sequences with formal grammars, and Apache UIMA, for building multi-layered indexes of recognized entities so that their various juxtapositions can be used for further inference.
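The abstract does not spell out the pipeline, so what follows is only a minimal Python sketch under stated assumptions, not the speaker's implementation. It assumes the judicial summaries take the form of explanatory parentheticals that follow a citation, e.g. "123 F.3d 456, 460 (9th Cir. 1997) (holding that ...)". It plainly swaps in regular expressions for the ANTLR grammars mentioned above, and a small in-memory span index for Apache UIMA's annotation indexes. All names here (`Annotation`, `AnnotationIndex`, `annotate`, `extract_summaries`, and the three patterns) are hypothetical. The "juxtaposition" step attaches a parenthetical to the nearest preceding citation only when nothing but a pin cite and a court/year parenthetical separates them.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A typed span over the text, loosely modeled on a UIMA annotation (hypothetical)."""
    layer: str   # annotation type, e.g. "citation" or "parenthetical"
    begin: int   # start offset (inclusive)
    end: int     # end offset (exclusive)
    value: str   # normalized payload for this span


class AnnotationIndex:
    """Hypothetical multi-layer index: one begin-sorted list of spans per layer."""

    def __init__(self) -> None:
        self._layers: dict[str, list[Annotation]] = defaultdict(list)

    def add(self, ann: Annotation) -> None:
        self._layers[ann.layer].append(ann)
        self._layers[ann.layer].sort(key=lambda a: a.begin)

    def select(self, layer: str) -> list[Annotation]:
        return list(self._layers.get(layer, []))


# Simplified "volume reporter page" citation, e.g. "123 F.3d 456".
# A regex stand-in for a formal (e.g. ANTLR) citation grammar; covers few reporters.
CITATION_RE = re.compile(r"\b(\d{1,4}) (U\.S\.|F\.[23]d|F\. Supp\.) (\d{1,5})\b")

# Explanatory parenthetical opening with a participle, e.g. "(holding that ...)".
# Nested parentheses are not handled.
PAREN_RE = re.compile(
    r"\(((?:holding|finding|concluding|explaining|noting|reasoning)\b[^()]*)\)"
)

# Text allowed between a citation and its parenthetical: an optional pin cite
# (", 460") and an optional court/year parenthetical ("(9th Cir. 1997)").
GAP_RE = re.compile(r"(?:,\s*\d+(?:-\d+)?)?\s*(?:\([^()]*\d{4}\))?\s*")


def annotate(text: str) -> AnnotationIndex:
    """Run each recognizer and store its matches as a separate annotation layer."""
    index = AnnotationIndex()
    for m in CITATION_RE.finditer(text):
        index.add(Annotation("citation", m.start(), m.end(), " ".join(m.groups())))
    for m in PAREN_RE.finditer(text):
        index.add(Annotation("parenthetical", m.start(), m.end(), m.group(1).strip()))
    return index


def extract_summaries(text: str) -> dict[str, list[str]]:
    """Map each cited case to the explanatory parentheticals that follow it."""
    index = annotate(text)
    citations = index.select("citation")
    summaries: dict[str, list[str]] = defaultdict(list)
    for paren in index.select("parenthetical"):
        # Nearest citation that ends at or before this parenthetical.
        preceding = [c for c in citations if c.end <= paren.begin]
        if not preceding:
            continue
        cite = preceding[-1]
        # Juxtaposition check: only a pin cite / court-year may sit in between.
        if GAP_RE.fullmatch(text, cite.end, paren.begin):
            summaries[cite.value].append(paren.value)
    return dict(summaries)


if __name__ == "__main__":
    opinion = (
        "See Smith v. Jones, 123 F.3d 456, 460 (9th Cir. 1997) "
        "(holding that a warrantless search of a closed container was unreasonable). "
        "Cf. Doe v. Roe, 500 U.S. 100 (1991) (noting the limits of the exception)."
    )
    print(extract_summaries(opinion))
```

On the sample text, each citation is paired with its own "holding"/"noting" parenthetical. Keeping citations and parentheticals as separate layers, rather than one monolithic pattern, mirrors the idea of inferring relationships from how independently recognized entities sit next to each other.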


Richard Downe

VP of Data Science, Casetext
I enjoy working on interesting problems, and have tried to work on a wide variety thereof. These have included FPGA design at IBM's Watson labs in Yorktown Heights, NY, research into the progression of coronary artery disease (focusing on image segmentation and ML prediction of disease...

Wednesday May 18, 2016 11:40am - 12:20pm PDT
