Data By the Bay is the first Data Grid conference, a matrix of 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
Tuesday, May 17 • 11:40am - 12:20pm
Using Spark MLlib for NLP


Apache Spark is most often used to process large amounts of data efficiently, but it is also useful for processing the individual predictions common to many NLP applications. The algorithms inside MLlib are useful in their own right, independent of the core Spark framework. IdiML is an open source tool that uses various MLlib components to make very fast predictions on textual data. It acts as a standalone tool for core machine learning functionality and can easily be integrated into production systems to provide low-latency, continuous streaming predictions. This talk explores the functionality inside IdiML, how it uses MLlib, and why that makes such a big difference.
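The abstract's central idea is that a single prediction does not need a cluster job: a document can be featurized and scored in-process against an already-trained model. The sketch below illustrates that idea in plain Python, assuming a hashing-trick featurizer and a linear (logistic) model. In MLlib these roles are roughly filled by components such as `HashingTF` and `LogisticRegressionModel`, but this is not IdiML's code or MLlib's API. The talk description does not say which components IdiML actually uses. The names `hash_features`, `predict_proba`, `NUM_FEATURES`, `WEIGHTS`, and `INTERCEPT` are hypothetical, and the weights are toy values rather than a trained model.

```python
import math
import re
import zlib

# Hypothetical size of the hashed feature space (an assumption, not an IdiML setting).
NUM_FEATURES = 1 << 18


def tokenize(text):
    """Lowercase and split on non-word characters (a deliberately simple tokenizer)."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def bucket(token, num_features=NUM_FEATURES):
    """Map a token to a feature index with a deterministic hash (CRC32, not Python's salted hash())."""
    return zlib.crc32(token.encode("utf-8")) % num_features


def hash_features(tokens, num_features=NUM_FEATURES):
    """Hashing trick: count tokens per bucket, returning a sparse vector as {index: count}."""
    vec = {}
    for tok in tokens:
        idx = bucket(tok, num_features)
        vec[idx] = vec.get(idx, 0.0) + 1.0
    return vec


def predict_proba(sparse_vec, weights, intercept):
    """Score one document with a linear model and apply the logistic function."""
    margin = intercept + sum(weights.get(i, 0.0) * v for i, v in sparse_vec.items())
    return 1.0 / (1.0 + math.exp(-margin))


# Hypothetical toy "model": hand-set weights for a two-word sentiment example, not learned values.
WEIGHTS = {bucket("great"): 2.0, bucket("terrible"): -2.0}
INTERCEPT = 0.0

if __name__ == "__main__":
    # Each call scores one document locally; no cluster or batch job is involved.
    for doc in ["This movie was great!", "The service was terrible."]:
        p = predict_proba(hash_features(tokenize(doc)), WEIGHTS, INTERCEPT)
        print(f"{doc!r}: P(positive) = {p:.3f}")
```

Because the model is just a weight vector held in memory, per-request latency is dominated by tokenization and a sparse dot product rather than by job scheduling. That is one plausible reason a standalone predictor can serve streaming traffic.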


Speakers

Michelle Casbon

Senior Data Science Engineer, Qordoba
Michelle Casbon is Director of Data Science at Qordoba. Previously, she was a Senior Data Science Engineer at Idibon, where she contributed to the goal of bringing language technologies to all the world’s languages. Michelle's development experience spans more than a decade across various industries, including media, investment banking, healthcare, retail, and geospatial services. Michelle completed a Master's at the University of...



Tuesday May 17, 2016 11:40am - 12:20pm
Markov
