Data By the Bay is the first Data Grid conference, a matrix of 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
Tuesday, May 17 • 10:40am - 11:00am
Hunting Criminals with Hybrid Analytics


Fraud detection is a classic adversarial analytics challenge: as soon as an automated system successfully learns to stop one scheme, fraudsters move on to attack another way. Each scheme requires looking for different signals (i.e., features); each is relatively rare (one in millions for finance or e-commerce); and a single case may take months to investigate (in healthcare or tax, for example), making quality training data scarce. This talk will cover, via live demo and code walk-through, the key lessons we've learned while building such real-world software systems over the past few years. We'll be looking for fraud signals in public email datasets, using IPython and popular open-source libraries (scikit-learn, statsmodels, nltk, etc.) for data science and Apache Spark as the compute engine for scalable parallel processing. The model is an ensemble that combines natural language, graph analysis, and time series analysis features, and it is retrained by an automated pipeline to learn from feedback on the fly.
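
As a rough illustration of what combining the three feature families named in the abstract could look like, the sketch below builds one small feature vector per email from a text signal, a graph signal, and a time signal. It is not the speaker's code: the toy records, the `SUSPICIOUS_TERMS` keyword list, the business-hours cutoff, and all function names are assumptions made up for this example, and it uses only the Python standard library rather than the libraries listed above.

```python
from collections import defaultdict
from datetime import datetime

# Toy records; every field and value here is invented for illustration.
EMAILS = [
    {"sender": "a@x.com", "recipient": "b@x.com",
     "sent": "2001-05-01T09:15:00", "body": "Quarterly report attached"},
    {"sender": "a@x.com", "recipient": "c@x.com",
     "sent": "2001-05-01T23:40:00",
     "body": "Please wire the funds tonight, keep this off the books"},
    {"sender": "b@x.com", "recipient": "a@x.com",
     "sent": "2001-05-02T10:05:00", "body": "Thanks, will review"},
    {"sender": "a@x.com", "recipient": "d@y.com",
     "sent": "2001-05-02T23:55:00", "body": "Wire sent, delete this message"},
]

# Hypothetical keyword list standing in for a real language model.
SUSPICIOUS_TERMS = {"wire", "delete", "off"}


def text_feature(body):
    """Natural-language signal: count of suspicious terms in the body."""
    tokens = [t.strip(".,!?").lower() for t in body.split()]
    return sum(1 for t in tokens if t in SUSPICIOUS_TERMS)


def out_degrees(emails):
    """Graph signal: number of distinct recipients each sender writes to."""
    recipients = defaultdict(set)
    for e in emails:
        recipients[e["sender"]].add(e["recipient"])
    return {sender: len(rs) for sender, rs in recipients.items()}


def off_hours_feature(sent):
    """Time signal: 1 if sent outside 09:00-18:00 (assumed cutoff), else 0."""
    hour = datetime.fromisoformat(sent).hour
    return 0 if 9 <= hour < 18 else 1


def build_features(emails):
    """Return one [text, graph, time] feature row per email."""
    degree = out_degrees(emails)
    return [
        [text_feature(e["body"]), degree[e["sender"]], off_hours_feature(e["sent"])]
        for e in emails
    ]


if __name__ == "__main__":
    for row in build_features(EMAILS):
        print(row)
```

Rows in this shape, paired with investigator labels, could then be passed to a classifier's `fit` method in a library such as scikit-learn; the choice of estimator and how it is ensembled is left open here.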


David Talby

Chief Technology Officer, John Snow Labs
David Talby is a chief technology officer at John Snow Labs, the creators of Spark NLP: a production-grade, fast & trainable implementation of the latest research in natural language processing. David specializes in building & operating AI systems in healthcare and life science, and...

Tuesday May 17, 2016 10:40am - 11:00am PDT