Data By the Bay is the first Data Grid conference: a matrix of 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
Tuesday, May 17 • 4:00pm - 4:40pm
Mining Noisy Transaction Data with Neural Nets

Extracting relevant information from unstructured transaction data is a challenge for anyone who wants to use such data to make business decisions, such as underwriting loans or monitoring creditworthiness. Most of our transaction data takes the form of short text descriptions of each transaction, which often contain abbreviations or unfamiliar proper nouns. A common approach for text documents is to encode words or documents as vectors using one or more neural net layers. These features can then be fed to a classification algorithm or another model to predict an outcome. To this end, we encoded transactions as small 'sentences', often only a few words long, using skip-gram word2vec models, together with RBMs and Deep Belief Nets that also draw on other features such as the credit or debit value of the transaction and institution information. The goal of this talk is to describe the model's performance, as well as considerations for training a neural net in a large-data distributed framework such as Spark. Tools used: Deeplearning4j, Spark, Scala.
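The abstract does not include code, so the following is only a minimal, self-contained sketch in plain Scala (no Deeplearning4j or Spark calls) of two steps it describes: generating skip-gram (center, context) training pairs from short transaction "sentences", and building a fixed-length feature vector from averaged word vectors plus a debit/credit indicator and institution information. All names here (`TransactionFeatures`, `tokenize`, `skipGramPairs`, `averageEmbedding`, `featureVector`) are hypothetical. The tokenization rule, the sign convention (positive amount = credit), the one-hot institution encoding, and the averaging step are assumptions, not the speaker's method; the word vectors themselves would come from a trained word2vec model.

```scala
// Hypothetical sketch: skip-gram pairs and simple features for short transaction text.
// Not the speaker's implementation; tokenization, sign convention and averaging are assumptions.
object TransactionFeatures {

  // Lowercase, split on runs of non-alphanumeric characters, drop empty tokens.
  def tokenize(text: String): Vector[String] =
    text.toLowerCase.split("[^a-z0-9]+").filter(_.nonEmpty).toVector

  // Skip-gram pairs: each token is paired with every neighbour within `window` positions.
  def skipGramPairs(tokens: Vector[String], window: Int): Vector[(String, String)] =
    for {
      i <- tokens.indices.toVector
      j <- (math.max(0, i - window) to math.min(tokens.length - 1, i + window)).toVector
      if j != i
    } yield (tokens(i), tokens(j))

  // Average the word vectors of tokens found in `vectors`; unknown tokens are skipped.
  // Returns a zero vector when no token is known.
  def averageEmbedding(tokens: Vector[String],
                       vectors: Map[String, Array[Double]],
                       dim: Int): Array[Double] = {
    val known = tokens.flatMap(t => vectors.get(t).toList)
    if (known.isEmpty) Array.fill(dim)(0.0)
    else Array.tabulate(dim)(k => known.map(_(k)).sum / known.length)
  }

  // Concatenate the averaged embedding, a credit indicator (assumed: amount > 0 means credit)
  // and a one-hot encoding of the institution over a fixed list of known institutions.
  def featureVector(text: String,
                    amount: Double,
                    institution: String,
                    institutions: Vector[String],
                    vectors: Map[String, Array[Double]],
                    dim: Int): Array[Double] = {
    val emb = averageEmbedding(tokenize(text), vectors, dim)
    val isCredit = if (amount > 0) 1.0 else 0.0
    val oneHot = institutions.map(inst => if (inst == institution) 1.0 else 0.0)
    emb ++ Array(isCredit) ++ oneHot.toArray
  }

  def main(args: Array[String]): Unit = {
    val tokens = tokenize("POS DEBIT SQ *BLUE BOTTLE")
    println(tokens.mkString(" ")) // prints "pos debit sq blue bottle"
    println(skipGramPairs(tokens, 1).mkString(", "))
  }
}
```

Averaging is just one simple way to turn a variable-length sentence into a fixed-length input; because transaction sentences are often only a few words long, even a window of 1 or 2 covers most of the context, and the resulting vector could be fed to an RBM, a Deep Belief Net, or any other classifier.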

Frank Taylor

Data Scientist, Earnest, Inc.
I have a background in physics, specializing in statistical modeling of particle decays and, later, in optical signal processing. I am passionate about Big Data and its potential to yield insight into so many facets of humanity. As our tools get better and more scalable, we have the...

Tuesday May 17, 2016 4:00pm - 4:40pm PDT