Name: byte2vec: a flexible embedding model constructed from bytes
Start: 2016-05-17T16:20:00-0700
End: 2016-05-17T16:40:00-0700

Data By the Bay is the first Data Grid conference matrix with 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.

byte2vec: a flexible embedding model constructed from bytes

In today's fragmented, globalized world, supporting multiple languages in NLU and NLP applications is more important than ever. The inherent language dependence of classical machine learning and rule-based NLP systems has traditionally been a barrier to scaling those systems to new languages. This dependence typically shows up in feature extraction and in pre-processing steps. In this talk, we present byte2vec, an extension of the well-known word2vec embedding model that makes it easier to handle multiple languages and unknown words. We explore its efficacy in a multilingual setting on tasks such as Twitter sentiment analysis and aspect-based sentiment analysis (ABSA). Byte2vec is an embedding model constructed directly from the rawest form of input, bytes, and is: (i) truly language-independent; (ii) particularly apt for synthetic languages, through its use of morphological information; (iii) intrinsically able to deal with unknown words; and (iv) directly pluggable into state-of-the-art neural network (NN) architectures. Pre-trained embeddings generated with byte2vec can be fed into state-of-the-art models; byte2vec can also be directly integrated and fine-tuned as a general-purpose feature extractor, similar to VGGNet's current role in computer vision.
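To make the idea of building embeddings from bytes concrete, here is a minimal Python sketch. It is not the byte2vec implementation, which this listing does not describe. It assumes a simple composition scheme (averaging vectors of UTF-8 byte n-grams, with hash-derived vectors standing in for trained ones). The names `byte_ngrams`, `ngram_vector`, `embed`, and `DIM` are hypothetical and exist only for this illustration.

```python
import hashlib
from typing import List

DIM = 8  # hypothetical embedding size, chosen only for illustration


def byte_ngrams(word: str, n: int = 3) -> List[bytes]:
    """Return the UTF-8 byte n-grams of a word, with boundary markers."""
    data = b"<" + word.encode("utf-8") + b">"
    if len(data) <= n:
        return [data]
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def ngram_vector(gram: bytes, dim: int = DIM) -> List[float]:
    """Map a byte n-gram to a fixed vector (a stand-in for a trained one)."""
    digest = hashlib.sha256(gram).digest()
    # Scale each of the first `dim` digest bytes into the range [-1, 1].
    return [(b / 127.5) - 1.0 for b in digest[:dim]]


def embed(word: str, n: int = 3, dim: int = DIM) -> List[float]:
    """Average the vectors of a word's byte n-grams (assumed composition)."""
    grams = byte_ngrams(word, n)
    total = [0.0] * dim
    for gram in grams:
        for i, value in enumerate(ngram_vector(gram, dim)):
            total[i] += value
    return [value / len(grams) for value in total]


if __name__ == "__main__":
    # Any string in any script gets a vector, including words never seen before.
    for word in ["hello", "héllo", "こんにちは", "evlerinizden"]:
        print(word, len(word.encode("utf-8")), len(byte_ngrams(word)))
```

Because the input is reduced to bytes before any lookup, the sketch needs no language-specific tokenizer or vocabulary, and an unseen word still maps to a vector through the byte n-grams it shares with other words. In a real model the n-gram vectors would be learned rather than derived from a hash.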

Speakers

Parsa Ghaffari

Founder, Aylien

Parsa Ghaffari is an engineer and entrepreneur working in the field of Artificial Intelligence. He currently runs AYLIEN, a leading AI startup focused on creating technologies for analyzing and understanding unstructured content (text and images). Parsa will explain how Aylien is...

Tuesday May 17, 2016 4:20pm - 4:40pm PDT
Markov

Text

Data By the Bay
