Data By the Bay is the first "Data Grid" conference: a matrix of six vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
Tuesday, May 17 • 1:10pm - 1:50pm
Scalably Internationalizing Millions of Latent Semantic Labels


We've built a classification system that maps "Software Developer", "MTS", "Code Monkey", and millions of other English-language entities into a common semantic space of just a few thousand labels; we use it to understand people's job titles, skills, majors, and degrees. We're now internationalizing this system in a scalable way. Because the original method was labor-intensive, we have developed an approach that leverages our English-language work to deliver good-quality results in other languages with a small fraction of the effort.


Xiao Fan

Dev Manager, Workday
My team is currently working on automated internationalization for a tool that provides semantic labels for plain-English job titles.

Tuesday May 17, 2016 1:10pm - 1:50pm PDT