Data By the Bay is the first Data Grid conference: a matrix of six vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
We've built a classification system that maps "Software Developer", "MTS", "Code Monkey", and millions of other English-language entities into a common semantic space with just a few thousand labels. We use it to understand people's job titles, skills, majors, and degrees. We're now internationalizing this system in a scalable way. The original method was labor-intensive, so we have developed an approach that leverages our English-language work to deliver good-quality results in other languages with a small fraction of the effort.
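
To make the idea concrete, here is a toy sketch of mapping many surface forms onto a small label set and then reusing that English mapping for other languages. It is not the system described in this talk: the label names, the example titles per label, the tiny bilingual glossary, and the functions `normalize` and `classify` are all hypothetical, and a simple lookup table stands in for whatever model is actually used.

```python
import re

# Hypothetical label inventory: each canonical label lists a few
# English surface forms that should map to it.
LABEL_EXAMPLES = {
    "software_engineer": [
        "software developer", "mts", "code monkey", "software engineer",
    ],
    "data_scientist": ["data scientist", "machine learning scientist"],
}

# Hypothetical glossary that bridges other languages to English surface
# forms, so the English mapping can be reused instead of rebuilt.
GLOSSARY = {
    "es": {"desarrollador de software": "software developer"},
    "de": {"softwareentwickler": "software developer"},
}


def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    no_punct = re.sub(r"[^\w\s]", " ", text.lower())
    return re.sub(r"\s+", " ", no_punct).strip()


# Invert the label inventory into a surface-form -> label lookup.
SURFACE_TO_LABEL = {
    normalize(form): label
    for label, forms in LABEL_EXAMPLES.items()
    for form in forms
}


def classify(title, lang="en"):
    """Return the canonical label for a title, or None if it is unknown."""
    key = normalize(title)
    if lang != "en":
        # Bridge to English first, then reuse the English lookup.
        key = GLOSSARY.get(lang, {}).get(key, key)
    return SURFACE_TO_LABEL.get(key)


if __name__ == "__main__":
    for title, lang in [
        ("Software Developer", "en"),
        ("MTS", "en"),
        ("Code Monkey!", "en"),
        ("Softwareentwickler", "de"),
        ("Desarrollador de Software", "es"),
    ]:
        print(f"{title!r} ({lang}) -> {classify(title, lang)}")
```

In this sketch, supporting a new language only means adding glossary entries, while the label inventory stays untouched. That is one simple way to read "leveraging the English-language work"; the real approach may differ substantially.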