Data By the Bay is the first Data Grid conference: a matrix of 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
Tuesday, May 17 • 2:10pm - 2:50pm
SAMEntics: Tools for paraphrase detection and paraphrase generation

Sparse ground truth, mediocre-quality training data, limited coverage of novel queries, heavy biases introduced by human intervention, and the large time overhead of creating clusters manually are problems that both partners and the Watson Ecosystem technical team face every day. Enriching ground truth, improving training-data quality, accounting for novel queries, and minimizing the biases and time costs of human intervention are therefore crucial preprocessing requirements for a smoother transition to a cognitive service powered by Watson.

SAMEntics (Same + Semantics) was conceived for exactly this purpose and offers an efficient way to handle large volumes of text across domains at scale. It comprises tools for paraphrase detection and paraphrase generation, aimed at:
1. discovering rewordings of sentences across domains;
2. bucketing hierarchical categories within domains by capturing intent;
3. expediting question-to-answer mapping;
4. rendering syntactically correct phrasal variations of sentences that retain their semantic meaning, in order to enrich partner ground truth, boost training-data quality, and minimize the biases and time costs of human intervention.

SAMEntics thus handles large volumes of text efficiently, not only by automatically rendering hierarchical clusters based on user intent but also by generating rewordings of user queries when training data is sparse and/or of poor quality.

Join us as we go over the current and emerging state of the art in this space. Reflect on what is changing the world in this era of cognition. Dive deep into the pipeline and the core algorithmic paradigms that power a paraphrase detection and paraphrase generation engine. And leave with an understanding of what it takes to build a product that provides data science as a service.

Niyati Parameswaran

Data Scientist, IBM Watson
Niyati works as a data scientist for the Watson Ecosystem team. Her research is motivated by the dream of giving machines a unique intelligence that can augment our own distinctive intelligence and that may ultimately answer Alan Turing's question, "Can machines think?" She holds a Bachelor's in Computer Science from The Birla Institute of Science & Technology in India, and a Master's in Computer Science with a...

