Data By the Bay is the first Data Grid conference matrix with 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
Many search appliances exist today that make full-text search fairly simple: Elasticsearch, Solr, Algolia; the list goes on. However, all of these services implement n-gram or token-level analysis of the text, and can only really search on exact or partial matches of text in the base corpus. They are not able to match on overlapping concepts. For example, if a document is about the programming language Java and you search for "Computer Science", the document won't be scored highly unless computer science is explicitly mentioned in it. Recent innovations in general-purpose word vectors, and the ability to compose them into general-purpose document vectors, provide a way to create conceptual, semantic search products. This talk will demonstrate creating a semantic search engine on a few non-trivial corpora.
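To make the idea of composing word vectors into document vectors concrete, here is a minimal sketch in Python. It is not the talk's implementation: the three-dimensional vectors in `WORD_VECTORS` are invented toy values, the function names (`document_vector`, `cosine`, `search`) are hypothetical, and simple averaging is assumed as the composition method. A real system would more likely load pretrained vectors and might compose them differently.

```python
import math

# Toy 3-dimensional word vectors, invented purely for illustration.
# A real system would load pretrained vectors with hundreds of dimensions.
WORD_VECTORS = {
    "java":        [0.9, 0.1, 0.0],
    "programming": [0.8, 0.2, 0.0],
    "language":    [0.7, 0.3, 0.1],
    "computer":    [0.9, 0.0, 0.1],
    "science":     [0.6, 0.1, 0.3],
    "coffee":      [0.1, 0.9, 0.0],
    "espresso":    [0.0, 1.0, 0.1],
    "brewing":     [0.1, 0.8, 0.2],
}


def document_vector(text):
    """Compose a document vector by averaging the vectors of known words.

    Averaging is an assumed composition method; unknown words are skipped.
    Returns None when no word in the text has a vector.
    """
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query, documents):
    """Return (score, document) pairs sorted from most to least similar."""
    query_vec = document_vector(query)
    if query_vec is None:
        return []
    scored = []
    for doc in documents:
        doc_vec = document_vector(doc)
        if doc_vec is not None:
            scored.append((cosine(query_vec, doc_vec), doc))
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    corpus = ["espresso coffee brewing", "java programming language"]
    for score, doc in search("computer science", corpus):
        print(f"{score:.3f}  {doc}")
```

With these toy vectors, the query "computer science" ranks the Java document above the coffee document even though the two share no tokens, which is the behavior a purely token-based index would not give.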