Data By the Bay is the first Data Grid conference matrix with 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
Image retrieval in response to keyword-based queries is a well-studied problem. Web services such as Google Image Search are used daily by people all around the world. The typical use case for these services is a search phrase made up of a few individual tokens. The services are therefore designed for such queries and generally do not work well when a longer search string, such as a full sentence, is used. This is at odds with two recent trends: the push towards a more visual web, as evidenced by the popularity of applications such as Instagram, and the rise of microblogging services, which has produced an abundance of short text snippets for which users may want to retrieve accompanying images automatically. In this paper we introduce a novel approach, called ImageSuggest, which sits between the user and traditional image retrieval systems and allows users to enter longer search strings. Our approach extracts and ranks search terms from the input strings and feeds the resulting keywords to the image retrieval systems. We evaluate our approach on a dataset of short texts from the anonymous social network Whisper and show that it outperforms standard keyword extraction and query generation techniques on image retrieval tasks.
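
To make the extract-rank-query idea concrete, here is a minimal sketch of the general pipeline shape. It is not the ImageSuggest algorithm, whose ranking method the abstract does not describe. The stopword list, the frequency-then-length ranking heuristic, and the function names `extract_keywords` and `build_query` are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = frozenset(
    "a an and are as at be but by for from i in is it my of on or so "
    "that the this to was with".split()
)


def extract_keywords(text, max_terms=3):
    """Return up to max_terms candidate search terms, best first.

    Hypothetical ranking heuristic: higher frequency first, then longer
    tokens, then earlier position in the text as a tie-breaker.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    candidates = [t for t in tokens if t not in STOPWORDS and len(t) > 2]
    counts = Counter(candidates)
    first_pos = {}
    for i, token in enumerate(candidates):
        first_pos.setdefault(token, i)
    ranked = sorted(
        counts, key=lambda t: (-counts[t], -len(t), first_pos[t])
    )
    return ranked[:max_terms]


def build_query(text, max_terms=3):
    """Join the top-ranked terms into a short keyword query string."""
    return " ".join(extract_keywords(text, max_terms))


if __name__ == "__main__":
    snippet = "I miss the beach and the sunset at the beach"
    # The resulting short query would then be passed to an image search service.
    print(build_query(snippet))
```

The point of the sketch is the division of labour: the long input sentence is reduced to a handful of ranked tokens, and only that short query is handed to a conventional keyword-based image search, which is the kind of input such services are designed for.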