Publications

Neural Embedding-based Indices for Semantic Search

Fatemeh Lashkari, Ebrahim Bagheri and Ali A. Ghorbani
Reference:
Fatemeh Lashkari, Ebrahim Bagheri and Ali A. Ghorbani. Neural Embedding-based Indices for Semantic Search. In Information Processing and Management, to appear, 2018.
Links to Publication: [www][pdf]
Abstract:
Traditional information retrieval techniques that primarily rely on keyword-based linking of the query and document spaces face challenges such as the vocabulary mismatch problem, where documents relevant to a given query might not be retrieved simply because they describe the same concepts with different terminology. Semantic search techniques aim to address such limitations of keyword-based retrieval models by incorporating semantic information from standard knowledge bases such as Freebase and DBpedia. The literature has already shown that while the sole consideration of semantic information might not improve retrieval performance over keyword-based search, it enables the retrieval of a set of relevant documents that cannot be retrieved by keyword-based methods. As such, building indices that store and provide access to semantic information during the retrieval process is important. While the process for building and querying keyword-based indices is quite well understood, the incorporation of semantic information within search indices is still an open challenge. Existing works have proposed either to build one unified index encompassing both textual and semantic information or to build separate yet integrated indices for each information type, but both approaches face limitations such as increased query processing time. In this paper, we propose to use neural embedding-based representations of terms, semantic entities, semantic types and documents within the same embedding space to facilitate the development of a unified search index consisting of these four information types. We perform experiments on standard and widely used document collections, including ClueWeb09-B and Robust04, to evaluate our proposed indexing strategy from both effectiveness and efficiency perspectives.
Based on our experiments, we find that when neural embeddings are used to build inverted indices, thereby relaxing the requirement to explicitly observe the posting list key in the indexed document: (a) retrieval efficiency increases compared to a standard inverted index, reducing both index size and query processing time, and (b) while retrieval efficiency, the main objective of an efficient indexing mechanism, improves under our proposed method, retrieval effectiveness remains competitive with the baseline in terms of retrieving a reasonable number of relevant documents from the indexed corpus.
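The core idea above — posting lists whose keys need not literally occur in the indexed document — can be illustrated with a minimal sketch. Everything here (the toy 3-d embeddings, the vocabulary, the averaging of token vectors into a document vector, and the similarity threshold) is a hypothetical simplification for illustration, not the paper's actual model or parameters:

```python
import math

# Toy 3-d embeddings (hypothetical values); in the paper these would come
# from a neural model trained over terms, entities, types and documents.
EMB = {
    "car":     [0.9, 0.1, 0.0],
    "vehicle": [0.8, 0.2, 0.1],
    "banana":  [0.0, 0.1, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def doc_vector(tokens):
    # Average the embeddings of known tokens -- a simple, common choice.
    vecs = [EMB[t] for t in tokens if t in EMB]
    dim = len(next(iter(EMB.values())))
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def build_index(docs, threshold=0.8):
    # Posting lists keyed by vocabulary entries: a document is posted under
    # every key whose embedding is close enough to the document vector,
    # even if that key never literally occurs in the document text.
    index = {key: [] for key in EMB}
    for doc_id, tokens in docs.items():
        dv = doc_vector(tokens)
        for key, key_vec in EMB.items():
            if cosine(dv, key_vec) >= threshold:
                index[key].append(doc_id)
    return index

docs = {"d1": ["car"], "d2": ["banana"]}
index = build_index(docs)
# "d1" becomes retrievable via the key "vehicle" although that term
# never appears in the document -- the vocabulary-mismatch case that a
# standard inverted index would miss.
```

A standard inverted index would only post "d1" under "car"; here the embedding similarity between "car" and "vehicle" adds the semantic posting as well.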
Bibtex Entry:
@article{ipm2018-fatemehl, author = {Fatemeh Lashkari and Ebrahim Bagheri and Ali A. Ghorbani}, title = {Neural Embedding-based Indices for Semantic Search}, journal = {Information Processing and Management}, url = {https://www.journals.elsevier.com/information-processing-and-management}, year = {2018}, pages = {to appear}, webpdf = {http://ls3.rnet.ryerson.ca/wiki/images/0/03/Ipm_2018_221.pdf}, abstract = {Traditional information retrieval techniques that primarily rely on keyword-based linking of the query and document spaces face challenges such as the \emph{vocabulary mismatch problem}, where documents relevant to a given query might not be retrieved simply because they describe the same concepts with different terminology. Semantic search techniques aim to address such limitations of keyword-based retrieval models by incorporating semantic information from standard knowledge bases such as Freebase and DBpedia. The literature has already shown that while the sole consideration of semantic information might not improve retrieval performance over keyword-based search, it enables the retrieval of a set of relevant documents that cannot be retrieved by keyword-based methods. As such, building indices that store and provide access to semantic information during the retrieval process is important. While the process for building and querying keyword-based indices is quite well understood, the incorporation of semantic information within search indices is still an open challenge. Existing works have proposed either to build one unified index encompassing both textual and semantic information or to build separate yet integrated indices for each information type, but both approaches face limitations such as increased query processing time.
In this paper, we propose to use neural embedding-based representations of terms, semantic entities, semantic types and documents within the same embedding space to facilitate the development of a unified search index consisting of these four information types. We perform experiments on standard and widely used document collections, including ClueWeb09-B and Robust04, to evaluate our proposed indexing strategy from both \emph{effectiveness} and \emph{efficiency} perspectives. Based on our experiments, we find that when neural embeddings are used to build inverted indices, thereby relaxing the requirement to explicitly observe the posting list key in the indexed document: (a) \textit{retrieval efficiency} increases compared to a standard inverted index, reducing both index size and query processing time, and (b) while retrieval efficiency, the main objective of an efficient indexing mechanism, improves under our proposed method, \textit{retrieval effectiveness} remains competitive with the baseline in terms of retrieving a reasonable number of relevant documents from the indexed corpus.} }



