Publications

ReQue: A Configurable Workflow and Dataset Collection for Query Refinement

Mahtab Tamannaee, Hossein Fani, Fattane Zarrinkalam, Jamil Samouh, Samad Paydar, and Ebrahim Bagheri
Reference:
Mahtab Tamannaee; Hossein Fani; Fattane Zarrinkalam; Jamil Samouh; Samad Paydar and Ebrahim Bagheri. ReQue: A Configurable Workflow and Dataset Collection for Query Refinement. In The 29th ACM International Conference on Information and Knowledge Management (CIKM2020), 2020.
Abstract:
In this paper, we implement and publicly share a configurable software workflow and a collection of gold standard datasets for training and evaluating supervised query refinement methods. Existing datasets such as AOL and MS MARCO, which have been extensively used in the literature for this purpose, are based on the weak assumption that users’ input queries improve gradually within a search session, i.e., the last query where the user ends her information seeking session is the best reconstructed version of her initial query. In practice, such an assumption is not necessarily accurate for a variety of reasons, e.g., topic drift. The objective of our work is to enable researchers to build gold standard query refinement datasets without having to rely on such weak assumptions. Our software workflow, which generates such gold standard query datasets, takes three inputs: (1) a dataset of queries along with their associated relevance judgements (e.g., TREC topics), (2) an information retrieval method (e.g., BM25), and (3) an evaluation metric (e.g., MAP), and outputs a gold standard dataset. The produced gold standard dataset includes a list of revised queries for each query in the input dataset, each of which effectively improves the performance of the specified retrieval method (e.g., BM25) in terms of the desired evaluation metric (e.g., MAP). Since our workflow can be used to generate gold standard datasets for any input query set, in this paper, we have generated and publicly shared gold standard datasets for TREC queries associated with Robust04, Gov2, ClueWeb09, and ClueWeb12. The source code of our software workflow, the generated gold datasets, and benchmark results for three state-of-the-art supervised query refinement methods over these datasets are made publicly available for reproducibility purposes.
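The three-input workflow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy term-overlap ranker standing in for BM25, the tiny document collection, and the hand-written candidate refinements are all assumptions made for illustration. The abstract does not say how candidate refinements are produced, so the sketch simply takes them as an input.

```python
from typing import Callable, Dict, List, Set, Tuple

Ranker = Callable[[str], List[str]]               # query text -> ranked doc ids
Metric = Callable[[List[str], Set[str]], float]   # (ranking, relevant ids) -> score


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average precision for one query; MAP is the mean of this over queries."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def make_overlap_ranker(docs: Dict[str, str]) -> Ranker:
    """Hypothetical stand-in for BM25: rank by number of shared query terms."""
    def rank(query: str) -> List[str]:
        terms = set(query.lower().split())
        scored = [(len(terms & set(text.lower().split())), doc_id)
                  for doc_id, text in docs.items()]
        scored = [(s, d) for s, d in scored if s > 0]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by doc id
        return [d for _, d in scored]
    return rank


def build_gold_standard(
    queries: Dict[str, str],
    qrels: Dict[str, Set[str]],
    candidates: Dict[str, List[str]],   # assumed input: candidate refinements
    ranker: Ranker,
    metric: Metric,
) -> Dict[str, List[Tuple[str, float]]]:
    """Keep only the candidates that score strictly higher than the original query."""
    gold: Dict[str, List[Tuple[str, float]]] = {}
    for qid, query in queries.items():
        baseline = metric(ranker(query), qrels[qid])
        improved = []
        for cand in candidates.get(qid, []):
            score = metric(ranker(cand), qrels[qid])
            if score > baseline:
                improved.append((cand, score))
        improved.sort(key=lambda pair: -pair[1])  # best refinement first
        gold[qid] = improved
    return gold


# Toy data, invented purely for this example.
docs = {
    "d1": "jaguar car speed review",
    "d2": "jaguar habitat rainforest cat",
    "d3": "big cat conservation rainforest",
}
queries = {"q1": "jaguar"}
qrels = {"q1": {"d2", "d3"}}
candidates = {"q1": ["jaguar rainforest cat", "jaguar car"]}

gold = build_gold_standard(
    queries, qrels, candidates, make_overlap_ranker(docs), average_precision
)
print(gold)
```

In this toy run, only the candidate "jaguar rainforest cat" is retained for q1, because it is the only one that raises average precision above that of the original query. Swapping in a different ranker or metric only requires passing a different callable, which mirrors the configurability the abstract describes.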
Bibtex Entry:
@inproceedings{cikm-reque,
  author = {Mahtab Tamannaee and Hossein Fani and Fattane Zarrinkalam and Jamil Samouh and Samad Paydar and Ebrahim Bagheri},
  title = {ReQue: A Configurable Workflow and Dataset Collection for Query Refinement},
  abstract = {In this paper, we implement and publicly share a configurable software workflow and a collection of gold standard datasets for training and evaluating supervised query refinement methods. Existing datasets such as AOL and MS MARCO, which have been extensively used in the literature for this purpose, are based on the weak assumption that users’ input queries improve gradually within a search session, i.e., the last query where the user ends her information seeking session is the best reconstructed version of her initial query. In practice, such an assumption is not necessarily accurate for a variety of reasons, e.g., topic drift. The objective of our work is to enable researchers to build gold standard query refinement datasets without having to rely on such weak assumptions. Our software workflow, which generates such gold standard query datasets, takes three inputs: (1) a dataset of queries along with their associated relevance judgements (e.g. TREC topics), (2) an information retrieval method (e.g., BM25), and (3) an evaluation metric (e.g., MAP), and outputs a gold standard dataset. The produced gold standard dataset includes a list of revised queries for each query in the input dataset, each of which effectively improves the performance of the specified retrieval method (e.g., BM25) in terms of the desirable evaluation metric (e.g., MAP). Since our workflow can be used to generate gold standard datasets for any input query set, in this paper, we have generated and publicly shared gold standard datasets for TREC queries associated with Robust04, Gov2, ClueWeb09, and ClueWeb12. The source code of our software workflow, the generated gold datasets, and benchmark results for three state-of-the-art supervised query refinement methods over these datasets are made publicly available for reproducibility purposes.},
  booktitle = {The 29th ACM International Conference on Information and Knowledge Management, (CIKM2020)},
  year = {2020}
}



