Publications

Exploring Gender Biases in Information Retrieval Relevance Judgement Datasets

Amin Bigdeli, Negar Arabzadeh, Morteza Zihayat, and Ebrahim Bagheri
Reference:
Amin Bigdeli, Negar Arabzadeh, Morteza Zihayat, and Ebrahim Bagheri. Exploring Gender Biases in Information Retrieval Relevance Judgement Datasets. In 43rd European Conference on IR Research (ECIR 2021), 2021.
Abstract:
Recent studies in information retrieval have shown that gender biases have found their way into representational and algorithmic aspects of computational models. In this paper, we focus specifically on gender biases in information retrieval gold standard datasets, often referred to as relevance judgements. While not explored in the past, we submit that it is important to understand and measure the extent to which gender biases may be present in information retrieval relevance judgements, primarily because relevance judgements are not only the primary source for evaluating IR techniques but are also widely used for training end-to-end neural ranking methods. As such, the presence of bias in relevance judgements would immediately find its way into how retrieval methods operate in practice. Using a fine-tuned BERT model, we show how queries can be labeled for gender at scale, and we apply this approach to label MS MARCO queries. We then show how different psychological characteristics are exhibited within documents associated with gendered queries within the relevance judgement datasets. Our observations show that stereotypical biases are prevalent in relevance judgement documents.
Bibtex Entry:
@inproceedings{ecir2021b,
  author = {Amin Bigdeli and Negar Arabzadeh and Morteza Zihayat and Ebrahim Bagheri},
  title = {Exploring Gender Biases in Information Retrieval Relevance Judgement Datasets},
  booktitle = {43rd European Conference on IR Research (ECIR 2021)},
  year = {2021},
  abstract = {Recent studies in information retrieval have shown that gender biases have found their way into representational and algorithmic aspects of computational models. In this paper, we focus specifically on gender biases in information retrieval gold standard datasets, often referred to as relevance judgements. While not explored in the past, we submit that it is important to understand and measure the extent to which gender biases may be present in information retrieval relevance judgements primarily because relevance judgements are not only the primary source for evaluating IR techniques but are also widely used for training end-to-end neural ranking methods. As such, the presence of bias in relevance judgements would immediately find its way into how retrieval methods operate in practice. Based on a fine-tuned BERT model, we show how queries can be labeled for gender at scale based on which we label MS MARCO queries. We then show how different psychological characteristics are exhibited within documents associated with gendered queries within the relevance judgement datasets. Our observations show that stereotypical biases are prevalent in relevance judgement documents.}
}



