REFLECT: TEACHING MACHINE LEARNING MODELS TO RECONSIDER THEIR BIASES

Abstract
Machine learning models are now widely used across many fields to automate institutional decision-making. As more training data becomes available, these models are adopted to solve complex classification problems that can affect people’s lives both positively and negatively. Unfortunately, the data used to train them may contain patterns of bias tied to socio-economic characteristics of individuals, such as race, gender and income level. Predictive models trained on such data go on to make decisions that disadvantage certain classes of individuals in society and favour others. This paper presents a method that improves the fairness of machine learning models by narrowing the disparity between the misclassification rates of predictions made for the classes within a sensitive group. It does so by modifying the loss function of a classifier so that it accounts for the disparate mistreatment of people based on their membership of a particular sensitive class. Specifically, the gradient of the error between the predictions and the ground truth is calculated for each sensitive group (defined by the sensitive feature under consideration that may contribute to unfair decision-making). This gradient is then added to the gradient of the Cross-Entropy loss function of a Logistic Regression classifier. With this modification to its loss function, the model learns parameters that not only correctly predict recidivism but also minimize disparate mistreatment.
In this experiment, the method reduced the disparity between the false positive rates of recidivism risk score predictions for African Americans and for individuals of other racial origins in a subset of the COMPAS Recidivism Dataset by 52% (from a difference of 0.25 to 0.12), while achieving a test accuracy of 70.4%.
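
To make the idea of adding a group-disparity gradient to the Cross-Entropy gradient more concrete, the following is a minimal sketch and not the paper's actual implementation. The abstract does not specify the exact form of the added term, so the penalty used here (half the squared difference between the mean predicted probability on true negatives in each group, a differentiable stand-in for the false positive rate gap) is an assumption, as are the function names `fit_fair_logreg`, `fpr_gap` and `relative_reduction`, the weight `lam`, and the binary encoding of the sensitive feature.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fpr(y_true, y_pred):
    """False positive rate: share of true negatives predicted as positive."""
    negatives = y_true == 0
    if negatives.sum() == 0:
        return 0.0
    return float(np.mean(y_pred[negatives] == 1))


def fpr_gap(y_true, y_pred, group):
    """Absolute FPR difference between group 1 and group 0."""
    in_g1 = group == 1
    return abs(fpr(y_true[in_g1], y_pred[in_g1]) - fpr(y_true[~in_g1], y_pred[~in_g1]))


def relative_reduction(gap_before, gap_after):
    """Fraction by which a disparity shrank, e.g. 0.25 -> 0.12 is about 0.52."""
    return (gap_before - gap_after) / gap_before


def fit_fair_logreg(X, y, group, lam=1.0, lr=0.1, epochs=300):
    """Gradient descent on cross-entropy plus an assumed disparity penalty.

    Assumed penalty: 0.5 * (soft_fpr_1 - soft_fpr_0) ** 2, where soft_fpr_g is
    the mean predicted probability over true negatives of group g.
    Assumes each group contains at least one true negative.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    neg1 = (group == 1) & (y == 0)  # true negatives in group 1
    neg0 = (group == 0) & (y == 0)  # true negatives in group 0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        err = p - y
        # Gradient of the mean cross-entropy loss.
        grad_w = X.T @ err / n
        grad_b = err.mean()
        # Gradient of the assumed penalty, via the chain rule through sigmoid.
        s = p * (1.0 - p)
        gap = p[neg1].mean() - p[neg0].mean()
        dgap_dw = (X[neg1] * s[neg1, None]).mean(axis=0) - (X[neg0] * s[neg0, None]).mean(axis=0)
        dgap_db = s[neg1].mean() - s[neg0].mean()
        # Add the disparity gradient to the cross-entropy gradient.
        grad_w = grad_w + lam * gap * dgap_dw
        grad_b = grad_b + lam * gap * dgap_db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

Setting `lam=0` recovers plain Logistic Regression, so the effect of the added term can be checked by comparing `fpr_gap` on held-out predictions (for example `(sigmoid(X_test @ w + b) >= 0.5).astype(int)`) with and without the penalty. The reported drop from 0.25 to 0.12 corresponds to `relative_reduction(0.25, 0.12)`, which is about 0.52.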