Saturday, February 18, 2023

What is the DPPL metric?

The difference in positive proportions in predicted labels (DPPL) metric determines whether the model predicts outcomes differently for each facet. It is defined as the difference between the proportion of positive predictions (y’ = 1) for facet a and the proportion of positive predictions (y’ = 1) for facet d. For example, if the model predictions grant loans to 60% of a middle-aged group (facet a) and 50% of the other age groups (facet d), the model might be biased against facet d. In this example, you need to determine whether the 10 percentage point difference is large enough to make a case for bias. Comparing DPPL with the difference in proportions of labels (DPL), the corresponding metric computed on the dataset labels before training, shows whether bias initially present in the dataset increases or decreases in the model predictions after training.


The formula for the difference in proportions of predicted labels:


        DPPL = q'a - q'd


Where:


q'a = n'a(1)/na is the predicted proportion of facet a who get a positive outcome of value 1. In our example, this is the proportion of the middle-aged facet predicted to be granted a loan. Here n'a(1) is the number of members of facet a who get a positive predicted outcome of value 1, and na is the number of members of facet a.


q'd = n'd(1)/nd is the predicted proportion of facet d who get a positive outcome of value 1. In our example, this is the proportion of the facet of older and younger people predicted to be granted a loan. Here n'd(1) is the number of members of facet d who get a positive predicted outcome of value 1, and nd is the number of members of facet d.


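If DPPL is close enough to 0, post-training demographic parity has been achieved. A positive value means facet a receives a higher proportion of positive predictions than facet d, and a negative value means the opposite. How close counts as "close enough" depends on the use case.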
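
To make the formula concrete, here is a minimal Python sketch that computes DPPL from two lists of predicted labels. The function names and the sample predictions are hypothetical, chosen only to reproduce the 60% versus 50% loan example; this is not the SageMaker Clarify implementation.

```python
def positive_proportion(labels):
    """Return the fraction of labels equal to 1 (q' for one facet)."""
    if not labels:
        raise ValueError("a facet must have at least one member")
    return sum(1 for y in labels if y == 1) / len(labels)


def dppl(pred_a, pred_d):
    """DPPL = q'a - q'd, given predicted labels for facet a and facet d."""
    return positive_proportion(pred_a) - positive_proportion(pred_d)


# Hypothetical predictions matching the loan example:
# 6 of 10 applicants approved in facet a, 5 of 10 in facet d.
pred_a = [1] * 6 + [0] * 4
pred_d = [1] * 5 + [0] * 5

value = dppl(pred_a, pred_d)
print(round(value, 2))  # 0.1
```

Applying the same function to the observed dataset labels instead of the predictions would give DPL, so the two values can be compared directly.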


references:

https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-dppl.html
