Background
Biased data collection and the magnification of bias by algorithms have plagued AI since its inception. The risks of biased collection have been well understood since the study of statistics began. For example, data collected on only one segment of the people in one country may not extrapolate well to the full population of 7.9 billion people on earth.
Historic biases can also be perpetuated through automated algorithms. Here’s one example.
The Diners Club cardboard credit card started in the United States in 1949. It was only available to a select group of 200 white men, essentially friends of the founders. Two years later the membership had grown to 42,000. Plastic credit cards issued by banks came out in 1958. Yet it wasn’t until the mid-1970s that women and minorities were able to get credit cards on their own. Even then, it was still much more difficult for them to get credit because, as a group, they lacked any credit history. Formulas that considered credit history were biased against those who had not even been permitted to have a credit history.
SAS Viya has been able to assess bias for a while. And even better, SAS Viya can now help you mitigate bias inside your modeling process! SAS Viya lets you:
Understanding the Fairness Metrics Used to Assess and Mitigate Bias in SAS Viya
The plots on the Fairness and Bias tab in the Results section highlight potential differences in model performance for different groups within specified sensitive variables.
The assessBias action provides a number of useful measures of bias. These include performance bias, performance bias parity, prediction bias, and prediction bias parity.
The mitigateBias action addresses bias using the exponentiated gradient reduction (EGR) algorithm. EGR runs multiple iterations. In each iteration it can reweight and relabel the data, and then train a new classification model. So it actually incorporates fairness constraints within the modeling process!
The SAS Viya mitigateBias action with EGR supports a number of fairness measures: demographic parity, equalized opportunity, and equalized odds. It takes these measures into account in finding your best model, and gives you detailed results covering not only the model performance but also the bias metrics.
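To give a feel for what "multiple iterations" means, here is a minimal Python sketch of the textbook exponentiated-gradient update on a set of fairness-constraint multipliers. This is not the SAS implementation: the function name, the `bound` and `learning_rate` arguments, and the violation numbers are all made up for illustration. The idea is only that a constraint which stays violated gets a growing multiplier, which in turn would drive the next round of reweighting and retraining.

```python
import math

def update_multipliers(theta, violations, learning_rate, bound):
    """One exponentiated-gradient step (illustrative, not the SAS internals).

    theta: running scores, one per fairness constraint.
    violations: how far the latest model exceeds each constraint
                (positive means the constraint is violated).
    Returns the updated scores and the multipliers derived from them.
    """
    new_theta = [t + learning_rate * v for t, v in zip(theta, violations)]
    normalizer = 1.0 + sum(math.exp(t) for t in new_theta)
    multipliers = [bound * math.exp(t) / normalizer for t in new_theta]
    return new_theta, multipliers

# Hypothetical violations from three successive models: the first constraint
# is violated each time, by a shrinking amount.
theta = [0.0, 0.0]
for violations in ([0.20, -0.20], [0.10, -0.10], [0.02, -0.02]):
    theta, multipliers = update_multipliers(
        theta, violations, learning_rate=0.5, bound=10.0
    )
    print([round(m, 3) for m in multipliers])
```

In a full algorithm each pass through the loop would also retrain the classifier on the reweighted data and measure the new violations; here those numbers are simply supplied by hand.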
Let’s explain some of the fairness statistics used to assess and mitigate bias by using a simple example. Say we are using an algorithm to determine hiring rates. We want to check for any bias related to whether someone is Latinx or not. Let’s consider a binary sensitive variable LatinxStatus, which can be either LATINX or NONLATINX.
Performance bias parity compares the model's fit statistics for the LATINX group with those for the NONLATINX group. If the model does not fit both groups well, that is a problem. Try some other models to find one that works well for both groups. It’s easy to compare models with SAS Viya, the fastest, most productive AI and analytics platform.
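Here is a small Python sketch of the idea, using misclassification rate as a stand-in for "fit statistic" (that choice is my assumption for illustration only). The hiring records are invented.

```python
# Hypothetical hiring records: (group, actually_hired, predicted_hired)
records = [
    ("LATINX", 1, 1), ("LATINX", 0, 1), ("LATINX", 1, 0), ("LATINX", 0, 0),
    ("NONLATINX", 1, 1), ("NONLATINX", 0, 0), ("NONLATINX", 1, 1), ("NONLATINX", 0, 1),
]

def misclassification_rate(rows):
    """Share of rows where the prediction disagrees with the actual outcome."""
    wrong = sum(1 for _, actual, predicted in rows if actual != predicted)
    return wrong / len(rows)

def rate_by_group(rows):
    """Compute the misclassification rate separately for each group."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[0], []).append(row)
    return {group: misclassification_rate(members) for group, members in grouped.items()}

rates = rate_by_group(records)
performance_gap = max(rates.values()) - min(rates.values())
print(rates, performance_gap)
```

A large gap means the model fits one group noticeably worse than the other.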
Prediction bias parity compares the average predicted probability of hire between the LATINX and NONLATINX groups. The measure is the difference between those two probabilities. This should be low; if the sensitive variable (or its surrogates…but that’s a topic for another day) does not play a role in predicting hiring, the best model would not show different average predictions among groups.
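As a quick illustration in Python, with made-up predicted probabilities for three applicants per group:

```python
from statistics import mean

# Hypothetical predicted probabilities of hire, by group
predicted_hire_probability = {
    "LATINX": [0.30, 0.50, 0.40],
    "NONLATINX": [0.60, 0.70, 0.50],
}

# Average prediction within each group, then the gap between the groups
average_prediction = {
    group: mean(probabilities)
    for group, probabilities in predicted_hire_probability.items()
}
prediction_gap = abs(
    average_prediction["LATINX"] - average_prediction["NONLATINX"]
)
print(average_prediction, round(prediction_gap, 3))
```

Here the model predicts hire noticeably more often, on average, for one group than the other, which is the kind of gap this measure is meant to flag.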
Demographic parity compares the selection rate of each category (LATINX and NONLATINX) of the sensitive variable LatinxStatus. The measure is the difference between those selection rates. Demographic parity can help an organization balance out historical biases that impact the data.
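A minimal Python sketch of that calculation, with invented hire decisions (1 = selected, 0 = not selected):

```python
# Hypothetical model decisions by group: 1 = selected for hire, 0 = not selected
hire_decisions = {
    "LATINX": [1, 0, 0, 0],
    "NONLATINX": [1, 1, 0, 0],
}

def selection_rate(decisions):
    """Fraction of applicants the model selects."""
    return sum(decisions) / len(decisions)

selection_rates = {
    group: selection_rate(decisions)
    for group, decisions in hire_decisions.items()
}
demographic_parity_gap = max(selection_rates.values()) - min(selection_rates.values())
print(selection_rates, demographic_parity_gap)
```

Note that demographic parity looks only at the model's decisions, not at whether those decisions were correct.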
Equalized opportunity compares the true positive rate for the LATINX group versus the NONLATINX group. Be sure you are getting a high true positive rate for each group.
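In Python, starting from invented confusion-matrix counts for each group, the comparison could be sketched like this:

```python
# Hypothetical counts among applicants who truly merited hire:
# tp = correctly predicted hire, fn = wrongly predicted no hire
counts = {
    "LATINX": {"tp": 30, "fn": 20},
    "NONLATINX": {"tp": 45, "fn": 15},
}

def true_positive_rate(tp, fn):
    """Share of actual positives that the model predicts as positive."""
    return tp / (tp + fn)

tpr = {
    group: true_positive_rate(c["tp"], c["fn"])
    for group, c in counts.items()
}
equal_opportunity_gap = abs(tpr["LATINX"] - tpr["NONLATINX"])
print(tpr, round(equal_opportunity_gap, 3))
```

Unlike demographic parity, this measure conditions on the actual outcome, so it asks whether qualified people in each group are recognized at similar rates.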
Equalized odds looks at the maximum difference in either true positive rate (TPR) or false positive rate (FPR) between the NONLATINX and LATINX groups. You want to be sure that both the TPR and the FPR are similar between the two groups.
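Here is a short Python sketch with invented records, where the helper names are my own. In this toy data the FPR gap is larger than the TPR gap, so the FPR gap is what the measure reports.

```python
def make_rows(group, actual, predictions):
    """Build (group, actual, predicted) records sharing one actual label."""
    return [(group, actual, predicted) for predicted in predictions]

# Hypothetical records: actual 1 = merited hire, predicted 1 = model says hire
records = (
    make_rows("LATINX", 1, [1, 1, 0, 0]) + make_rows("LATINX", 0, [1, 0, 0, 0])
    + make_rows("NONLATINX", 1, [1, 1, 1, 0]) + make_rows("NONLATINX", 0, [1, 1, 1, 0])
)

def positive_rate(rows, actual_value):
    """Share of rows with the given actual label that were predicted positive."""
    predictions = [predicted for _, actual, predicted in rows if actual == actual_value]
    return sum(predictions) / len(predictions)

def rates_for(group):
    """Return (TPR, FPR) for one group."""
    rows = [row for row in records if row[0] == group]
    return positive_rate(rows, 1), positive_rate(rows, 0)

tpr_latinx, fpr_latinx = rates_for("LATINX")
tpr_nonlatinx, fpr_nonlatinx = rates_for("NONLATINX")

equalized_odds_gap = max(
    abs(tpr_latinx - tpr_nonlatinx),
    abs(fpr_latinx - fpr_nonlatinx),
)
print(equalized_odds_gap)
```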
Watch how this works in action in SAS Viya:
The screen capture below shows how sensitive variables can be designated in the Data pane of SAS Model Studio.
The screen capture below shows example results of the assessBias action with graphs of performance bias and performance bias parity, prediction bias and prediction bias parity, and bias metrics and bias parity metrics.
CODE examples:
Here is the code I used with screen shots of the results interspersed:
/* BethHeartSexMITIGATE */

/* Start a CAS session and point a libref at the CASUSER caslib */
cas MySession sessopts=(caslib=casuser timeout=1800 locale="en_US");
libname casuser cas caslib="casuser";

/* Load SASHELP.HEART into CAS, dropping any earlier copy first */
proc casutil;
   droptable casdata="heartone" incaslib="casuser" quiet;
   load data=sashelp.heart outcaslib="casuser"
        casout="HEARTone" promote;
run;
quit;

/* Take a first look at the table */
proc contents data=casuser.heartone;
run;

proc print data=casuser.heartone (obs=100);
run;

/* Keep only deaths from cancer or coronary heart disease */
data casuser.hearttwo;
   set casuser.heartone;
   where DeathCause = "Cancer" or DeathCause = "Coronary Heart Disease";
run;

/* Shorten the heart disease label to "Heart" */
data casuser.heartthree;
   set casuser.hearttwo;
   if DeathCause = "Coronary Heart Disease" then DeathCause = "Heart";
run;

proc print data=casuser.heartthree (obs=50);
run;
Train a Gradient Boosting Model
proc cas;
   /* Train a gradient boosting model and save it as an analytic store */
   decisionTree.gbtreeTrain /
      inputs={"Systolic", "Diastolic", "Weight", "Height", "Sex"},
      maxLevel=5,
      saveState={name="gbtreeASTORE", replace=true},
      seed=1234,
      table="HEARTthree",
      target="DeathCause";
run;
quit;
Assess Bias
proc cas;
   /* Score with the saved model and compare results across levels of Sex */
   fairAITools.assessBias /
      modelTable="gbtreeASTORE",
      modelTableType="ASTORE",
      event="Heart",
      predictedVariables={"P_DeathCauseCancer", "P_DeathCauseHeart"},
      response="DeathCause",
      responseLevels={"Cancer", "Heart"},
      sensitiveVariable="Sex",
      table="HEARTthree";
run;
quit;
Mitigate Bias
proc cas;
   /* Retrain under a demographic parity constraint.                    */
   /* trainProgram is CASL code passed as a string, so quotes inside it */
   /* are doubled. It refers to table, weight, casout, and copyVars,    */
   /* which are filled in by mitigateBias.                              */
   fairAITools.mitigateBias /
      biasMetric="DEMOGRAPHICPARITY",
      event="Heart",
      learningRate=0.01,
      maxIters=10,
      predictedVariables={"P_DeathCauseCancer", "P_DeathCauseHeart"},
      response="DeathCause",
      responseLevels={"Cancer", "Heart"},
      sensitiveVariable="Sex",
      table="HEARTthree",
      tolerance=0.005,
      trainProgram="
         decisionTree.gbtreeTrain result=train_res /
            table=table,
            weight=weight,
            target=""DeathCause"",
            inputs={
               ""Systolic"", ""Diastolic"", ""Weight"", ""Height"", ""Sex""
            },
            nominals={""DeathCause"", ""Sex""},
            nBins=50,
            quantileBin=True,
            maxLevel=5,
            maxBranch=2,
            leafSize=5,
            missing=""USEINSEARCH"",
            minUseInSearch=1,
            binOrder=True,
            varImp=True,
            mergeBin=True,
            encodeName=True,
            nTree=15,
            seed=12345,
            ridge=1,
            savestate={
               name=""HEART_gb_astore"",
               replace=True
            }
         ;
         astore.score result=score_res /
            table=table,
            casout=casout,
            copyVars=copyVars,
            rstore=""HEART_gb_astore""
         ;
      ",
      tuneBound=true;
run;
quit;
A number of additional code samples are available in the documentation.
These code examples allow for various combinations of feature (input) and target (outcome) types, as shown in the slide image below, and repeated as text for anyone who cannot read the image:
Always Always Always Explore Your Data
In this blog I’ve shown you how to assess and mitigate bias as part of the modeling process using coding and SAS Model Studio. But remember how much you can learn about your data and how much time and headaches you can save yourself down the road if you first explore your data using SAS Visual Analytics. Here’s an example using the same HEART data I used in the mitigateBias video I pointed you to.
Release History
The assessBias action became available via programming (in CASL, Lua, Python or R) in October 2021. The following month (November 2021) the Fairness and Bias tab became available in SAS Model Studio. Most recently, in October 2022, the mitigateBias action became available via programming (in CASL, Lua, Python or R). This information is also shown in the table below.
FOR MORE INFORMATION
Videos
Blogs
Find more articles from SAS Global Enablement and Learning here.