TheItalianJob - Ethical Data Analysis | Graduate School Admission (Student track option 2)

Views 618
Team Name TheItalianJob
Track Student Track 2
Use Case Ethical Data Analysis
Technology SAS Viya, Python
Region EMEA
Team lead Pasquale Maritato
Team members @AAlessandrelli 
Social media handles *all team members' social media links here*
Is your team interested in participating in an interview? N
Optional: Expand on your technology expertise  

 

 

Pitch Video

 

Jury Video

 

Team photo

theitalianjob_pic.jpg

Comments

Great work, @TheItalianJob + @PasqualeM!

 

Your Team Profile is complete and looks great. Thank you for using the correct tag, "Student Track 2," so it'll be easier to find and judge when it's time.

 

If you're excited to learn more about the Hack before September 16th, including a sneak peek of the use case, please see my post here: https://communities.sas.com/t5/SAS-Hacker-s-Hub/SAS-Hackathon-2024-Student-Track-Details/ba-p/941054

 

Good luck!

Wonderful job @TheItalianJob!!! You did an excellent job of creating an easy-to-follow analysis using both SAS Visual Analytics and SAS Model Studio. In particular, I like how you carefully walked through the descriptive statistics, complete with maps, in SAS Visual Analytics. You then used the Fairness + Bias Assessment tools in SAS Model Studio to refine your models and make the offers of admission more equitable. Finally, I love the recommendations for improving data collection at iLink University in the future.

Yay!

One question: did you notice any issues with the Legacy Admissions variable?  I purposely made it biased, and highly predictive, and was wondering what you saw in your model.  Regardless, great work!

@LGroves

Hey Lincoln, thank you very much for your feedback.

I'm glad you asked: since we were in a hurry and are not video-editing wizards, we skipped showing a lot of our analysis in the video.

We agree that Legacy Admissions was indeed highly predictive; in SAS VA we built a page for basically every feature, showing how admission rates differed across the Legacy Admission values.

PasqualeM_1-1730200067413.png
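
Outside VA, that per-feature comparison boils down to a group-by of the admission rate over each feature's values. A minimal pandas sketch, with a placeholder file name and column names rather than our actual data:

import pandas as pd

# Placeholder file/column names; the real comparison was a page per feature in SAS VA.
df = pd.read_csv("admissions.csv")
print(df.groupby("Legacy_Admission")["Admitted"].mean())  # admission rate per Legacy Admission value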

Once we found out there was bias against some applicants, we dropped the sensitive features (gender, cultural identity, and country region).

At that point there was still some bias, mainly for cultural identity. We had two options (which are not mutually exclusive): dropping additional features or mitigating the bias with exponentiated gradient reduction through the mitigate bias action. We tried both, as well as combinations of the two.
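
For anyone who wants to try the reduction outside Model Studio, here is a rough open-source sketch of the same idea using fairlearn's ExponentiatedGradient. This is only an analogue of what we ran in SAS, not our actual pipeline; it continues from the df loaded above, and the column names are placeholders.

from sklearn.linear_model import LogisticRegression
from fairlearn.reductions import ExponentiatedGradient, DemographicParity

# Target, fairness attribute, and model features (sensitive features dropped from X, as we did).
y = df["Admitted"]
sensitive = df["Cultural_Identity"]
X = pd.get_dummies(df.drop(columns=["Admitted", "Gender", "Cultural_Identity", "Country_Region"]))

# Exponentiated gradient reduction: reweights the training problem so the wrapped
# classifier satisfies a demographic parity constraint up to a tolerance.
mitigator = ExponentiatedGradient(
    estimator=LogisticRegression(max_iter=1000),
    constraints=DemographicParity(),
)
mitigator.fit(X, y, sensitive_features=sensitive)
y_pred = mitigator.predict(X)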

Our results showed that dropping the Legacy Admission feature did not have a large impact, either before or after mitigation.

Example:

  • Before mitigation, keeping Legacy Admission

PasqualeM_2-1730200629719.png

  • Before mitigation, dropping Legacy Admission

PasqualeM_3-1730200688872.png

 

As you can see, there was a 3% drop in prediction parity bias for Cultural Identity, but an increase for Country Region and Gender.
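
If it helps, a rough analogue of these parity-style metrics can be computed outside Model Studio with fairlearn. This is again a sketch continuing from the snippet above, not the exact metric definitions SAS reports.

from fairlearn.metrics import MetricFrame, demographic_parity_difference, selection_rate

# Per-group selection rates and the demographic parity gap for each sensitive attribute.
for attr in ["Cultural_Identity", "Country_Region", "Gender"]:
    gap = demographic_parity_difference(y, y_pred, sensitive_features=df[attr])
    rates = MetricFrame(metrics=selection_rate, y_true=y, y_pred=y_pred,
                        sensitive_features=df[attr])
    print(attr, rates.by_group.to_dict(), "parity gap:", round(gap, 3))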

The features we built through feature engineering on the Mission Statement probably offset the effect of Legacy Admission in terms of predictive power.

After mitigating the bias, dropping Legacy Admission only slightly increased the misclassification rate and had essentially no effect on the bias metrics (in fact, the model mitigated for demographic parity had lower prediction bias when Legacy Admission was kept).

The dataset was really small, so the results might vary with a different split (we used a 60-20-20 split stratified on Admissions and Cultural Identity). At the same time, precisely because it was so small, we preferred to mitigate the bias rather than drop too many features.
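
For completeness, here is roughly how such a split can be reproduced outside SAS (placeholder column names again): scikit-learn stratifies on a single key, so the two columns are combined first.

from sklearn.model_selection import train_test_split

# 60-20-20 split stratified jointly on the target and Cultural Identity.
strat_key = df["Admitted"].astype(str) + "_" + df["Cultural_Identity"].astype(str)
train, rest = train_test_split(df, test_size=0.4, stratify=strat_key, random_state=42)
valid, test = train_test_split(rest, test_size=0.5,
                               stratify=strat_key.loc[rest.index], random_state=42)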

As for our suggestion to predict future students' performance instead of predicting admission from historical data: Legacy Admission could be a good predictor there (econometric studies show that parents' education affects students' performance), but it would still be important to check its effect on the bias metrics.

 

We still have access to our pipelines, so if you have any other questions, feel free to ask.

 

Andrea & Pasquale


Maybe you can exclude gender, cultural identity, and country region. Maybe other variables would be less biased. Do you have other variables for your sample model? Maybe the econometric analysis would be better. I worked in the banking industry, and gender, cultural identity, and country region had to be excluded from the models to prevent bias. Are you running logistic regression? How? Are you using stepwise methods?

 

Jonas V. Bilenas

jonas.bilenas@gmail.com

BLOG: https://jonasbilenascom.com/

 

Great response, @andrea & Pasqual! Recapping a month's worth of work in 10 minutes is never easy... and I appreciate the added detail provided above. Moreover, I appreciate how critically both of you have thought about the challenges at hand.

 

Again, great work!
