TheItalianJob - Ethical Data Analysis | Graduate School Admission (Student track option 2)

Views 618
Team Name TheItalianJob
Track Student Track 2
Use Case Ethical Data Analysis
Technology SAS Viya, Python
Region EMEA
Team lead Pasquale Maritato
Team members @AAlessandrelli 
Social media handles *all team members' social media links here*
Is your team interested in participating in an interview? N
Optional: Expand on your technology expertise  

 

 

Pitch Video

 

Jury Video

 

Team photo

theitalianjob_pic.jpg

Comments

Great work, @TheItalianJob + @PasqualeM!

 

Your Team Profile is complete and looks great. Thank you for using the correct tag, "Student Track 2," so it'll be easier to find and judge when it's time.

 

If you're excited to learn more about the Hack before September 16th, including a sneak peek of the use case, please see my post here: https://communities.sas.com/t5/SAS-Hacker-s-Hub/SAS-Hackathon-2024-Student-Track-Details/ba-p/941054

 

Good luck!

Wonderful job @TheItalianJob!!! You did an excellent job of creating an easy-to-follow analysis using both SAS Visual Analytics and SAS Model Studio. In particular, I like how you carefully walked through the descriptive statistics, complete with maps, in SAS Visual Analytics. You then used the Fairness + Bias Assessment tools in SAS Model Studio to refine your models and make the offers of admission more equitable. Finally, I love the recommendations for improving data collection at iLink University in the future.

Yay!

One question: did you notice any issues with the Legacy Admissions variable?  I purposely made it biased, and highly predictive, and was wondering what you saw in your model.  Regardless, great work!

@LGroves

Hey Lincoln, thank you very much for your feedback.

I'm glad you asked: since we were in a hurry and are not video-editing wizards, we skipped showing a lot of our analysis in the video.

We agree that Legacy Admissions was indeed highly predictive; in SAS VA we built a page for basically every feature, showing how admission rates differed across the Legacy Admission values.

PasqualeM_1-1730200067413.png
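
Outside VA, that per-feature comparison boils down to a group-by of the admission rate over each feature's values. A minimal pandas sketch, with a placeholder file name and column names rather than our actual data:

import pandas as pd

# Placeholder file/column names; the real comparison was a page per feature in SAS VA.
df = pd.read_csv("admissions.csv")
print(df.groupby("Legacy_Admission")["Admitted"].mean())  # admission rate per Legacy Admission value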

Once we found out there was bias against some applicants, we dropped the sensitive features (gender, cultural identity, and country region).

At that point there was still some bias, mainly for cultural identity. We had two options (which are not mutually exclusive): dropping additional features or mitigating the bias with exponentiated gradient reduction through the mitigate bias action. We tried both, as well as combinations of the two.
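
For anyone who wants to try the reduction outside Model Studio, here is a rough open-source sketch of the same idea using fairlearn's ExponentiatedGradient. This is only an analogue of what we ran in SAS, not our actual pipeline; it continues from the df loaded above, and the column names are placeholders.

from sklearn.linear_model import LogisticRegression
from fairlearn.reductions import ExponentiatedGradient, DemographicParity

# Target, fairness attribute, and model features (sensitive features dropped from X, as we did).
y = df["Admitted"]
sensitive = df["Cultural_Identity"]
X = pd.get_dummies(df.drop(columns=["Admitted", "Gender", "Cultural_Identity", "Country_Region"]))

# Exponentiated gradient reduction: reweights the training problem so the wrapped
# classifier satisfies a demographic parity constraint up to a tolerance.
mitigator = ExponentiatedGradient(
    estimator=LogisticRegression(max_iter=1000),
    constraints=DemographicParity(),
)
mitigator.fit(X, y, sensitive_features=sensitive)
y_pred = mitigator.predict(X)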

Our results showed that dropping the Legacy Admission feature did not have a large impact, either before or after mitigation.

Example:

  • Before mitigation, keeping Legacy Admission

PasqualeM_2-1730200629719.png

  • Before mitigation, dropping Legacy Admission

PasqualeM_3-1730200688872.png

 

As you can see, there was a 3% drop in prediction parity bias for Cultural Identity, but an increase for Country Region and Gender.
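
If it helps, a rough analogue of these parity-style metrics can be computed outside Model Studio with fairlearn. This is again a sketch continuing from the snippet above, not the exact metric definitions SAS reports.

from fairlearn.metrics import MetricFrame, demographic_parity_difference, selection_rate

# Per-group selection rates and the demographic parity gap for each sensitive attribute.
for attr in ["Cultural_Identity", "Country_Region", "Gender"]:
    gap = demographic_parity_difference(y, y_pred, sensitive_features=df[attr])
    rates = MetricFrame(metrics=selection_rate, y_true=y, y_pred=y_pred,
                        sensitive_features=df[attr])
    print(attr, rates.by_group.to_dict(), "parity gap:", round(gap, 3))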

The features we built through feature engineering on the Mission Statement probably offset the effect of Legacy Admission in terms of predictive power.

After mitigating the bias, dropping Legacy Admission only slightly increased the misclassification rate and had essentially no effect on the bias metrics (in fact, the model mitigated for demographic parity had lower prediction bias when Legacy Admission was kept).

The dataset was really small, so the results might vary with a different split (we used a 60-20-20 split stratified on Admissions and Cultural Identity). At the same time, precisely because it was so small, we preferred to mitigate the bias rather than drop too many features.
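
For completeness, here is roughly how such a split can be reproduced outside SAS (placeholder column names again): scikit-learn stratifies on a single key, so the two columns are combined first.

from sklearn.model_selection import train_test_split

# 60-20-20 split stratified jointly on the target and Cultural Identity.
strat_key = df["Admitted"].astype(str) + "_" + df["Cultural_Identity"].astype(str)
train, rest = train_test_split(df, test_size=0.4, stratify=strat_key, random_state=42)
valid, test = train_test_split(rest, test_size=0.5,
                               stratify=strat_key.loc[rest.index], random_state=42)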

As for our suggestion to predict future students' performance instead of predicting admission from historical data: Legacy Admission could be a good predictor there (econometric studies show that parents' education affects students' performance), but it would still be important to check its effect on the bias metrics.

 

We still have access to our pipelines, so if you have any other questions, feel free to ask.

 

Andrea & Pasquale


Maybe you can exclude gender, cultural identity, and country region. Maybe other variables would be less biased. Do you have other variables for your sample model? Maybe the econometric analysis would be better. I worked in the banking industry, and gender, cultural identity, and country region had to be excluded from the models to prevent bias. Are you running logistic regression? How? Are you using stepwise methods?

 

Jonas V. Bilenas

jonas.bilenas@gmail.com

BLOG: https://jonasbilenascom.com/

 

Great response, @andrea & Pasqual! Recapping a month's worth of work in 10 minutes is never easy... and I appreciate the added detail provided above. Moreover, I appreciate how critically both of you have thought about the challenges at hand.

 

Again, great work!
