omerzeybek
Obsidian | Level 7

Hi,

I am trying to build a propensity model to describe general-purpose loan demand and then predict which individuals are most likely to buy a general-purpose loan.

But while preparing my data set I have run into some challenges.

First, I have coded every loan application in 2012 as 1 and the remaining cases as 0. I am just aiming to assess the factors that make someone more likely to buy a credit package.

However, I want to describe account balances at the time of application and after application, so I have used the account balance three months prior to the loan application as an explanatory variable. For the cases that applied for a credit package the values are easy to find, but for the ones that did not apply for a loan in 2012 I am unable to find any specific account balance data, because they have no application date.

Do you have any idea how I can keep this kind of variable in my work?

Thank you very much

1 ACCEPTED SOLUTION

JasonXin
SAS Employee

In a credit risk separation model such as yours, there are typically two kinds. One is credit acquisition. The other is behavior risk related to accounts already on your book. Your description seems to be the former; if it were the latter, you would have missing balance history on the non-applicants.

By all means, you should avoid using data that pertain to only one side of the separation. Technically, that is separation or quasi-separation by birth: when you build your model, it will be dominated by one or two such variables.
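A quick way to spot such variables before modeling is to compare missing rates per target class. The snippet below is only a minimal sketch in plain Python; the field names (`applied`, `income`, `balance_3m_prior`) and the helper `one_sided_variables` are hypothetical and not taken from the original data.

```python
def one_sided_variables(rows, target="applied", threshold=1.0):
    """Return variables that are missing (None) at a rate >= threshold
    in exactly one of the two target classes."""
    names = sorted({key for row in rows for key in row if key != target})
    flagged = []
    for name in names:
        rates = {}
        for cls in (0, 1):
            group = [row for row in rows if row[target] == cls]
            if not group:
                continue
            missing = sum(1 for row in group if row.get(name) is None)
            rates[cls] = missing / len(group)
        # Flag a variable that is (almost) fully missing on one side only
        if len(rates) == 2 and (rates[0] >= threshold) != (rates[1] >= threshold):
            flagged.append(name)
    return flagged


# Toy records (made up for illustration): balance exists only for applicants
rows = [
    {"applied": 1, "income": 4200.0, "balance_3m_prior": 1500.0},
    {"applied": 1, "income": 3100.0, "balance_3m_prior": 300.0},
    {"applied": 0, "income": 3900.0, "balance_3m_prior": None},
    {"applied": 0, "income": 2800.0, "balance_3m_prior": None},
]
flagged = one_sided_variables(rows)
print(flagged)
```

Any variable this flags carries the target in its missingness pattern, so it should be dropped or reworked before it goes into the model.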

If there is just one such variable you are 'crazy' about (I suspect you have quite a few; possibly your boss simply dictates this one to you), you can bring in other variables that have observations available for both the 0 and 1 groups. Run a clustering or KNN on the 1 and 0 groups combined, hoping to see a good mix of 1s and 0s in the resulting clusters or 'neighborhoods'. Depending on how the non-missing account balance variable is distributed inside the clusters, you can decide on a voting mechanism to impute the missing values for the 0 group. If you are comfortable building a large number of clusters, you can get fairly differentiated imputed values for the missing cases. But don't push it too far.

One rather primitive alternative is subgroup regression: pick some other variables that are common to both groups and use them to predict the balance, fitting on the 1 group only. Then use the model to score the group with missing balances (the 0 group). This method creates a lot of complications down the road for your model.

This practice is, in essence, the same as 'reject inference', where the focus is to infer the 1 and 0 assignment for the rejected applicant group, whose charge-off (bad) status is unknown because of the rejection. Overall, this practice should not be applied to many variables as model drivers in the same model universe.
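To make the neighborhood-voting idea concrete, here is a rough sketch in plain Python. It assumes a simple mean of the k nearest applicants as the 'vote'; the field names (`income`, `tenure`, `balance_3m_prior`) and the function `knn_impute_balance` are hypothetical, and a production version would more likely use a statistical package than hand-written distance code.

```python
import math


def knn_impute_balance(rows, features, k=3, balance="balance_3m_prior"):
    """Fill missing balances with the mean balance of the k nearest
    records that do have a balance (the 'donors')."""
    # Standardize each feature over the combined 0 and 1 population
    stats = {}
    for f in features:
        values = [row[f] for row in rows]
        mean = sum(values) / len(values)
        sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
        stats[f] = (mean, sd)

    def scaled(row):
        return [(row[f] - stats[f][0]) / stats[f][1] for f in features]

    donors = [row for row in rows if row[balance] is not None]
    if not donors:
        raise ValueError("no records with an observed balance")

    result = []
    for row in rows:
        new_row = dict(row)
        new_row["balance_imputed"] = row[balance] is None
        if row[balance] is None:
            # Euclidean distance on the shared, standardized features
            nearest = sorted(
                donors, key=lambda d: math.dist(scaled(row), scaled(d))
            )[:k]
            new_row[balance] = sum(d[balance] for d in nearest) / len(nearest)
        result.append(new_row)
    return result


# Toy data (made up): one non-applicant resembling the first two applicants
rows = [
    {"applied": 1, "income": 4000.0, "tenure": 5.0, "balance_3m_prior": 1000.0},
    {"applied": 1, "income": 4200.0, "tenure": 6.0, "balance_3m_prior": 1200.0},
    {"applied": 1, "income": 9000.0, "tenure": 20.0, "balance_3m_prior": 8000.0},
    {"applied": 0, "income": 4100.0, "tenure": 5.0, "balance_3m_prior": None},
]
imputed = knn_impute_balance(rows, features=["income", "tenure"], k=2)
print(imputed[3]["balance_3m_prior"])
```

Keeping the `balance_imputed` flag alongside the filled value makes it easier to check later whether the model is leaning on imputed rather than observed balances.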

Jason Xin

