Hi
I have a data set of around 80K customers. I have classified these customers as GOOD, BAD or INDETERMINATE based on their payment history for the last 12 months.
In the same file I have application scores for these customers (each has a score from 0 to 100). I want to test the reliability of these scores against my classification, i.e. to confirm that bads are concentrated at lower scores and goods at higher scores. Could somebody help me with code to produce a lift curve and/or K-S curve, Gini, ROC, etc., or an analysis of cumulative goods vs. bads?
Sample:
Application Score   Score Range   Classification
10                  0-10          Bad
30                  21-30         Bad
68                  61-70         Good
12                  11-20         Good
Also, is there a way to determine a cut-off for the application score that I could use to accept or reject customers (maybe a reverse cumulative distribution for the bads)?
I have tried hard to find SAS code for this but have been unsuccessful. Please help!
You should try decision tree procedure HPSPLIT. Something like:
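/* Fit a decision tree that splits on score to separate the classes */
proc hpsplit data=test;
    target class;
    input score / level=int;
    output nodestats=want;
run;
options linesize=120;
/* Print the statistics for the nodes at depth 1 of the tree */
proc print data=want label noobs;
    where depth=1;
    var leaf n predictedvalue insplitvar decision p_: ;
run;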
You will get optimal cutting scores between your classes as well as classification rates.
Hi PG,
Thanks for the response. I tried the code, but the SAS log returns this error message:
"ERROR: Procedure HPSPLIT not found."
The same thing happened with PROC RELIABILITY. Would you know why this is happening?
I have SAS 9.2 and no Enterprise Miner.
Thanks
Gavin
HPSPLIT is rather recent. The first mention of HPSPLIT in the documentation is for version 12.3 of SAS/STAT. If you have access to JMP you could do roughly the same thing with the partition platform.
I do not have access to JMP. Is there a way of doing this in SAS 9.2?
I appreciate the help.
Thanks
Gavin
Untested, but try these ideas:
Recode the response variable as Bad = -1, Indeterminate = 0, and Good = 1. You can then fit an ordinal logistic regression with the recoded response as the dependent variable and the "score" as the explanatory variable.
The ROC statement in PROC LOGISTIC enables you to construct ROC curves for the response in terms of the scores.
Use the LINK=CLOGLOG option to fit the ordinal response.
The "Association of Predicted Probabilities and Observed Responses" table gives various concordance statistics such as Somers' D (equivalent to the Gini coefficient) and c (the area under the ROC curve).
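The ideas above could be sketched roughly as follows. This is as untested as the suggestions themselves. It assumes the data set is named test with a numeric variable score and a character variable class (the names used in the other code in this thread), and that class holds 'Good', 'Bad' or 'Indeterminate' (the exact spelling is an assumption). The names recoded, resp and rocdata are made up for illustration. Because ROC analysis in PROC LOGISTIC applies to binary responses, the second model keeps only goods and bads.

```sas
/* Assumed: data set TEST with numeric SCORE and character CLASS */
/* holding 'Good', 'Bad' or 'Indeterminate' (spelling assumed)   */
data recoded;
    set test;
    if upcase(class) = 'BAD' then resp = -1;
    else if upcase(class) = 'GOOD' then resp = 1;
    else resp = 0;   /* everything else is treated as indeterminate */
run;

/* Ordinal model on all three levels, ordered -1 < 0 < 1 */
proc logistic data=recoded;
    model resp = score / link=cloglog;
run;

/* Binary model (Good vs. Bad only) so that an ROC curve is available */
ods graphics on;
proc logistic data=recoded;
    where resp ne 0;
    model resp(event='1') = score / outroc=rocdata;
    roc 'Application score' score;
run;
ods graphics off;

/* K-S statistic: largest gap between sensitivity and 1 - specificity */
proc sql;
    select max(abs(_sensit_ - _1mspec_)) as KS
    from rocdata;
quit;
```

The OUTROC= data set holds one row per cut point on the predicted probability, so it can also be plotted or inspected directly when comparing candidate cut-offs.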
As a last resort, you could try the nonparametric version of discriminant analysis:
/* Scores over the range of possible values */
data testvalues;
    do score = 0 to 100 by 0.1;
        output;
    end;
run;
/* Nonparametric discriminant analysis */
proc discrim data=test method=npar kernel=normal r=5 testdata=testvalues testout=testScore;
    class class;
    var score;
run;
/* Get the predicted score range for each class */
proc sql;
    select _into_ as class, min(score) as fromScore, max(score) as toScore
    from testScore
    group by _into_
    order by fromScore;
quit;
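The original question also asked about cumulative goods vs. bads and a K-S statistic. Here is a rough, untested sketch of one way to get both in SAS 9.2, again assuming a data set test with variables score and class, and the spellings 'Good' and 'Bad'. The names counts, cumdist, totGood, totBad and ksGap are made up for illustration.

```sas
/* Two-sample Kolmogorov-Smirnov comparison of SCORE, Good vs. Bad */
proc npar1way data=test edf;
    where class in ('Good', 'Bad');
    class class;
    var score;
run;

/* Number of goods and bads at each distinct score */
proc sql noprint;
    create table counts as
    select score,
           sum(class = 'Good') as nGood,
           sum(class = 'Bad')  as nBad
    from test
    where class in ('Good', 'Bad')
    group by score
    order by score;

    /* Overall totals, stored in macro variables */
    select sum(nGood), sum(nBad) into :totGood, :totBad
    from counts;
quit;

/* Cumulative share of goods and bads at or below each score */
data cumdist;
    set counts;
    cumGood + nGood;              /* running totals (sum statements) */
    cumBad  + nBad;
    pctGood = cumGood / &totGood;
    pctBad  = cumBad  / &totBad;
    ksGap   = pctBad - pctGood;   /* gap between the two cumulative curves */
run;
```

If the scores work as intended, pctBad should rise faster than pctGood at low scores. The score where ksGap peaks is one candidate cut-off to compare with the class ranges reported above, though the final choice would also depend on how costly it is to accept a bad versus reject a good.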
@PGStats I thought about recommending PROC CANCORR, but discriminant analysis is more appropriate for nominal than ordinal categories. What is your reason for recommending the nonparametric discriminant analysis over the linear?
Hi @Rick_SAS, I suggested nonparametric discriminant analysis because I didn't want to make strong assumptions about the score distribution in each class. But more importantly, I thought that using a small kernel would yield sharper delineation of the classes, i.e. class border positions would be determined locally. I chose a normal kernel because of its infinite support.
Thanks to both of you for the quick turnaround. I really appreciate it.
I have another question on Cumulative Accuracy Profile, which I will post shortly. I hope you can help.
Regards
Gavin