mithusaini
Calcite | Level 5

Hi

 

I have a data set of around 80K customers. I have classified these customers as GOOD, BAD or INDETERMINATE based on their payment history for the last 12 months.

 

Each customer is assigned a classification of Good, Bad or Indeterminate. The same file holds an application score for each customer, from 0 to 100. I want to test how reliable these scores are against my classification, i.e. to confirm that more Bads sit at lower scores and more Goods at higher scores. Could somebody help me with code to produce a lift curve and/or K-S curve, Gini, ROC etc., or an analysis of cumulative Goods vs Bads?

 

Sample

 

Application Score   Score Range   Classification
10                  0-10          Bad
30                  21-30         Bad
68                  61-70         Good
12                  11-20         Good

 

Also, is there a way to determine a cut-off on the application score that I could use to accept or reject customers (maybe a reverse cumulative distribution for the Bads)?

 

I have tried hard to find SAS code for this but without success. Please help!

PGStats
Opal | Level 21

You should try the decision tree procedure, HPSPLIT. Something like:

 

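/* Decision tree on the score; LEVEL=INT treats score as an interval input */
proc hpsplit data=test;
    target class;
    input score / level=int;
    output nodestats=want;
run;

/* Print the depth-1 nodes with their split decision, counts and class proportions */
options linesize=120;
proc print data=want label noobs;
    where depth=1;
    var leaf n predictedvalue insplitvar decision p_: ;
run;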

 

You will get optimal cutting scores between your classes as well as classification rates.

PG
mithusaini
Calcite | Level 5

Hi PG,

 

Thanks for the response. I tried the code, but the SAS log returns this error message:

 

"ERROR: Procedure HPSPLIT not found."

 

The same thing happened with PROC RELIABILITY. Would you know why this is happening?

 

I have SAS 9.2 and no Enterprise Miner.

 

Thanks

Gavin

PGStats
Opal | Level 21

HPSPLIT is rather recent. The first mention of HPSPLIT in the documentation is for version 12.3 of SAS/STAT. If you have access to JMP you could do roughly the same thing with the partition platform.

PG
mithusaini
Calcite | Level 5

I do not have access to JMP. Is there a way of doing this in SAS 9.2?

 

Appreciate the help.

 

Thanks

Gavin

Rick_SAS
SAS Super FREQ

Untested, but try these ideas:

 

Recode the response variable as Bad = -1, Indeterminate = 0, and Good = 1. You can then fit the response with ordinal logistic regression, using the score as the explanatory variable.

 

The ROC statement in PROC LOGISTIC enables you to construct ROC curves for the response in terms of the scores.

Use the LINK=CLOGLOG option to fit the ordinal response.

 

The "Association of Predicted Probabilities and Observed Responses" table gives various concordance statistics, such as Somers' D (the Gini coefficient) and c (the area under the ROC curve).
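The ideas above might be sketched as follows. This is untested and rests on assumptions: a data set TEST with a numeric SCORE and a character CLASS holding 'Good', 'Bad' or 'Indeterminate' (names borrowed from the code earlier in the thread), while TEST2, RESP and ROC are hypothetical names introduced here. As far as I know, ROC output in PROC LOGISTIC is only produced for a binary response, so the first step drops the Indeterminate customers. In the default output, Somers' D and c appear in the association table.

```sas
/* Sketch only: TEST, SCORE and CLASS are assumed names; ROC, TEST2 and
   RESP are hypothetical names made up for this example. */

/* 1. Binary fit, Good vs Bad. OUTROC= saves the ROC curve coordinates. */
proc logistic data=test;
    where class in ('Good', 'Bad');
    model class(event='Bad') = score / outroc=roc;
run;

/* 2. K-S statistic: largest gap between the true positive rate
      (_SENSIT_) and the false positive rate (_1MSPEC_). */
proc sql;
    select max(_sensit_ - _1mspec_) as KS
    from roc;
quit;

/* 3. Ordinal fit on all three classes, recoded as suggested above. */
data test2;
    set test;
    if class = 'Bad' then resp = -1;
    else if class = 'Indeterminate' then resp = 0;
    else if class = 'Good' then resp = 1;
run;

proc logistic data=test2;
    model resp = score / link=cloglog;
run;
```

The Gini coefficient can be read off as Somers' D, or computed as 2c - 1 from the c statistic.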

PGStats
Opal | Level 21

As a last resort, you could try discriminant analysis, the nonparametric version:

 


/* Scores over the range of possible values */
data testvalues;
    do score = 0 to 100 by 0.1;
        output;
    end;
run;

/* non-parametric discriminant analysis */
proc discrim data=test method=npar kernel=normal r=5 testdata=testvalues testout=testScore;
    class class;
    var score;
run;

/* Get the predicted score range for each class */
proc sql;
    select _into_ as class, min(score) as fromScore, max(score) as toScore
    from testScore
    group by _into_
    order by fromScore;
quit;
PG
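For the original request (cumulative Goods vs Bads and a candidate cut-off), a minimal sketch using only Base SAS and PROC NPAR1WAY from SAS/STAT could look like the code below. It is untested and assumes the same TEST data set with SCORE and CLASS, with class values spelled exactly 'Good' and 'Bad'. BYSCORE, CUMDIST and the macro variables TOTGOOD and TOTBAD are hypothetical names, and Indeterminate customers are left out.

```sas
/* Sketch only: TEST, SCORE and CLASS are assumed names; BYSCORE, CUMDIST,
   TOTGOOD and TOTBAD are hypothetical names made up for this example. */

/* Counts of Good and Bad at each score value, plus overall totals */
proc sql noprint;
    create table byScore as
    select score,
           sum(class = 'Good') as nGood,
           sum(class = 'Bad')  as nBad
    from test
    where class in ('Good', 'Bad')
    group by score
    order by score;

    select sum(nGood), sum(nBad) into :totGood, :totBad
    from byScore;
quit;

/* Cumulative shares, lowest score first */
data cumDist;
    set byScore;
    cumGood + nGood;                 /* running total of Goods */
    cumBad  + nBad;                  /* running total of Bads  */
    pctGood = cumGood / &totGood;    /* share of Goods at or below this score */
    pctBad  = cumBad  / &totBad;     /* share of Bads at or below this score  */
    ksGap   = pctBad - pctGood;      /* separation at this score */
run;

/* Score(s) where the two cumulative curves are furthest apart */
proc sql;
    select score, pctBad, pctGood, ksGap
    from cumDist
    having ksGap = max(ksGap);
quit;

/* Cross-check: two-sample Kolmogorov-Smirnov statistic */
proc npar1way data=test edf;
    where class in ('Good', 'Bad');
    class class;
    var score;
run;
```

Reading CUMDIST as a cut-off table: rejecting everyone at or below a given score would reject the share pctBad of Bads at the cost of the share pctGood of Goods. The score with the largest ksGap is only one candidate, since the right cut-off also depends on the relative cost of accepting a Bad versus rejecting a Good.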
Rick_SAS
SAS Super FREQ

@PGStats I thought about recommending PROC CANCORR, but discriminant analysis is more appropriate for nominal than ordinal categories. What is your reason for recommending the nonparametric discriminant analysis over the linear?

PGStats
Opal | Level 21

Hi @Rick_SAS, I suggested nonparametric discriminant analysis because I didn't want to make strong assumptions about the score distribution in each class. But more importantly, I thought that using a small kernel would yield sharper delineation of the classes, i.e. class border positions would be determined locally. I chose a normal kernel because of its infinite support.

PG
mithusaini
Calcite | Level 5

Thanks to both of you for the quick turnaround. I really appreciate it.

 

I had another question on Cumulative Accuracy Profile which I will post shortly. Hope you guys can help.

 

Regards

Gavin
