Programming the statistical procedures from SAS

Cumulative Curves

Occasional Contributor
Posts: 5

Hi

 

I have a data set of around 80K customers. I have classified these customers as GOOD, BAD or INDETERMINATE based on their payment history for the last 12 months.

Each customer is assigned a classification of Good, Bad or Indeterminate. In the same file I have application scores for these customers (each has a score from 0 to 100). I want to test the reliability of these scores against my classification, i.e. to confirm that more Bads fall at lower scores and more Goods at higher scores. Could somebody help me with code to produce a lift curve and/or K-S curve, Gini, ROC etc., or an analysis of cumulative Goods vs. Bads?

 

Sample:

Application Score   Score Range   Classification
10                  0-10          Bad
30                  21-30         Bad
68                  61-70         Good
12                  11-20         Good

 

Also, is there a way to determine a cut-off for the application score that I could use to accept or reject customers (maybe a reverse cumulative distribution for the Bads)?

I have tried hard to find SAS code for this but have been unsuccessful. Please help!

Respected Advisor
Posts: 4,606

You could try the decision tree procedure HPSPLIT. Something like:

 

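/* Fit a decision tree that splits the score into ranges by class */
proc hpsplit data=test;
    target class;
    input score / level=int;
    output nodestats=want;
run;

/* Print the first-level nodes with their predicted class and probabilities */
options linesize=120;
proc print data=want label noobs;
    where depth=1;
    var leaf n predictedvalue insplitvar decision p_: ;
run;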

 

You will get optimal cutting scores between your classes as well as classification rates.

PG
Occasional Contributor
Posts: 5

Hi PG,

 

Thanks for the response. I tried the code, but the SAS log returns this error message:

 

"ERROR: Procedure HPSPLIT not found."

 

The same thing happened with PROC RELIABILITY. Would you know why this is happening?

 

I have SAS 9.2 and no Enterprise Miner.

 

Thanks

Gavin

Respected Advisor
Posts: 4,606

HPSPLIT is rather recent. The first mention of HPSPLIT in the documentation is for version 12.3 of SAS/STAT. If you have access to JMP you could do roughly the same thing with the partition platform.

PG
Occasional Contributor
Posts: 5

I do not have access to JMP. Is there a way of doing this in SAS 9.2?

 

Appreciate the help.

 

Thanks

Gavin

SAS Super FREQ
Posts: 3,319

Untested, but try these ideas:

 

Recode the response variable as Bad = -1, Indeterminate = 0, and Good = 1. You can then model the response with ordinal logistic regression, using the score as the explanatory variable.

 

The ROC statement in PROC LOGISTIC enables you to construct ROC curves for the response in terms of the scores.

Use the LINK=CLOGIT option (cumulative logit) to fit the ordinal response.
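
A minimal, untested sketch of the recoding and ordinal fit described above. It assumes the data set is called test and holds a character variable class and a numeric variable score (the names used in the other replies); test2 and y are hypothetical names introduced only for illustration.

```sas
/* Recode the classification as an ordered numeric response.       */
/* Assumption: CLASS holds 'Good', 'Bad' or 'Indeterminate'        */
/* (any letter case); anything else is set to missing.             */
data test2;
    set test;
    select (upcase(strip(class)));
        when ('BAD')           y = -1;
        when ('INDETERMINATE') y =  0;
        when ('GOOD')          y =  1;
        otherwise              y =  .;
    end;
run;

/* Ordinal regression of the recoded response on the score.        */
/* LINK=CLOGIT requests the cumulative logit link, which is also   */
/* what PROC LOGISTIC uses by default when the response has more   */
/* than two levels.                                                 */
proc logistic data=test2;
    model y = score / link=clogit;
run;
```

Observations with a missing y are dropped from the fit, so it is worth checking the log for the number of observations read versus used.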

 

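The "Association of Predicted Probabilities and Observed Responses" table gives various concordance statistics such as Somers' D (the Gini coefficient, for a binary response) and c (the area under the ROC curve).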
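
For the ROC, Gini and K-S part of the question, here is an untested sketch that should run on SAS 9.2 with SAS/STAT. ROC analysis in PROC LOGISTIC requires a binary response, so this sketch assumes it is acceptable to drop the Indeterminate customers and compare Good vs. Bad only. The data set test and variables class and score are the names used in the other replies; goodbad, bad, roc and ks are hypothetical names introduced for illustration.

```sas
/* Keep only Good and Bad customers and build a 0/1 event flag */
data goodbad;
    set test;
    where upcase(class) in ('GOOD', 'BAD');
    bad = (upcase(class) = 'BAD');   /* 1 = Bad, 0 = Good */
run;

/* Binary logistic fit of Bad on the score.                        */
/* OUTROC= saves one row per candidate probability cut-off with    */
/* the sensitivity (_SENSIT_) and 1 - specificity (_1MSPEC_).      */
proc logistic data=goodbad;
    model bad(event='1') = score / outroc=roc;
run;

/* K-S statistic: the largest gap between the share of Bads and    */
/* the share of Goods flagged at the same cut-off.                 */
data ks;
    set roc end=last;
    retain ks 0 ks_prob .;
    gap = abs(_sensit_ - _1mspec_);
    if gap > ks then do;
        ks      = gap;       /* largest gap seen so far            */
        ks_prob = _prob_;    /* predicted probability at that gap  */
    end;
    if last then output;
    keep ks ks_prob;
run;

proc print data=ks noobs;
run;
```

In the PROC LOGISTIC output, c is the area under the ROC curve, and for a binary response Somers' D equals 2c - 1, which is the quantity usually called Gini in credit scoring. The value ks_prob is a cut-off on the predicted probability of Bad; because the model has a single predictor, it corresponds to one score value, which could serve as a candidate accept/reject cut-off.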

Respected Advisor
Posts: 4,606

As a last resort, you could try discriminant analysis, the nonparametric version:

 


/* Scores over the range of possible values */
data testvalues;
    do score = 0 to 100 by 0.1;
        output;
    end;
run;

/* Nonparametric discriminant analysis */
proc discrim data=test method=npar kernel=normal r=5
             testdata=testvalues testout=testScore;
    class class;
    var score;
run;

/* Get the predicted score range for each class */
proc sql;
    select _into_ as class, min(score) as fromScore, max(score) as toScore
    from testScore
    group by _into_
    order by fromScore;
quit;
PG
SAS Super FREQ
Posts: 3,319

@PGStats I thought about recommending PROC DISCRIM, but discriminant analysis is more appropriate for nominal than ordinal categories. What is your reason for recommending the nonparametric discriminant analysis over the linear?

Respected Advisor
Posts: 4,606

Hi @Rick_SAS, I suggested nonparametric discriminant analysis because I didn't want to make strong assumptions about the score distribution in each class. More importantly, I thought that using a small kernel would yield a sharper delineation of the classes, i.e. class border positions would be determined locally. I chose a normal kernel because of its infinite support.

PG
Occasional Contributor
Posts: 5

Thanks to both of you for the quick turnaround. I really appreciate it.

 

I had another question on Cumulative Accuracy Profile which I will post shortly. Hope you guys can help.

 

Regards

Gavin
