Solved: SAS PROC RANK - Missing Score range

ggfggrr · Posted 02-07-2019 04:49 AM

I used PROC RANK to assign all my observations to 10 buckets (deciles) based on their application scores, using the following code and the reference cited below.

/* Rank scores into 10 groups; DESCENDING gives rank 0 to the highest scores */
proc rank data=work.have
    descending
    groups=10
    ties=mean
    out=work.rnk_output(label="rank analysis");
    var scorecard_points;
    ranks rank_scorecard_points;
run;

/* Summarize each decile: count plus minimum and maximum score */
proc means data=work.rnk_output noprint;
    class rank_scorecard_points;
    var scorecard_points;
    output out=report
        n=n
        min(scorecard_points)=score_min
        max(scorecard_points)=score_max;
run;

proc print data=report noobs;
var rank_scorecard_points n score_min score_max;
format n comma6.;
run;

However, when I look at the results across all groups, the score ranges are not continuous: part of the scorecard-points range is missing, as shown below. (Reference used: https://www.lexjansen.com/nesug/nesug09/ap/AP01.pdf)

As the table below shows, no bucket covers the scores between 361 and 382 (between buckets 8 and 9).

rank_SCORECARD_POINTS	_TYPE_	_FREQ_	n	score_min	score_max
	0	151135	151135	250	672
0	1	13052	13052	672	672
1	1	16655	16655	647	655
2	1	22034	22034	622	630
3	1	5363	5363	575	597
4	1	30581	30581	574	574
5	1	1911	1911	549	556
6	1	19661	19661	498	542
7	1	12504	12504	458	495
8	1	11253	11253	382	442
9	1	18121	18121	250	361

Can someone help me understand why the ranges are not continuous, and how I can fix this for a credit scorecard? This is for monitoring reports, specifically for reporting the population stability index.

Thanks a lot for taking the time.

mkeintz · Posted 02-11-2019 10:11 PM

@ggfggrr

Proc rank puts every one of your observations into some decile (or any other group cardinality), but it does NOT determine cut points between deciles. So natural gaps in your data may fall between adjacent deciles. If you don't need to assign other observations to deciles, it's not a problem.
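To illustrate the point, here is a minimal sketch with made-up data. The data set names WORK.DEMO and WORK.DEMO_RNK and the variables SCORE and GRP are hypothetical, not from the original post. Six scores with a natural gap between 30 and 70 are split into two groups; every observation gets a group, but no group's min-to-max range covers the values inside the gap.

```sas
/* Hypothetical toy data: six scores with a natural gap between 30 and 70 */
data work.demo;
    input score @@;
    datalines;
10 20 30 70 80 90
;
run;

/* Split the six observations into two groups (0 = lower half) */
proc rank data=work.demo groups=2 out=work.demo_rnk;
    var score;
    ranks grp;
run;

/* Min and max per group: the ranges describe the data, not cut points */
proc means data=work.demo_rnk n min max;
    class grp;
    var score;
run;
```

With GROUPS=2 the three lowest scores should land in group 0 and the three highest in group 1, so the reported ranges would be 10 to 30 and 70 to 90, with nothing spanning the gap. The same thing appears to be happening between buckets 8 and 9 in the question.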

But if you really do need cutpoints, you could use proc univariate, using the PCTLPTS (and PCTLPRE) options on the output statement:

proc univariate data=have ;
 var scorecard_points;
 output out=cutpts pctlpre=SP_ pctlpts = 10 to 90 by 10;
run;

This creates a single observation with variables SP_10, SP_20, SP_30, ..., SP_90, which you can use to process data set HAVE or any other data set to which these cut points apply.
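One way to apply those cut points, as a rough sketch rather than the only approach: read the single CUTPTS row once, then compare each score against the nine percentiles. The output data set WORK.SCORED and the variable DECILE are names made up for illustration. The numbering (0 = highest scores) is an assumption chosen to match the DESCENDING ranking in the question, and sending a score that exactly equals a cut point to the lower-score bucket is an arbitrary choice.

```sas
/* Sketch: assign deciles from the PROC UNIVARIATE cut points.           */
/* Assumes WORK.CUTPTS has one row with SP_10 ... SP_90 (created above). */
/* WORK.SCORED and DECILE are hypothetical names.                        */
data work.scored;
    if _n_ = 1 then set work.cutpts;   /* read cut points once; SET variables are retained */
    set work.have;
    array cut{9} SP_10 SP_20 SP_30 SP_40 SP_50 SP_60 SP_70 SP_80 SP_90;

    if missing(scorecard_points) then decile = .;
    else do;
        decile = 9;                    /* start in the lowest-score bucket */
        do i = 1 to 9;
            /* each cut point exceeded moves the score one bucket up */
            if scorecard_points > cut{i} then decile = 9 - i;
        end;
    end;
    drop i SP_:;
run;
```

Because these buckets are defined by cut points, adjacent buckets share a boundary and leave no uncovered range. If many observations share one score, as with 574 in the question, neighbouring cut points may coincide, and some bucket numbers may then be empty.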

Or, you could take the gaps apparent in the MINs and MAXs in your ranked data, and apply some rule (mid-point, for example) to generate cut points from the PROC RANK output.
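The mid-point rule could be sketched roughly as follows, working from the REPORT data set produced by the PROC MEANS step in the question. The names WORK.MIDPOINT_CUTS, PREV_MIN and CUT_POINT are hypothetical. For example, the boundary between buckets 8 and 9 would fall at (382 + 361) / 2 = 371.5.

```sas
/* Sketch only: derive mid-point cut points from the PROC MEANS output.  */
/* Assumes WORK.REPORT from the question, with its _TYPE_=1 rows in      */
/* ascending rank order, i.e. rank 0 (highest scores) first.             */
data work.midpoint_cuts;
    set work.report(where=(_type_ = 1));
    prev_min = lag(score_min);   /* score_min of the bucket just above this one */
    if _n_ > 1 then do;
        cut_point = (prev_min + score_max) / 2;   /* middle of the gap */
        output;
    end;
    keep rank_scorecard_points prev_min score_max cut_point;
run;

proc print data=work.midpoint_cuts noobs;
run;
```

Each output row holds the boundary between a bucket and the one ranked just above it, so ten buckets yield nine cut points.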

Rick_SAS · Posted 02-11-2019 09:23 AM

Are you sure that you have data in that range? Use a WHERE clause to see:

proc means data=work.rnk_output;
where 361 <= scorecard_points and scorecard_points <= 382;
var scorecard_points;
run;

ggfggrr · Posted 02-18-2019 08:06 AM

Thanks so much for your help.

Kind regards,
