PROC RANK - Missing range for Score, Number of Actual Buckets

ggfggrr · Posted 02-07-2019 05:44 AM

I use the following PROC RANK step to group my credit scorecard points into 10 buckets:

proc rank data=work.have
          descending
          groups=10
          ties=mean
          out=work.rnk_output(label="rank analysis");
   var scorecard_points;
   ranks rank_scorecard_points;
run;

proc means data=work.rnk_output noprint;
   class rank_scorecard_points;
   var scorecard_points;
   output out=report
          n=n
          min(scorecard_points)=score_min
          max(scorecard_points)=score_max;
run;

proc print data=report noobs;
   var rank_scorecard_points n score_min score_max;
   format n comma6.;
run;

The results are as follows:

rank_scorecard_points  _TYPE_  _FREQ_       n  score_min  score_max
                            0  151135  151135        285        672
                    0       1   13052   13052        672        672
                    1       1   16655   16655        647        655
                    2       1   22034   22034        622        630
                    3       1    5363    5363        575        597
                    4       1   30581   30581        574        574
                    5       1    1911    1911        549        556
                    6       1   19661   19661        498        542
                    7       1   12504   12504        458        495
                    8       1   11253   11253        382        442
                    9       1   18121   18121        285        361
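For reference, the PROC RANK documentation gives the group number for GROUPS=k as shown below, where rank is the value's order rank after TIES= handling and n is the number of nonmissing values. Because tied values share one rank, every observation with the same score lands in the same group. That is why the group sizes above are so uneven, and why the groups only span scores that actually occur in the data. Any gap such as 361 to 382 simply contains no observations.

```latex
\text{group} = \left\lfloor \frac{\text{rank} \times k}{n + 1} \right\rfloor,
\qquad k = 10,\quad n = 151135

% Worked check for score 574 (30581 tied observations, descending order):
% 57104 observations score higher, so the ties occupy ranks 57105 to 87685.
% TIES=MEAN gives rank = (57105 + 87685)/2 = 72395, and
\left\lfloor \frac{72395 \times 10}{151136} \right\rfloor = \lfloor 4.79 \rfloor = 4
```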

I have two issues:

1. I want exactly 10 buckets, which is why I specified GROUPS=10. However, how should I handle scores above the maximum of the highest bucket (rank 0, maximum 672)? For example, there is no bucket for a score of 700. Does this mean I should have an additional bucket (11 in total) for values above 672? How can I end up with 10 buckets in total that also cover scores from 672 up to 999? Could I specify GROUPS=10 and then add a bucket manually, and would that make sense?

2. Also, how should I assign a bucket to a score of, for example, 370? PROC RANK did not assign any bucket to values between 361 and 382, and the same happens for many other small ranges between buckets.

Note that I am doing this exercise to find the minimum and maximum score for each bucket. This will allow me to calculate the population stability index (PSI) across several samples for monitoring purposes.
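For reference, the PSI is commonly defined as below, where B is the number of buckets, E_i is the share of the reference (development) sample in bucket i and A_i is the share of the monitored sample in the same bucket. The comparison is only meaningful when both samples are bucketed with the same fixed cutoffs, and every A_i and E_i must be nonzero for the logarithm to be defined.

```latex
\mathrm{PSI} = \sum_{i=1}^{B} \left(A_i - E_i\right) \ln\!\left(\frac{A_i}{E_i}\right),
\qquad \sum_{i=1}^{B} A_i = \sum_{i=1}^{B} E_i = 1
```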

Any help on this would be highly appreciated.

Thanks so much.

Kind regards,

Astounding · Posted 02-07-2019 07:40 AM

There is no automated solution; you need to decide on an acceptable approach. I have always used the bucket minimums. In this case:

Bucket 1 = 672 or higher
Bucket 2 = below 672 and at least 647
Bucket 10 = below 382
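A minimal sketch of turning the bucket-minimum rule into a reusable format, assuming the REPORT data set from the PROC MEANS step in the question. Each bucket runs from its own minimum up to, but not including, the minimum of the bucket above; the top bucket is open-ended to HIGH and the bottom one to LOW, so scores such as 700 or 370 always get a bucket. The data set names BUCKET_MINS and BUCKET_CNTLIN and the format name BUCKET. are hypothetical.

```sas
/* Keep one row per bucket (drop the _TYPE_=0 overall row). With DESCENDING, */
/* rank 0 holds the highest scores, so this sort runs from high to low.       */
proc sort data=report(where=(_type_=1)) out=bucket_mins;
   by rank_scorecard_points;
run;

/* Build a CNTLIN data set: one range per bucket, [own minimum, minimum of */
/* the bucket above). Labels ' 1'..'10' match "Bucket 1" .. "Bucket 10".   */
data bucket_cntlin;
   set bucket_mins end=last;
   length fmtname $8 type $1 range_label $2 hlo $1 eexcl $1;
   retain fmtname 'bucket' type 'N' prev_min;
   start       = score_min;
   range_end   = prev_min;                            /* missing for rank 0 */
   range_label = put(rank_scorecard_points + 1, 2.);  /* ' 1' ... '10'      */
   eexcl = 'Y';                  /* upper bound belongs to the bucket above */
   hlo   = ' ';
   if _n_ = 1 then do;           /* rank 0: open-ended above (END = HIGH)   */
      hlo   = 'H';
      eexcl = 'N';
   end;
   else if last then hlo = 'L';  /* rank 9: open-ended below (START = LOW)  */
   prev_min = score_min;
   /* KEEP uses the old names; RENAME gives PROC FORMAT its END and LABEL. */
   keep fmtname type start range_end range_label hlo eexcl;
   rename range_end = end range_label = label;
run;

proc sort data=bucket_cntlin;
   by start;
run;

proc format cntlin=bucket_cntlin library=work;
run;
```

With the cutoffs in the table above, PUT(700, BUCKET.) should return ' 1' and PUT(370, BUCKET.) should return '10'. Deriving the format from REPORT keeps the cutoffs in sync with the development sample; once it is saved, apply the same format to every later sample rather than re-running PROC RANK.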

ballardw · Posted 02-07-2019 10:40 AM

A question you need to decide: do you want every possible value in a range to be consistently assigned to the same "bucket", or do you want the bucket limits to change whenever the input data changes?

The results you show indicate that your original data did not contain some of the values you are asking about. So why be concerned that a process which only creates limits from the available data leaves gaps? If you will be working with data outside the ranges shown in the future, then you do not want PROC RANK to assign the buckets separately for each data set, and a different method is needed.

I suggest combining all of your data to get an idea of the likely limits for the combined data, and using that information to create a format that consistently assigns each specific score value to the same group. Something like:

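proc format library=work;
   /* Labels carry a leading blank (' 1') so they sort before '10'.        */
   /* Ranges assume integer scores; use -< (e.g. 645 -< 670) for decimals. */
   value score
      670 - high = ' 1'
      645 - 669  = ' 2'
      620 - 644  = ' 3'
      /* pick boundaries for other groups */
      low - 360  = '10'
   ;
run;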

Or perhaps an informat, to create a numeric value instead.
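A minimal sketch of getting a numeric bucket without writing a separate informat: PUT applies the SCORE. format, and INPUT with the standard 2. informat converts the label to a number. This assumes the SCORE. format above has been completed so that every possible score falls in one of its ranges; a score outside all ranges would not yield a valid bucket number. The data set WORK.SCORED and the variable BUCKET_NUM are hypothetical.

```sas
/* Assumes the SCORE. format above exists and covers every possible score. */
data work.scored;
   set work.have;
   /* PUT returns the character label (' 1' .. '10'); INPUT with the 2.   */
   /* informat turns it into the numbers 1 .. 10.                          */
   bucket_num = input(put(scorecard_points, score.), 2.);
run;
```

A numeric bucket is convenient when the value must feed arithmetic or a merge key, while the format alone is enough for grouping inside procedures.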

In either case, the format can be applied to a variable to create groups during analysis in almost any procedure, and the bucket boundaries stay the same instead of depending on one sample of data. Another advantage of a format is that if you want to consider 5 or 3 groups, you create the new format and use it directly in the analysis. There is no additional pass through a procedure like PROC RANK or a DATA step to add variables or values. Just add a statement such as FORMAT scorecard_points formatname.; to use the new format.
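A minimal sketch of the FORMAT-statement approach in PROC FREQ, assuming the SCORE. format above has been created. PROC FREQ groups by formatted values, so the counts come out per bucket with no extra variable. The three-group format THREEGRP. and its cutoffs are hypothetical, chosen only to show how a different grouping is just a different format.

```sas
/* Counts per fixed bucket: the FORMAT statement does the grouping. */
proc freq data=work.have;
   tables scorecard_points;
   format scorecard_points score.;
run;

/* Hypothetical coarser grouping with illustrative cutoffs. */
proc format library=work;
   value threegrp
      620 - high = '1'
      450 -< 620 = '2'
      low -< 450 = '3'
   ;
run;

/* Same data, different grouping: only the format name changes. */
proc freq data=work.have;
   tables scorecard_points;
   format scorecard_points threegrp.;
run;
```

Running the first PROC FREQ on each sample with the same format gives bucket percentages that stay comparable from sample to sample.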
