BookmarkSubscribeRSS Feed
ggfggrr
Obsidian | Level 7

I use the following PROC RANK procedure to convert my credit scorecard points into 10 Buckets.

proc rank data = work.have
	descending
	groups=10
	ties=mean
	out=work.rnk_output(label="rank analysis");
	var scorecard_points;
ranks rank_scorecard_points ;
run; quit;

proc means data=work.rnk_output noprint;
class rank_scorecard_points;
var scorecard_points;
output out=report
n=n
min(scorecard_points)=score_min max(scorecard_points)=score_max;
run;


proc print data=report noobs;
var rank_scorecard_points n score_min score_max;
format n comma6.;
run;

The results are as below;

 

rank_SCORECARD_POINTS_TYPE__FREQ_nscore_minscore_max
 0151135151135285672
011305213052672672
111665516655647655
212203422034622630
3153635363575597
413058130581574574
5119111911549556
611966119661498542
711250412504458495
811125311253382442
911812118121285361

 

I have the following 2 issues.

 

1. I actually want to have 10 buckets. And I specified Groups=10. However, how can I handle the values when the score is higher than the highest score of the last bucket (for 700 , there is no bucket).  Does it mean, I should ideally have an additional bucket (total 11 buckets) to handle the values which are higher than 672? .How can I handle this issue so actually I end up having 10 total buckets covering also the values from 672 to 999. Can I specify Groups=10 and add a Bucket manually? Does it makes sense?

 

2. Also, how can I assign the bucket for the score which is 370 for an example. Because, the PROC RANK did not assign any bucket for the value which falls in between 361 and 382. The same scenario for many small ranges in which PROC RANK don’t assign any buckets.

Kindly note that, I am doing the above exercise to know the Minimum and Maximum range for each Bucket. This will allow me to calculate the Population stability index across several samples (for Monitoring purposes).

 

Any Help on this is highly appreciate.


Thanks so much.

 

Kind regards,

 

2 REPLIES 2
Astounding
PROC Star

There is not an automated solution. You need to decide on an acceptable approach. I always used the bucket minimums. In this case

Bucket 1 = 672 or higher
Bucket 2 = below 672 and at least 647
Bucket 10 = below 382

ballardw
Super User

Question you need to decide on: Do you want all the possible values in a range consistently assigned to the same "bucket" or do you want the values in the input dataset to have different bucket limits when the data changes?

 

The results you show indicate that your original data did not have some of the numbers you are asking about. So why are concerned about a process that only creates limits based on available data? If you have data outside of the ranges shown to work with in the future then you do not want to have Proc rank assign the bucket for different data sets and a different method is needed.

 

I might suggest combining all of you data to get an idea about likely limits for your combined data and use that information to create a format that will assign consistent groups for the specific value of the score. Something like:

 

Proc format library=work;
value score
670 - high= ' 1'
645 - 669 = ' 2'
620 - 644 = ' 3'
/* pick boundaries for other groups*/
low -360  = '10'
;
run;

or perhaps an informat to create a numeric value.

In either case the format can be applied to a variable to create groups during analysis in almost any procedure and would have the same boundaries for the buckets consistently NOT dependent on one sample of data. Another advantage to using a format is that if you want to consider something with 5 or 3 groups you create the new format and use it directly in the analysis. No additional pass through a procedure like proc rank or a data step to add variables/values. Just add statement such as FORMAT scorecard_points formatname. ; to use the new format.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 879 views
  • 0 likes
  • 3 in conversation