I use the following PROC RANK procedure to convert my credit scorecard points into 10 Buckets.
proc rank data = work.have descending groups=10 ties=mean out=work.rnk_output(label="rank analysis"); var scorecard_points; ranks rank_scorecard_points ; run; quit; proc means data=work.rnk_output noprint; class rank_scorecard_points; var scorecard_points; output out=report n=n min(scorecard_points)=score_min max(scorecard_points)=score_max; run; proc print data=report noobs; var rank_scorecard_points n score_min score_max; format n comma6.; run;
The results are as below;
I have the following 2 issues.
1. I actually want to have 10 buckets. And I specified Groups=10. However, how can I handle the values when the score is higher than the highest score of the last bucket (for 700 , there is no bucket). Does it mean, I should ideally have an additional bucket (total 11 buckets) to handle the values which are higher than 672? .How can I handle this issue so actually I end up having 10 total buckets covering also the values from 672 to 999. Can I specify Groups=10 and add a Bucket manually? Does it makes sense?
2. Also, how can I assign the bucket for the score which is 370 for an example. Because, the PROC RANK did not assign any bucket for the value which falls in between 361 and 382. The same scenario for many small ranges in which PROC RANK don’t assign any buckets.
Kindly note that, I am doing the above exercise to know the Minimum and Maximum range for each Bucket. This will allow me to calculate the Population stability index across several samples (for Monitoring purposes).
Any Help on this is highly appreciate.
Thanks so much.
Question you need to decide on: Do you want all the possible values in a range consistently assigned to the same "bucket" or do you want the values in the input dataset to have different bucket limits when the data changes?
The results you show indicate that your original data did not have some of the numbers you are asking about. So why are concerned about a process that only creates limits based on available data? If you have data outside of the ranges shown to work with in the future then you do not want to have Proc rank assign the bucket for different data sets and a different method is needed.
I might suggest combining all of you data to get an idea about likely limits for your combined data and use that information to create a format that will assign consistent groups for the specific value of the score. Something like:
Proc format library=work; value score 670 - high= ' 1' 645 - 669 = ' 2' 620 - 644 = ' 3' /* pick boundaries for other groups*/ low -360 = '10' ; run;
or perhaps an informat to create a numeric value.
In either case the format can be applied to a variable to create groups during analysis in almost any procedure and would have the same boundaries for the buckets consistently NOT dependent on one sample of data. Another advantage to using a format is that if you want to consider something with 5 or 3 groups you create the new format and use it directly in the analysis. No additional pass through a procedure like proc rank or a data step to add variables/values. Just add statement such as FORMAT scorecard_points formatname. ; to use the new format.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.