I used PROC RANK to assign all my observations to 10 buckets (deciles) based on their application scores, using the procedure below and the reference linked further down.
proc rank data=work.have
          descending
          groups=10
          ties=mean
          out=work.rnk_output(label="rank analysis");
   var scorecard_points;
   ranks rank_scorecard_points;
run; quit;
proc means data=work.rnk_output noprint;
   class rank_scorecard_points;
   var scorecard_points;
   output out=report
          n=n
          min(scorecard_points)=score_min max(scorecard_points)=score_max;
run;
proc print data=report noobs;
   var rank_scorecard_points n score_min score_max;
   format n comma6.;
run;
However, when I looked at the results for all the groups, the score ranges are not continuous: part of the scorecard points range is missing, as shown below. (Reference used: https://www.lexjansen.com/nesug/nesug09/ap/AP01.pdf)
For example, no bucket covers scores from 361 to 382 (between buckets 8 and 9).
rank_SCORECARD_POINTS | _TYPE_ | _FREQ_ | n | score_min | score_max |
. | 0 | 151135 | 151135 | 250 | 672 |
0 | 1 | 13052 | 13052 | 672 | 672 |
1 | 1 | 16655 | 16655 | 647 | 655 |
2 | 1 | 22034 | 22034 | 622 | 630 |
3 | 1 | 5363 | 5363 | 575 | 597 |
4 | 1 | 30581 | 30581 | 574 | 574 |
5 | 1 | 1911 | 1911 | 549 | 556 |
6 | 1 | 19661 | 19661 | 498 | 542 |
7 | 1 | 12504 | 12504 | 458 | 495 |
8 | 1 | 11253 | 11253 | 382 | 442 |
9 | 1 | 18121 | 18121 | 250 | 361 |
Can someone help me understand why the ranges are not continuous, and how I can solve this for a credit scorecard? The output is for monitoring reports, specifically for reporting the population stability index.
Thanks a lot for taking the time.
PROC RANK puts every one of your observations into some decile (or whatever number of groups you request), but it does NOT determine cut points between deciles. So natural gaps in your data may fall between adjacent deciles. If you don't need to assign other observations to deciles, it's not a problem.
But if you really do need cut points, you could use PROC UNIVARIATE with the PCTLPTS (and PCTLPRE) options on the OUTPUT statement:
proc univariate data=have;
   var scorecard_points;
   output out=cutpts pctlpre=SP_ pctlpts=10 to 90 by 10;
run;
This makes a single observation with variables SP_10, SP_20, SP_30, ...., SP_90 which you can use to process data set HAVE or any other dataset to which these cutpoints are applicable.
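As a rough sketch of how those cut points might then be applied, the DATA step below reads the single CUTPTS row once and assigns each score in HAVE to a bucket. The output data set name have_decile, the variable decile, and the boundary rule (a score exactly equal to a cut point stays in the lower bucket) are assumptions made for illustration. Note that this numbers buckets from the lowest scores upward, the opposite of the DESCENDING ranks in the question, and that heavily tied scores can make neighbouring cut points coincide, which leaves some buckets empty.

```sas
/* Sketch only: assumes CUTPTS contains SP_10 ... SP_90 from the PROC UNIVARIATE step */
data work.have_decile;
   if _n_ = 1 then set cutpts;           /* read the cut points once; their values are retained */
   set have;
   array cut{9} SP_10 SP_20 SP_30 SP_40 SP_50 SP_60 SP_70 SP_80 SP_90;
   if missing(scorecard_points) then decile = .;
   else do;
      decile = 0;                        /* 0 = lowest-scoring bucket (assumed convention) */
      do i = 1 to 9;
         /* a score strictly above cut point i belongs to bucket i or higher */
         if scorecard_points > cut{i} then decile = i;
      end;
   end;
   drop i SP_:;                          /* keep only the original columns plus decile */
run;
```

The same step can be pointed at a later monitoring sample by changing the second SET statement, so both populations are bucketed with identical boundaries.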
Or, you could take the gaps apparent in the MINs and MAXs in your ranked data, and apply some rule (mid-point, for example) to generate cut points from the PROC RANK output.
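One possible way to apply the mid-point rule is sketched below, working from the REPORT data set produced by the PROC MEANS step in the question. It assumes the per-group rows appear in ascending rank order (so each row's neighbour above it is the next higher-scoring group); the names midpoint_cuts, prev_min and upper_cut are made up for this example.

```sas
/* Sketch only: mid-point cut between each group and the higher-scoring group before it */
data work.midpoint_cuts;
   set report;
   where _type_ = 1;                     /* per-group rows only; skips the overall summary row */
   prev_min = lag(score_min);            /* score_min of the previous (higher-scoring) group */
   if _n_ > 1 then do;
      upper_cut = (prev_min + score_max) / 2;   /* halfway across the gap */
      output;
   end;
   keep rank_scorecard_points upper_cut;
run;
```

With the figures posted above, the row for group 9 would get (382 + 361) / 2 = 371.5 as the boundary between groups 8 and 9.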
Are you sure that you have data in that range? Use a WHERE statement to see (strict inequalities, so the known values 361 and 382 themselves are excluded):
proc means data=work.rnk_output;
   where 361 < scorecard_points and scorecard_points < 382;
   var scorecard_points;
run;
Thanks so much for your help.
Kind regards,