ggfggrr
Obsidian | Level 7

I used PROC RANK to assign all my observations to 10 buckets (deciles) based on their application scores, using the code below and the reference linked further down.

 

/* Assign each observation to one of 10 groups;
   DESCENDING puts the highest scores in group 0 */
proc rank data=work.have
        descending
        groups=10
        ties=mean
        out=work.rnk_output(label="rank analysis");
    var scorecard_points;
    ranks rank_scorecard_points;
run;

/* Count, minimum and maximum score within each rank group */
proc means data=work.rnk_output noprint;
    class rank_scorecard_points;
    var scorecard_points;
    output out=report
        n=n
        min(scorecard_points)=score_min
        max(scorecard_points)=score_max;
run;

proc print data=report noobs;
    var rank_scorecard_points n score_min score_max;
    format n comma6.;
run;

However, when I looked at the results for all the groups, the score ranges of adjacent groups are not contiguous: part of the scorecard-points range is missing, as shown below. (Reference used: https://www.lexjansen.com/nesug/nesug09/ap/AP01.pdf)

 

As the output below shows, no bucket covers the scores between 361 and 382 (between buckets 8 and 9).

 

rank_SCORECARD_POINTS  _TYPE_  _FREQ_       n  score_min  score_max
                    .       0  151135  151135        250        672
                    0       1   13052   13052        672        672
                    1       1   16655   16655        647        655
                    2       1   22034   22034        622        630
                    3       1    5363    5363        575        597
                    4       1   30581   30581        574        574
                    5       1    1911    1911        549        556
                    6       1   19661   19661        498        542
                    7       1   12504   12504        458        495
                    8       1   11253   11253        382        442
                    9       1   18121   18121        250        361

 

Can someone help me understand why the ranges are not contiguous, and how I can solve this for a credit scorecard? The output is used for monitoring reports, specifically for reporting the population stability index.


Thanks a lot for taking the time.

ACCEPTED SOLUTION
mkeintz
PROC Star

@ggfggrr 

 

PROC RANK puts every one of your observations into some decile (or into however many groups you request), but it does NOT determine cut points between deciles. So natural gaps in your data may fall between adjacent deciles. If you don't need to assign other observations to deciles, it's not a problem.

 

But if you really do need cutpoints, you could use proc univariate, using the PCTLPTS (and PCTLPRE) options on the output statement:

 

proc univariate data=have ;
 var scorecard_points;
 output out=cutpts pctlpre=SP_ pctlpts = 10 to 90 by 10;
run;

This makes a single observation with variables SP_10, SP_20, SP_30, ..., SP_90, which you can use to process data set HAVE or any other data set to which these cut points are applicable.
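As a rough illustration of how that single observation might be applied, here is a minimal sketch. It assumes a small made-up data set (WORK.DEMO, scores 1 to 100) standing in for HAVE; the names WORK.DEMO, WORK.DEMO_SCORED and BUCKET are hypothetical and not part of the original code. It also assumes that a score exactly equal to a cut point belongs to the lower bucket, which is a choice you may want to make differently.

```sas
/* Hypothetical demo data: scores 1 to 100 (a stand-in for HAVE) */
data work.demo;
    do scorecard_points = 1 to 100;
        output;
    end;
run;

/* Same PROC UNIVARIATE call as above, pointed at the demo data */
proc univariate data=work.demo noprint;
    var scorecard_points;
    output out=work.cutpts pctlpre=SP_ pctlpts=10 to 90 by 10;
run;

/* Assign a bucket (0 = lowest scores, 9 = highest) from the cut points */
data work.demo_scored;
    if _n_ = 1 then set work.cutpts;   /* read once; the values are retained */
    set work.demo;
    array cp{9} SP_10 SP_20 SP_30 SP_40 SP_50 SP_60 SP_70 SP_80 SP_90;
    if missing(scorecard_points) then bucket = .;
    else do;
        bucket = 0;
        do i = 1 to 9;
            /* assumption: a score equal to a cut point stays in the lower bucket */
            if scorecard_points > cp{i} then bucket = i;
        end;
    end;
    drop i SP_:;
run;
```

Note that this numbering runs the opposite way from PROC RANK with DESCENDING; 9 - bucket would give the same direction. Also, when many observations share one score, adjacent percentiles can be equal, in which case one of the buckets simply stays empty.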

 

Or, you could take the gaps apparent in the MINs and MAXs in your ranked data, and apply some rule (mid-point, for example) to generate cut points from the PROC RANK output.
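The mid-point rule could look something like the sketch below. To keep it self-contained, the per-bucket ranges are typed in from the report posted in the question; in practice you would read the _TYPE_=1 rows of the REPORT data set, sorted by rank. The data set and variable names (WORK.RANK_RANGES, WORK.MIDPOINT_CUTS, MIN_ABOVE, CUT_POINT, UPPER_BUCKET, LOWER_BUCKET) are made up for illustration.

```sas
/* Per-bucket ranges copied from the PROC MEANS report in the question */
data work.rank_ranges;
    input rank_scorecard_points score_min score_max;
    datalines;
0 672 672
1 647 655
2 622 630
3 575 597
4 574 574
5 549 556
6 498 542
7 458 495
8 382 442
9 250 361
;
run;

/* Cut point = midpoint between a bucket's score_max and the
   score_min of the bucket ranked just above it */
data work.midpoint_cuts;
    set work.rank_ranges;              /* ordered by rank: 0 = highest scores */
    min_above = lag(score_min);        /* score_min of the previous row */
    if _n_ > 1 then do;
        upper_bucket = rank_scorecard_points - 1;
        lower_bucket = rank_scorecard_points;
        cut_point = (min_above + score_max) / 2;
        output;
    end;
    keep upper_bucket lower_bucket cut_point;
run;
```

With these numbers the cut point between buckets 8 and 9 comes out as (382 + 361) / 2 = 371.5, so a new score in the 361 to 382 gap has a defined bucket. The mid-point is only one possible rule; any value inside the gap would reproduce the same assignment for the existing data.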


3 REPLIES
Rick_SAS
SAS Super FREQ

Are you sure that you have data in that range? Use a WHERE clause to see:

 

proc means data=work.rnk_output;
where 361 <= scorecard_points and scorecard_points <= 382;
var scorecard_points;
run;
ggfggrr
Obsidian | Level 7

Thanks so much for your help.

 

Kind regards,
