/*Hi SAS Forum,
I have a dataset like below.
I want to categorize the outsatnding_balance into bands.
*/
data have;
input OS_Balance;
cards;
.
-999.1
-999
-998.7
-500
-100
-75
-50
-25
0
0.00001
;
run;
/*
I have created the below proc format and it does seem correctly doiing the banding.
Q: However, my gut suggets me that the way I have defined the banding is unconventional/awkward.
Could an expert propose an alternative standard way of defining negative bands? */
proc format fmtlib;
value val
. = 'Missing'
low - < -999 = '<-999'
-999 = '-999'
-999 <- -500 = '>-999 - -500'
-500 <- -100 = '>-500 - -100'
-100 <- -75 = '>-100 - -75'
-75 <- -50 = '>-75 - -50'
-50 <- -25 = '>-50 - -25'
-25 <- <0 = '>-25 - <0'
0 = '0'
0 <- High = '>0'
;
run;
data want_1;
set have;
variable= put (OS_Balance, val.);
run;
proc freq data=want_2;
tables LENDING_UTIL_PERCENTAGE/list missing;
format LENDING_UTIL_PERCENTAGE val.;
run;
/*Thanks
Mirisa*/
The only serious potential concern I see would be display order in any procedure that displays in Formatted order. You may not get an order you like.
Being mathmatical of bent I am not afraid to use different parantheses to indicate open or closed intervals but that isn't too everyones taste and still has sort order issues:
-999 <- -500 = '[-999,-500)' -500 <- -100 = '[-500,-100)' -100 <- -75 = '[-100, -75)' -75 <- -50 = '[ -75 , -50)' -50 <- -25 = '[ -50, -25)' -25 <- <0 = '[ -25, 0)'
a [ or ] indicates the value on the low or high end of the interval is included, ( or ) it is excluded
I don't see any problem with the way you defined your buckets, but there is no need to create a new variable or dataset. e.g.:
data have; input OS_Balance; cards; . -999.1 -999 -998.7 -500 -100 -75 -50 -25 -0 0 0.00001 ; run; proc format /*fmtlib*/; value val value val . = 'Missing' low - < -999 = '<-999' -999 = '-999' -999 <- -500 = '>-999 - -500' -500 <- -100 = '>-500 - -100' -100 <- -75 = '>-100 - -75' -75 <- -50 = '>-75 - -50' -50 <- -25 = '>-50 - -25' -25 <- <0 = '>-25 - <0' 0 = '0' 0 <- High = '>0' ; run; proc freq data=have; tables OS_Balance/list missing; format OS_Balance val.; run;
Art, CEO, AnalystFinder.com
The only serious potential concern I see would be display order in any procedure that displays in Formatted order. You may not get an order you like.
Being mathmatical of bent I am not afraid to use different parantheses to indicate open or closed intervals but that isn't too everyones taste and still has sort order issues:
-999 <- -500 = '[-999,-500)' -500 <- -100 = '[-500,-100)' -100 <- -75 = '[-100, -75)' -75 <- -50 = '[ -75 , -50)' -50 <- -25 = '[ -50, -25)' -25 <- <0 = '[ -25, 0)'
a [ or ] indicates the value on the low or high end of the interval is included, ( or ) it is excluded
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.