/*Hi SAS Forum,
I have a dataset like below.
I want to categorize the outsatnding_balance into bands.
*/
data have;
input OS_Balance;
cards;
.
-999.1
-999
-998.7
-500
-100
-75
-50
-25
0
0.00001
;
run;
/*
I have created the below proc format and it does seem correctly doiing the banding.
Q: However, my gut suggets me that the way I have defined the banding is unconventional/awkward.
Could an expert propose an alternative standard way of defining negative bands? */
proc format fmtlib;
value val
. = 'Missing'
low - < -999 = '<-999'
-999 = '-999'
-999 <- -500 = '>-999 - -500'
-500 <- -100 = '>-500 - -100'
-100 <- -75 = '>-100 - -75'
-75 <- -50 = '>-75 - -50'
-50 <- -25 = '>-50 - -25'
-25 <- <0 = '>-25 - <0'
0 = '0'
0 <- High = '>0'
;
run;
data want_1;
set have;
variable= put (OS_Balance, val.);
run;
proc freq data=want_2;
tables LENDING_UTIL_PERCENTAGE/list missing;
format LENDING_UTIL_PERCENTAGE val.;
run;
/*Thanks
Mirisa*/
The only serious potential concern I see would be display order in any procedure that displays in Formatted order. You may not get an order you like.
Being mathmatical of bent I am not afraid to use different parantheses to indicate open or closed intervals but that isn't too everyones taste and still has sort order issues:
-999 <- -500 = '[-999,-500)' -500 <- -100 = '[-500,-100)' -100 <- -75 = '[-100, -75)' -75 <- -50 = '[ -75 , -50)' -50 <- -25 = '[ -50, -25)' -25 <- <0 = '[ -25, 0)'
a [ or ] indicates the value on the low or high end of the interval is included, ( or ) it is excluded
I don't see any problem with the way you defined your buckets, but there is no need to create a new variable or dataset. e.g.:
data have; input OS_Balance; cards; . -999.1 -999 -998.7 -500 -100 -75 -50 -25 -0 0 0.00001 ; run; proc format /*fmtlib*/; value val value val . = 'Missing' low - < -999 = '<-999' -999 = '-999' -999 <- -500 = '>-999 - -500' -500 <- -100 = '>-500 - -100' -100 <- -75 = '>-100 - -75' -75 <- -50 = '>-75 - -50' -50 <- -25 = '>-50 - -25' -25 <- <0 = '>-25 - <0' 0 = '0' 0 <- High = '>0' ; run; proc freq data=have; tables OS_Balance/list missing; format OS_Balance val.; run;
Art, CEO, AnalystFinder.com
The only serious potential concern I see would be display order in any procedure that displays in Formatted order. You may not get an order you like.
Being mathmatical of bent I am not afraid to use different parantheses to indicate open or closed intervals but that isn't too everyones taste and still has sort order issues:
-999 <- -500 = '[-999,-500)' -500 <- -100 = '[-500,-100)' -100 <- -75 = '[-100, -75)' -75 <- -50 = '[ -75 , -50)' -50 <- -25 = '[ -50, -25)' -25 <- <0 = '[ -25, 0)'
a [ or ] indicates the value on the low or high end of the interval is included, ( or ) it is excluded
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.