BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dkanand86
Calcite | Level 5

Hi Everyone,

Please can someone let me know the ways in which a continuous variable can be converted into bins (buckets)?

Suggest alternative coding ways either using proc sql, data step or sas macro.

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Big advantages of using the format approach for this:

1) you don't have to add any variables, so no additional passes through the entire data

2) you can assign the format as needed for specific analysis

3) one format can be assigned to multiple variables that use the same rule

4) multiple formats to regroup data as needed, just change the format for the proc at run time.

5) as long as values of only a single variable are considered it can be much easier to construct then if/then/else logic and checks to ensure that one start value is treated the same all the time (exception: multilabel but that's a bit more advanced and only useable for some procs)

6) used with special missing assignments .A - .Z you can get descriptions for what that missing means such as not asked, original value out of bounds (and differentiate between high and low), instrument failure, skip pattern and a plethora of conditions. And the missing values are still excluded from calculations such as means.

View solution in original post

4 REPLIES 4
RW9
Diamond | Level 26 RW9
Diamond | Level 26

And your example data in the form of a datastep, plus required output is?  At a guess:

data have;

  do i=1 to 1000;

    output;

  end;

run;

data want;

  set have;

  bucket=floor(_n_ / 50)+1;

run;

EH
Obsidian | Level 7 EH
Obsidian | Level 7

My preferred way is format.

proc format;

   value age low-17 = '< 18 yrs'

                   18-64 = '18-65 yrs'

                   65-high = '> 65 yrs';

run;

data want;

   set have;

   length agegrp $ 15;

   agegrp = put(age, age.);

run;

Hth,

Eric

cwilson
Calcite | Level 5

I like Proc Format for continuous numeric data.

You can use the LT symbol to get up to a boundary

Proc format ;

     value agegrp

     low - <18 = '0-18'

     18-<25 = '18-25'

     25-<35 = '25-35'

...

     85-high = '85 and over' ;

run ;

Ha!  Eric beat me to it!  Smiley Happy

ballardw
Super User

Big advantages of using the format approach for this:

1) you don't have to add any variables, so no additional passes through the entire data

2) you can assign the format as needed for specific analysis

3) one format can be assigned to multiple variables that use the same rule

4) multiple formats to regroup data as needed, just change the format for the proc at run time.

5) as long as values of only a single variable are considered it can be much easier to construct then if/then/else logic and checks to ensure that one start value is treated the same all the time (exception: multilabel but that's a bit more advanced and only useable for some procs)

6) used with special missing assignments .A - .Z you can get descriptions for what that missing means such as not asked, original value out of bounds (and differentiate between high and low), instrument failure, skip pattern and a plethora of conditions. And the missing values are still excluded from calculations such as means.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 5317 views
  • 1 like
  • 5 in conversation