BookmarkSubscribeRSS Feed
znhnm
Quartz | Level 8

How to create a new grouping column based on the values on another column in a dataset?

 

For example, I have a column called "reason" with 40 distinct values. I want to create "grouping" column with 3 different values based on "reason" values. For missing values of "reason" column new column "grouping" will have "missing value".  what is the best syntax for this preferably without using any and/or statements?

Thank you! 

1 REPLY 1
ballardw
Super User

Quite often  a custom format is a better way than creating new variables. A group created by formatting values of a single variable will be honored by reporting and analysis procedures and generally will work with graphing.

 

Here is an example of creating a custom variable for an age variable in a data set you can test code against.

Proc format;
value agegroup
low-12 = 'Preteen'
13- 18 = 'Teen'
19 - high= 'Adult'
;

proc freq data=sashelp.class;
   tables age;
   format age agegroup.;
run;

The data set Sashelp.class does not have any ages over 16 so you don't see them in the result.

 

Advantages of a custom format are:

1) You don't need to add variables. In large data sets such may take a noticeable amount of time.

2) If you want to change the group then just modifying or creating another format is much faster than adding yet another variable.

3) If you want to apply the same logic to multiple variables (such as age at enrollment, age at graduation) then you can just by associating those variables with the format.

4) The logic for ranges of values in formats is often easier to write than the possibly horrific number of "if/then" statements that might be needed to create a variable. The character < is used in value list to indicate whether the end value is excluded or not of a range.

5) if the variable is numeric then the order of display will generally follow the underlying numeric values. If you create character variables then quite often the default order of appearance in results will be alphabetic.

 

Note: some caveats. If you variable(s) are character the format name on the value statement in proc format needs to include a $ and used with a $. Character values should be included in quotes.

Multiple specific values, whether numeric or character are separated by commas.

You cannot assign the same value to different results.

There are some options, such as OTHER available to say "any value not specifically listed is displayed as" the shown value.

 

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 1 reply
  • 385 views
  • 2 likes
  • 2 in conversation