I currently have an age variable with the below categories:
<15, 15-17, 18-19
I also want to create an age group for 15-19, since my frequency tables will ultimately need to be stratified by the groups below:
<15, 15-17, 18-19, 15-19
Is it possible to use a format or data step that will let me double count the observations that are 15-19 as being both 15-19 and either 15-17 or 18-19? Or does each category in a categorical variable need to be mutually exclusive and assign only 1 value to each observation?
Any suggestions appreciated!
If you intend to create the summaries with Proc Report, Tabulate or Summary/Means and then Print you can use a MULTILABEL format with the numeric variable containing age.
data example;
  /* make some example data to apply a format */
  do age= 13 to 19;
    do i=1 to rand('uniform',6);
      output;
    end;
  end;
  keep age;
run;

proc format;
  value agegroup (Multilabel)
    low - <15 = '<15'
    15 - 19 = '15-19'
    15 - 17 = '15-17'
    18 - 19 = '18-19'
  ;

proc means data=example;
  class age / mlf;
  format age agegroup.;
run;

proc tabulate data=example;
  class age /mlf ;
  format age agegroup.;
  table age, n pctn ;
run;
There are some interesting interactions among the values of the variable, the order in which the formatted ranges are defined, and the resulting order in different procedures and with some options. So don't hesitate to swap the order of the ranges in the Value statement.
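As one illustration of those ordering interactions, here is a minimal sketch that asks Proc Tabulate to list the groups in the order they are written in the Value statement. It assumes the example data set above. The format name ageord is made up for this sketch, and the combination of NOTSORTED on the format with PRELOADFMT and ORDER=DATA on the Class statement is an addition that was not part of the code above, so check the result against your own data.

```sas
proc format;
  /* AGEORD is a hypothetical name used only for this sketch.      */
  /* NOTSORTED keeps the ranges in the order they are written here. */
  value ageord (multilabel notsorted)
    low -< 15 = '<15'
    15 - 17   = '15-17'
    18 - 19   = '18-19'
    15 - 19   = '15-19'
  ;
run;

proc tabulate data=example;
  /* MLF turns on the multilabel behavior; PRELOADFMT with          */
  /* ORDER=DATA requests the order in which the format stores them. */
  class age / mlf preloadfmt order=data;
  format age ageord.;
  table age, n pctn;
run;
```

Because the 15-19 group overlaps two other groups, the PCTN column will add up to more than 100.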
Would this option not work with proc freq?
Just run it and see.
proc means data=example;
class age / mlf;
format age agegroup.;
run;
proc freq data=example;
tables age / list;
format age agegroup.;
run;
So NO it does not work with PROC FREQ.
@greesamu wrote:
Would this option not work with proc freq?
There was a reason I specifically listed those procedures. If you are looking for basic reporting of counts and percentages then Proc Tabulate generally will do most reporting that Proc Freq does. That is also why I included a Proc Tabulate example.
The statistical tests available in Proc Freq are likely a major reason Multilabel formats aren't implemented there: the additional overlapping categories would violate one or more requirements of many, if not all, of those tests.
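If Proc Freq output is still needed for the counts, the data step route raised in the original question is a possible workaround. The sketch below assumes the numeric age variable from the example data set. The names example_long and agegrp are hypothetical. Each observation aged 15 through 19 is written twice, once for its narrow group and once for 15-19.

```sas
data example_long;              /* hypothetical output data set name */
  set example;                  /* EXAMPLE holds the numeric variable AGE */
  length agegrp $5;             /* hypothetical character group variable */

  /* First copy: the mutually exclusive group */
  if not missing(age) and age < 15 then do;
    agegrp = '<15';
    output;
  end;
  else if 15 <= age <= 17 then do;
    agegrp = '15-17';
    output;
  end;
  else if 18 <= age <= 19 then do;
    agegrp = '18-19';
    output;
  end;

  /* Second copy: the overlapping 15-19 group */
  if 15 <= age <= 19 then do;
    agegrp = '15-19';
    output;
  end;
run;

proc freq data=example_long;
  tables agegrp / list;
run;
```

The frequencies per group are usable, but the percentages are computed on a total that counts the 15-19 observations twice, and any test statistics requested from Proc Freq on this data set would not be valid, for the reason given above. Observations with a missing age, or an age outside the listed ranges, are dropped from example_long.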