Hi All,
I have a data set with ages ranging from 39.9 to 87.6 and want to combine these into categories (30-39, 40-49, etc.) I wrote the following code:
proc format;
value age_range_group
30 = '30 to 39'
40 = '40 to 49'
50 = '50 to 59'
60 = '60 to 69'
70 = '70 to 79'
80 = '80 to 89'
other = 'missing'; *not sure this line works?;
run;
data temp1;
set temp;
where age_P1;
if age_P1 <=39.9 then age_range_group = 30;
if age_P1 >=40 and age_P1<50 then age_range_group=40;
if age_P1 >=50 and age_P1<60 then age_range_group=50;
if age_P1 >=60 and age_P1<70 then age_range_group=60;
if age_P1 >=70 and age_P1<80 then age_range_group=70;
if age_P1 >=80 and age_P1<90 then age_range_group=80;
format age_P1 age_range_group.;
run;
proc freq data=temp1;
tables age_range_group;
run;
And get the following output. It does not retain my Label.
I then tried adding a Proc print data=temp1 label; (before the proc freq code above) but SAS says the file is too big and won't output.
Any suggestions are greatly appreciated.
Your created a variable named age_range_group, but the FORMAT statement applied your format to the variable age_P1.
Instead of:
format age_P1 age_range_group.;
Try:
format age_range_group age_range_group.;
But you can make this easier. The format can actually do the grouping for you. So you can make a format like:
proc format;
value age_range_group
30 - <40 = '30 to 39'
40 - <50 = '40 to 49'
50 - <60 = '50 to 59'
60 - <70 = '60 to 69'
70 - <80 = '70 to 79'
80 - <90 = '80 to 89'
other = 'missing'
;
run;
And then you could use that format on your Age_P1 variable, e.g.:
data have ;
input age_P1 ;
format age_P1 age_range_group. ;
cards ;
35.1
71.2
.
;
run ;
proc freq data=have ;
tables age_P1 /missing;
run ;
returns:
The FREQ Procedure Cumulative Cumulative age_P1 Frequency Percent Frequency Percent missing 1 33.33 1 33.33 30 to 39 1 33.33 2 66.67 70 to 79 1 33.33 3 100.00
Your created a variable named age_range_group, but the FORMAT statement applied your format to the variable age_P1.
Instead of:
format age_P1 age_range_group.;
Try:
format age_range_group age_range_group.;
But you can make this easier. The format can actually do the grouping for you. So you can make a format like:
proc format;
value age_range_group
30 - <40 = '30 to 39'
40 - <50 = '40 to 49'
50 - <60 = '50 to 59'
60 - <70 = '60 to 69'
70 - <80 = '70 to 79'
80 - <90 = '80 to 89'
other = 'missing'
;
run;
And then you could use that format on your Age_P1 variable, e.g.:
data have ;
input age_P1 ;
format age_P1 age_range_group. ;
cards ;
35.1
71.2
.
;
run ;
proc freq data=have ;
tables age_P1 /missing;
run ;
returns:
The FREQ Procedure Cumulative Cumulative age_P1 Frequency Percent Frequency Percent missing 1 33.33 1 33.33 30 to 39 1 33.33 2 66.67 70 to 79 1 33.33 3 100.00
Thank you, @Quentin. Your code is a lot easier however, I'm curious what happens to the age_range_group variable that I created? I thought the format age_P1 age_range_group.; code is telling sas to make the variable age_P1 into the new variable age_range_group. Not so?
Other than missing that new variable name, it works perfectly. Thank you!
Code/output below:
proc format;
value age_range_group
30 - <40 = '30 to 39'
40 - <50 = '40 to 49'
50 - <60 = '50 to 59'
60 - <70 = '60 to 69'
70 - <80 = '70 to 79'
80 - <90 = '80 to 89'
other = 'missing';
run;
data temp1;
set temp;
format age_P1 age_range_group.;
run;
proc freq data=temp1;
tables age_P1/missing;
run;
A format does not create a new variable, it only influences how the values of an existing variable are displayed, or grouped with procedures like FREQ.
One way to think about FORMATS is as "Value Labels". Whenever you display the value of of variable (e.g. using PROC FREQ), SAS needs to decide how to display it. With numeric values, SAS will guess a reasonable way to display the value, based on the size of the number. But sometimes you don't want to display the number. You want to display a label. Maybe it's because you have have a numeric variable where 0 means "No" and 1 means "Yes", and you want to display "No" or "Yes". Or as in your case, maybe it's because you want to display the label for a group of values, instead of the values themselves.
The format itself isn't a variable, and doesn't create a variable. It just creates a simple look-up table where values on the left are matched to labels on the right. Below code will create two different formats, which group age in different ways:
proc format;
value age_range_group
30 - <40 = '30 to 39'
40 - <50 = '40 to 49'
50 - <60 = '50 to 59'
60 - <70 = '60 to 69'
70 - <80 = '70 to 79'
80 - <90 = '80 to 89'
other = 'missing'
;
value lifestage
1 - <18 = 'Child'
18- <65 = 'Adult'
65- 100 = 'Senior'
;
run;
With the same input data (note this time I do not have a FORMAT statement in the DATA step code):
data have ;
input age_P1 ;
cards ;
35.1
71.2
.
;
run ;
You can run PROC FREQ, and decide whether to use the default format to display values (which will show "35.1" and "75.1") or you can use one of the custom formats you created. So the format statement is saying "please use this format to display the values." The format itself is just a look up table that links values to value labels.
proc freq data=have ;
tables age_P1 ;
run ;
proc freq data=have ;
tables age_P1 ;
format age_P1 age_range_group. ;
run ;
proc freq data=have ;
tables age_P1 ;
format age_P1 lifestage. ;
run ;
@kristiepauly wrote:
Thank you, @Quentin. Your code is a lot easier however, I'm curious what happens to the age_range_group variable that I created? I thought the format age_P1 age_range_group.; code is telling sas to make the variable age_P1 into the new variable age_range_group. Not so?
Other than missing that new variable name, it works perfectly. Thank you!
Code/output below:
proc format; value age_range_group 30 - <40 = '30 to 39' 40 - <50 = '40 to 49' 50 - <60 = '50 to 59' 60 - <70 = '60 to 69' 70 - <80 = '70 to 79' 80 - <90 = '80 to 89' other = 'missing'; run; data temp1; set temp; format age_P1 age_range_group.; run; proc freq data=temp1; tables age_P1/missing; run;
If you don't want the column heading to read Age_P1, you can assign a label to the variable Age_P1. In PROC FREQ add
label age_p1='Age at Current Visit';
or
label age_p1='Age Range Group';
Please note when creating labels, do your audience a favor and create labels that are actual English, with proper capitalization, and not a computer programming language variable name such as age_range_group.
I thought the format age_P1 age_range_group.; code is telling sas to make the variable age_P1 into the new variable age_range_group.
Formats are instructions for how to convert values into text. The FORMAT statement just attaches a format to a variable so it will be used to display the values of the variable.
If you want to use a format to make a new variable then use a normal ASSIGNMENT statement and use the PUT() function to apply the format to the variable.
So to make new variable named age_range_group from the existing variable named age_P1 using the format named age_range_group you would use a statement like this:
age_range_group = put(age_P1,age_range_group.);
AGE_RANGE_GROUP will be a CHARACTER variable since formats always generate text from values.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.