BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
kristiepauly
Obsidian | Level 7

Hi All, 

I have a data set with ages ranging from 39.9 to 87.6 and want to combine these into categories (30-39, 40-49, etc.) I wrote the following code:

proc format;
value age_range_group
30 = '30 to 39'
40 = '40 to 49'
50 = '50 to 59'
60 = '60 to 69'
70 = '70 to 79'
80 = '80 to 89'
other = 'missing'; *not sure this line works?;
run;

data temp1;
set temp;
where age_P1;
if age_P1 <=39.9 then age_range_group = 30;
if age_P1 >=40 and age_P1<50 then age_range_group=40;
if age_P1 >=50 and age_P1<60 then age_range_group=50;
if age_P1 >=60 and age_P1<70 then age_range_group=60;
if age_P1 >=70 and age_P1<80 then age_range_group=70;
if age_P1 >=80 and age_P1<90 then age_range_group=80;
format age_P1 age_range_group.;
run;

proc freq data=temp1;
tables age_range_group;
run;

 

And get the following output. It does not retain my Label. 

kristiepauly_0-1709411861880.png

I then tried adding a Proc print data=temp1 label; (before the proc freq code above) but SAS says the file is too big and won't output.  

Any suggestions are greatly appreciated. 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Quentin
Super User

Your created a variable named age_range_group, but the FORMAT statement applied your format to the variable age_P1.

 

Instead of:

format age_P1 age_range_group.;

Try:

format age_range_group age_range_group.;

But you can make this easier. The format can actually do the grouping for you.  So you can make a format like:

proc format;
  value age_range_group
    30 - <40 = '30 to 39'
    40 - <50 = '40 to 49'
    50 - <60 = '50 to 59'
    60 - <70 = '60 to 69'
    70 - <80 = '70 to 79'
    80 - <90 = '80 to 89'
    other = 'missing'
  ;
run;

And then you could use that format on your Age_P1 variable, e.g.:

data have ;
  input age_P1 ;
  format age_P1 age_range_group. ;
  cards ;
35.1
71.2
.
;
run ;

proc freq data=have ;
  tables age_P1 /missing;
run ;

returns:

                      The FREQ Procedure

                                     Cumulative    Cumulative
  age_P1    Frequency     Percent     Frequency      Percent
missing            1       33.33             1        33.33
30 to 39           1       33.33             2        66.67
70 to 79           1       33.33             3       100.00

 

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

View solution in original post

6 REPLIES 6
Quentin
Super User

Your created a variable named age_range_group, but the FORMAT statement applied your format to the variable age_P1.

 

Instead of:

format age_P1 age_range_group.;

Try:

format age_range_group age_range_group.;

But you can make this easier. The format can actually do the grouping for you.  So you can make a format like:

proc format;
  value age_range_group
    30 - <40 = '30 to 39'
    40 - <50 = '40 to 49'
    50 - <60 = '50 to 59'
    60 - <70 = '60 to 69'
    70 - <80 = '70 to 79'
    80 - <90 = '80 to 89'
    other = 'missing'
  ;
run;

And then you could use that format on your Age_P1 variable, e.g.:

data have ;
  input age_P1 ;
  format age_P1 age_range_group. ;
  cards ;
35.1
71.2
.
;
run ;

proc freq data=have ;
  tables age_P1 /missing;
run ;

returns:

                      The FREQ Procedure

                                     Cumulative    Cumulative
  age_P1    Frequency     Percent     Frequency      Percent
missing            1       33.33             1        33.33
30 to 39           1       33.33             2        66.67
70 to 79           1       33.33             3       100.00

 

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
kristiepauly
Obsidian | Level 7

Thank you, @Quentin.  Your code is a lot easier however, I'm curious what happens to the age_range_group variable that I created? I thought the format age_P1 age_range_group.; code is telling sas to make the variable age_P1 into the new variable age_range_group.  Not so? 

Other than missing that new variable name, it works perfectly.  Thank you! 

Code/output below:

proc format;
	value age_range_group
	30 - <40 = '30 to 39'
	40 - <50 = '40 to 49'
	50 - <60 = '50 to 59'
	60 - <70 = '60 to 69'
	70 - <80 = '70 to 79'
	80 - <90 = '80 to 89'
	other = 'missing'; 
run;

data temp1;
	set temp;
	format age_P1 age_range_group.;
	run;

proc freq data=temp1;
	tables age_P1/missing;
	run;

kristiepauly_1-1709415790498.png

 

Quentin
Super User

One way to think about FORMATS is as "Value Labels".   Whenever you display the value of of variable (e.g. using PROC FREQ), SAS needs to decide how to display it.  With numeric values, SAS will guess a reasonable way to display the value, based on the size of the number.  But sometimes you don't want to display the number.  You want to display a label.  Maybe it's because you have have a numeric variable where 0 means "No" and 1 means "Yes", and you want to display "No" or "Yes".  Or as in your case, maybe it's because you want to display the label for a group of values, instead of the values themselves. 

 

The format itself isn't a variable, and doesn't create a variable.  It just creates a simple look-up table where values on the left are matched to labels on the right.  Below code will create two different formats, which group age in different ways:

 

proc format;
  value age_range_group
    30 - <40 = '30 to 39'
    40 - <50 = '40 to 49'
    50 - <60 = '50 to 59'
    60 - <70 = '60 to 69'
    70 - <80 = '70 to 79'
    80 - <90 = '80 to 89'
    other = 'missing'
  ;

  value lifestage
    1 - <18   = 'Child'
    18- <65   = 'Adult'
    65- 100   = 'Senior'
  ;
run;

With the same input data (note this time I do not have a FORMAT statement in the DATA step code):

data have ;
  input age_P1 ;
  cards ;
35.1
71.2
.
;
run ;

You can run PROC FREQ, and decide whether to use the default format to display values (which will show "35.1" and "75.1") or you can use one of the custom formats you created.  So the format statement is saying "please use this format to display the values."  The format itself is just a look up table that links values to value labels.

 

proc freq data=have ;
  tables age_P1 ;
run ;

proc freq data=have ;
  tables age_P1 ;
  format age_P1 age_range_group. ;
run ;

proc freq data=have ;
  tables age_P1 ;
  format age_P1 lifestage. ;
run ;
BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
PaigeMiller
Diamond | Level 26

@kristiepauly wrote:

Thank you, @Quentin.  Your code is a lot easier however, I'm curious what happens to the age_range_group variable that I created? I thought the format age_P1 age_range_group.; code is telling sas to make the variable age_P1 into the new variable age_range_group.  Not so? 

Other than missing that new variable name, it works perfectly.  Thank you! 

Code/output below:

proc format;
	value age_range_group
	30 - <40 = '30 to 39'
	40 - <50 = '40 to 49'
	50 - <60 = '50 to 59'
	60 - <70 = '60 to 69'
	70 - <80 = '70 to 79'
	80 - <90 = '80 to 89'
	other = 'missing'; 
run;

data temp1;
	set temp;
	format age_P1 age_range_group.;
	run;

proc freq data=temp1;
	tables age_P1/missing;
	run;

kristiepauly_1-1709415790498.png

 


If you don't want the column heading to read Age_P1, you can assign a label to the variable Age_P1. In PROC FREQ add

 

label age_p1='Age at Current Visit';

 

or

 

label age_p1='Age Range Group';

 

Please note when creating labels, do your audience a favor and create labels that are actual English, with proper capitalization, and not a computer programming language variable name such as age_range_group.

 

 

--
Paige Miller
Tom
Super User Tom
Super User

I thought the format age_P1 age_range_group.; code is telling sas to make the variable age_P1 into the new variable age_range_group.  

Formats are instructions for how to convert values into text.  The FORMAT statement just attaches a format to a variable so it will be used to display the values of the variable.

 

If you want to use a format to make a new variable then use a normal ASSIGNMENT statement and use the PUT() function to apply the format to the variable.

 

So to make new variable named  age_range_group from the existing variable named  age_P1 using the format named age_range_group you would use a statement like this:

age_range_group = put(age_P1,age_range_group.);

AGE_RANGE_GROUP will be a CHARACTER variable since formats always generate text from values.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 731 views
  • 4 likes
  • 5 in conversation