PrinceAde
Obsidian | Level 7
I'm trying to derive age groups as shown below:
<65
>=65
<75
>=75
data ydata;
set mdata;
if age < 65 then age_group = "<65";
else if age >= 65 then age_group = ">=65";
else if age < 75 then age_group = "<75";
else if age >= 75 then age_group = ">=75";
run;
When I ran this code, it ignored the last two statements, as I expected.
Is it possible to group like that at all?
What can I do?

Thank you!



Accepted Solution
PaigeMiller
Diamond | Level 26

@PrinceAde wrote:

I'm working on an SDTMIG dataset, and I have the Age variable.

I'm supposed to generate age_group:

<65
>=65
<75
>=75

and then generate the count and percent of the age_group using proc freq.


@PrinceAde Thank you, but this does not explain what you want. If a person is 63 years old, should this person be in both the <65 and the <75 groups? 63 is <65, and 63 is also <75. That's what it sounds like you are asking for. Please explain and clear this up.

Whatever it is you want, use FORMATs.

--
Paige Miller

PaigeMiller
Diamond | Level 26

It ignores the last two groupings because your logic is incorrect. After the first two tests, the person's age is either <65 or >=65; one of those two conditions has to be true, so the remaining tests are never performed. But we're still left guessing what you DO want out of this data, because you haven't told us. So here is a guess: perhaps you want something like this:

length age_group $ 5; /* set the length first; otherwise "<65" fixes it at 3 */
if age < 65 then age_group = "<65";
else if age >= 65 and age < 75 then age_group = "65-75";
else if age >= 75 then age_group = ">=75";

A better solution is to use custom formats and apply them in your analysis, rather than creating a new character variable. Example:

proc format;
    value agef low-<65='<65' 65-<75='65-75' 75-high='>=75';
run;

proc means data=have;
    class age;
    format age agef.;
    var whatever;
run;
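Since the goal in this thread is counts and percents from PROC FREQ, the same kind of format can drive that step directly. A minimal sketch, assuming a data set with a numeric AGE variable; the HAVE data set below holds made-up ages, and the AGEF format is repeated so the block runs on its own:

```sas
/* Format from the example above: one label per age range */
proc format;
   value agef low-<65 = '<65'
              65-<75  = '65-75'
              75-high = '>=75';
run;

/* Made-up ages, for illustration only */
data have;
   input age @@;
   datalines;
55 63 64 65 70 74 75 78
;
run;

/* The FORMAT statement groups AGE for this step only; */
/* PROC FREQ reports frequency and percent per group   */
proc freq data=have;
   tables age / out=age_counts;
   format age agef.;
run;
```

No new variable is created: AGE keeps its original values in HAVE, so other steps can still use the raw ages.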
AhmedAl_Attar
Ammonite | Level 13

Hi @PrinceAde 

SAS Formats are the perfect solution to this.

Check this paper for an illustration: The Power of the FORMAT Procedure

Hope this helps

PrinceAde
Obsidian | Level 7

Thank you very much, sir. I will use proc format.

Here is a brief summary of what I'm trying to achieve:

I'm working on an SDTMIG dataset, and I have the Age variable.

I'm supposed to generate age_group:

<65
>=65
<75
>=75

and then generate the count and percent using proc freq.

PrinceAde
Obsidian | Level 7

Ok sir. Thank you.

ballardw
Super User

You have another issue, related to the length of the result variable.

If you run:

data example;
 input age;
if age < 65 then age_group = "<65";
else if age >= 65 then age_group = ">=65";
else if age < 75 then age_group = "<75";
else if age >= 75 then age_group = ">=75";
datalines;
63
64
65
66
67
;
run;


proc print data=example;
run;

You will get this result:

Obs    age    age_group
  1     63    <65
  2     64    <65
  3     65    >=6
  4     66    >=6
  5     67    >=6

Why ">=6", you may ask? You did not define a length for your age_group variable, so the first use of the variable, in this assignment:

if age < 65 then age_group = "<65";

established a length of three characters, just enough to hold <65.

This is another problem that formats avoid. Groups created by formats are honored by reporting and analysis procedures, and in most places for graphing, so you don't have to add a variable at all.
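If you do need a stored character variable, declaring its length before the first assignment avoids the truncation. A minimal sketch using the same five ages; the length of 4 is chosen to fit the longest label, ">=65", and only the two reachable branches of the original IF/ELSE chain are kept:

```sas
data example;
   /* Declared before any assignment, so the length is 4, not 3 from "<65" */
   length age_group $ 4;
   input age;
   if age < 65 then age_group = "<65";
   else if age >= 65 then age_group = ">=65";
   datalines;
63
64
65
66
67
;
run;

proc print data=example;
run;
```

Placing the LENGTH statement first also moves age_group ahead of age in the data set's variable order, so it prints first.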

 

PrinceAde
Obsidian | Level 7

Hi, I actually added a length statement before running the code.

"<65"

">=65"

"<75"

">=75" 

Is it possible to group like that at all?

PrinceAde
Obsidian | Level 7

I'm working on an SDTMIG dataset, and I have the Age variable.

I'm supposed to generate age_group:

<65
>=65
<75
>=75

and then generate the count and percent of the age_group using proc freq.

It is a case study.

PrinceAde
Obsidian | Level 7

I did as you said, using IF/ELSE conditions to create new age-group variables:

<65
>=65

as agroupx, then ran proc freq; and

<75
>=75

as agegroupy, then ran proc freq.

Thank you very much. Very helpful.

PaigeMiller
Diamond | Level 26

@PrinceAde wrote:

I did as you said, using IF/ELSE conditions to create new age-group variables:

<65
>=65

as agroupx, then ran proc freq; and

<75
>=75

as agegroupy, then ran proc freq.

Thank you very much. Very helpful.


But I did not say to create new variables. I specifically said to use formats. Part of the purpose of having these discussions is not just to provide an answer, but to point you toward better methods. Formats are better than new variables for many reasons, and apparently, despite formats being mentioned by many people, you chose not to use this better method.
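For the two separate splits described above (<65 vs >=65, and <75 vs >=75), one way to follow this advice is to define one format per split and run PROC FREQ once with each, without adding any variables. A minimal sketch; the format names AGE65F and AGE75F, the output data set names, and the sample ages are all hypothetical:

```sas
/* One format per split; AGE itself is never changed */
proc format;
   value age65f low-<65 = '<65'  65-high = '>=65';
   value age75f low-<75 = '<75'  75-high = '>=75';
run;

/* Hypothetical sample ages */
data have;
   input age @@;
   datalines;
55 63 64 65 70 74 75 78
;
run;

/* First split: <65 vs >=65 */
proc freq data=have;
   tables age / out=split65;
   format age age65f.;
run;

/* Second split: same variable, different format */
proc freq data=have;
   tables age / out=split75;
   format age age75f.;
run;
```

Because each FORMAT statement applies only to its own step, the raw AGE values remain available for PROC MEANS or anything else.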

PrinceAde
Obsidian | Level 7

Yes, I initially used proc format. However, I realized that I still needed to run proc means on the original age variable, which is why I created the age-group variables, so that I could run proc freq on them separately.
 

PaigeMiller
Diamond | Level 26

@PrinceAde Formats work in PROC MEANS and most data analysis PROCs.

ballardw
Super User

@PrinceAde wrote:

I'm working on an SDTMIG dataset, and I have the Age variable.

I'm supposed to generate age_group:

<65
>=65
<75
>=75

and then generate the count and percent of the age_group using proc freq.

It is a case study.

If, as in @PaigeMiller's example, a person aged 63 is supposed to be counted in both the <65 and the <75 age groups, your choices are very limited. Proc freq will not do this with a single variable. In fact, the only approach I think is reasonable would not use such a variable at all, because the tools for this live in a special kind of format definition and the four procedures that can use it.
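If a person really should be counted in every group whose condition they meet, the format feature presumably meant here is a multilabel format: PROC FORMAT's MULTILABEL option lets one value map to several overlapping labels, and PROC MEANS, PROC SUMMARY, PROC TABULATE, and PROC REPORT can honor it. A minimal sketch with PROC SUMMARY; the format name AGEMLF, the data set names, and the sample ages are hypothetical, and dividing by the number of people (rather than by the sum of group counts) is an assumption about which percentage is wanted:

```sas
/* MULTILABEL allows overlapping ranges, so 63 maps to both '<65' and '<75' */
proc format;
   value agemlf (multilabel)
      low-<65 = '<65'
      65-high = '>=65'
      low-<75 = '<75'
      75-high = '>=75';
run;

/* Hypothetical sample ages */
data have;
   input age @@;
   datalines;
55 63 64 65 70 74 75 78
;
run;

/* MLF on the CLASS statement makes SUMMARY count each age under every label */
proc summary data=have nway;
   class age / mlf;
   format age agemlf.;
   output out=age_counts(drop=_type_ rename=(_freq_=count));
run;

/* Denominator: number of people, not the inflated sum of group counts */
proc sql noprint;
   select count(*) into :n_people from have;
quit;

data age_pct;
   set age_counts;
   percent = 100 * count / &n_people;
run;

proc print data=age_pct noobs;
run;
```

With these four labels every age lands in exactly two groups, so the counts add up to twice the number of people; that is why the percentages use the person count as the denominator.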

 

One of the reasons we so often ask for example data, and for the result of your process on that data, is that it saves us from pulling out one fact at a time.

Here is an example data set small enough that you should be able to count the categories you want by hand. SHOW us the result you would want from Proc Freq for this data set. Don't write any code to assign values. Count manually and show us how many are in each category, or just list which category belongs next to each value.

data example;
   input age;
datalines;
55
57
63
64
64
64
65
66
67
67
68
68
69
70
71
71
71
72
72
73
74
75 
76
76
77
77
78
;
run;


Discussion stats
  • 18 replies
  • 4830 views
  • 5 likes
  • 5 in conversation