SAS Programming

PrinceAde · Posted 04-09-2023 06:19 AM

I'm trying to derive an age group as shown below.
<65
>=65
<75
>=75
data ydata;
set mdata;
if age < 65 then age_group = "<65";
else if age >= 65 then age_group = ">=65";
else if age < 75 then age_group = "<75";
else if age >= 75 then age_group = ">=75";
run;
When I ran this code, it ignored the last two statements, as I expected.
Is it possible to group like that at all?
What can I do?

Thank you!

PaigeMiller · Posted 04-09-2023 06:25 AM

It ignores the last two groupings because your logic is incorrect. Every age is either <65 or >=65, so one of the first two conditions is always true and the remaining ELSE IF tests are never performed. But we're still left guessing as to what you DO want out of this data; you haven't told us. So a guess: perhaps you want something like this:

if age < 65 then age_group = "<65";
else if age >= 65 and age<75 then age_group = "65-75";
else if age >= 75 then age_group = ">=75";

A better solution is to use custom formats and then use the formats in your analysis, rather than creating a new character variable. Example:

proc format;
    value agef low-<65='<65' 65-<75='65-75' 75-high='>=75';
run;

proc means data=have;
    class age;
    format age agef.;
    var whatever;
run;

--
Paige Miller
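
Since the original question asks for counts and percents from PROC FREQ, the format approach above can be sketched for that procedure as well. This is a minimal sketch, not the poster's code: the data set name have and its sample values are made up for illustration, and the format is the agef format from the post above.

```sas
/* Hypothetical sample data; only the variable name AGE comes from the thread */
data have;
    input age @@;
datalines;
55 63 64 65 70 74 75 78
;
run;

/* Same format definition as in the post above */
proc format;
    value agef low-<65='<65' 65-<75='65-75' 75-high='>=75';
run;

/* PROC FREQ groups on the formatted values, so no new variable is needed */
proc freq data=have;
    tables age;
    format age agef.;
run;
```

The FORMAT statement only affects this one procedure step; the stored AGE values stay numeric and unchanged.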

AhmedAl_Attar · Posted 04-09-2023 06:28 AM

Hi @PrinceAde

SAS Formats are the perfect solution to this.

Check this paper for an illustration: The Power of the FORMAT Procedure

Hope this helps

PrinceAde · Posted 04-09-2023 02:18 PM

Thank you very much, sir. I will use proc format.

This is a brief on what I'm trying to achieve:

I'm working on a sdtmig dataset, I have the Age variable;

I'm supposed to generate age_group;

<65
>=65
<75
>=75

and then generate the count and percent using proc freq.

Kurt_Bremser · Posted 04-09-2023 06:32 AM

I can only reinforce what has already been said: use a format.

PrinceAde · Posted 04-09-2023 02:21 PM

Ok sir. Thank you.

ballardw · Posted 04-09-2023 01:07 PM

You have another issue, related to the length of the result variable.

If you run:

data example;
 input age;
if age < 65 then age_group = "<65";
else if age >= 65 then age_group = ">=65";
else if age < 75 then age_group = "<75";
else if age >= 75 then age_group = ">=75";
datalines;
63
64
65
66
67
;


proc print data=example;
run;

You will get a result of

Obs    age    age_group
  1     63    <65
  2     64    <65
  3     65    >=6
  4     66    >=6
  5     67    >=6

Why ">=6", you may ask? You did not define a length for your Age_group variable, so the first use of the variable in an assignment:

if age < 65 then age_group = "<65";

established a length of three characters, enough to hold <65. Longer values such as ">=65" are then truncated to three characters.

This is another thing that formats will avoid. The groups created by formats will be honored by reporting and analysis procedures and in most places for graphing. So you don't even have to add a variable at all.
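
If a character variable is still wanted, the truncation described above can be avoided by declaring the length before the first assignment. The following is a minimal sketch under that assumption; the length of 4 is chosen here only because it fits the longest label in the thread, and the two-way split is just the part of the original logic that can actually be reached.

```sas
data example;
    /* Declare the length first so the longest label, ">=65", fits */
    length age_group $4;
    input age;
    if age < 65 then age_group = "<65";
    else age_group = ">=65";
datalines;
63
64
65
66
67
;

proc print data=example;
run;
```

Because the LENGTH statement comes first, age_group also becomes the first variable in the data set, which changes the column order in the printed output.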

PrinceAde · Posted 04-09-2023 02:42 PM

Hi, I actually added a length statement before running the code.

"<65"

">=65"

"<75"

">=75"

Is it possible to group like that at all?

PrinceAde · Posted 04-09-2023 02:16 PM

I'm working on a sdtmig dataset, I have the Age variable;

I'm supposed to generate age_group;

<65
>=65
<75
>=75

and then generate the count and percent of the age_group using proc freq.

It is a case study.

PaigeMiller · Posted 04-09-2023 02:48 PM

@PrinceAde wrote:

I'm working on a sdtmig dataset, I have the Age variable;

I'm supposed to generate age_group;

<65
>=65
<75
>=75

and then generate the count and percent of the age_group using proc freq.

@PrinceAde Thank you but this does not explain what you want. If a person is 63 years old, should this person be in both the <65 and <75 group? 63 is <65 and 63 is also <75. That's what it sounds like you are asking for. Please explain and clear this up.

Whatever it is you want, use FORMATs.

--
Paige Miller

PrinceAde · Posted 04-11-2023 09:00 AM

I did as you said, using if-else conditions to create a new variable agegroup:

<65
>=65

as agroupx, then proc freq;

<75
>=75

as agegroupy, then proc freq.

Thank you very much. Very helpful.

PaigeMiller · Posted 04-11-2023 09:04 AM

@PrinceAde wrote:

I did as you said, using if-else conditions to create a new variable agegroup:

<65
>=65

as agroupx, then proc freq;

<75
>=75

as agegroupy, then proc freq.

Thank you very much. Very helpful.

But I did not say to create new variables. I specifically said use formats. Part of the purpose of having these discussions is not just to provide an answer, but to point you in the direction of better methods. Formats are better for a lot of reasons than creating new variables, and apparently despite formats being mentioned by many people, you chose not to use this better method.

--
Paige Miller

PrinceAde · Posted 04-11-2023 09:40 AM

Yes, I initially used proc format; however, I realized that I still needed to run proc means on the original age variable, which is why I created the agegroup variables so that I could run the proc freq steps separately.

PaigeMiller · Posted 04-11-2023 09:58 AM

@PrinceAde formats work in PROC MEANS and most data analysis PROCs

--
Paige Miller
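
To illustrate the point that a format does not get in the way of PROC MEANS on the raw variable, here is a minimal sketch. The data set have, its values, and the format names age65f and age75f are all hypothetical. The idea is that a format attached with a FORMAT statement inside one procedure step applies only to that step, so the same numeric AGE can be summarized unformatted and then tabulated twice with two different cutoffs.

```sas
/* Hypothetical sample data */
data have;
    input age @@;
datalines;
55 63 64 65 70 74 75 78
;
run;

/* Two separate formats, one per cutoff (names are illustrative) */
proc format;
    value age65f low-<65='<65' 65-high='>=65';
    value age75f low-<75='<75' 75-high='>=75';
run;

/* Summary statistics on the original, unformatted age */
proc means data=have n mean std min max;
    var age;
run;

/* Counts and percents using the 65 cutoff */
proc freq data=have;
    tables age;
    format age age65f.;
run;

/* Counts and percents using the 75 cutoff */
proc freq data=have;
    tables age;
    format age age75f.;
run;
```

No new variables are created; each PROC FREQ step sees AGE through a different format.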

ballardw · Posted 04-09-2023 07:00 PM

@PrinceAde wrote:

I'm working on a sdtmig dataset, I have the Age variable;

I'm supposed to generate age_group;

<65
>=65
<75
>=75

and then generate the count and percent of the age_group using proc freq.

It is a case study.

If, as in @PaigeMiller's example, a person aged 63 is supposed to be counted in both the <65 and the <75 age groups, your choices are very limited. Proc freq will not do this with a single variable. In fact, the only approach that I think is reasonable would not create such a variable at all, because the tools reside in a special part of the format definition and in the 4 procedures that can use it.

One of the reasons we so often ask for example data, and for the result you expect from that data, is that we then don't have to pull out the facts one at a time.

Here is an example data set small enough that you should be able to count the categories you want by hand. SHOW us the result that you would want from Proc Freq using this data set. Don't write any code to assign values. Manually count and show us how many are in each category. Or just list which category belongs next to each value.

data example;
   input age;
datalines;
55
57
63
64
64
64
65
66
67
67
68
68
69
70
71
71
71
72
72
73
74
75 
76
76
77
77
78
;
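
The remark above about a format definition used by a few procedures may refer to multilabel formats; that reading is an assumption, not something the post states. Under that assumption, a minimal sketch of overlapping age groups could look like this. The data set name have, its values, and the format name agemlf are hypothetical.

```sas
/* Hypothetical sample data */
data have;
    input age @@;
datalines;
55 63 64 65 70 74 75 78
;
run;

/* MULTILABEL allows overlapping ranges, so one age can map to two labels */
proc format;
    value agemlf (multilabel)
        low-<65 = '<65'
        65-high = '>=65'
        low-<75 = '<75'
        75-high = '>=75';
run;

/* The MLF option tells PROC TABULATE to use every matching label */
proc tabulate data=have;
    class age / mlf;
    table age, n;
    format age agemlf.;
run;
```

With this setup an age of 63 is counted under both '<65' and '<75'. Only N is requested here, because percentages across overlapping categories need a deliberate choice of denominator.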
