<65
>=65
<75
>=75
data ydata;
set mdata;
if age < 65 then age_group = "<65";
else if age >= 65 then age_group = ">=65";
else if age < 75 then age_group = "<75";
else if age >= 75 then age_group = ">=75";
run;
When I ran this code, it ignored the last two statements, as I expected.
Is it possible to group like that at all?
What can I do?
Thank you!
Accepted Solutions
@PrinceAde wrote:
I'm working on an SDTMIG dataset, and I have the Age variable.
I'm supposed to generate age_group:
<65
>=65
<75
>=75
and then generate the count and percent of the age_group using proc freq.
@PrinceAde Thank you but this does not explain what you want. If a person is 63 years old, should this person be in both the <65 and <75 group? 63 is <65 and 63 is also <75. That's what it sounds like you are asking for. Please explain and clear this up.
Whatever it is you want, use FORMATs.
Paige Miller
It ignores the last two groupings because your logic is incorrect. After the first two tests, the person's age is either <65 or >=65; one of those two conditions has to be true, so the remaining tests are never performed. But we're still left guessing as to what you DO want out of this data, because you haven't told us. So here is a guess: perhaps you want something like this:
if age < 65 then age_group = "<65";
else if age >= 65 and age<75 then age_group = "65-75";
else if age >= 75 then age_group = ">=75";
A better solution is to create custom formats and use them in your analysis, rather than creating a new character variable. Example:
proc format;
value agef low-<65='<65' 65-<75='65-75' 75-high='>=75';
run;
proc means data=have;
class age;
format age agef.;
var whatever;
run;
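Since the stated goal is counts and percents from PROC FREQ, the same format can be applied there too. A minimal sketch, assuming a data set HAVE with a numeric AGE variable; the sample ages and the output data set name AGECOUNTS are made up for illustration:

```sas
/* Hypothetical sample data; replace HAVE with your own data set */
data have;
    input age @@;
    datalines;
55 63 64 65 70 74 75 80
;
run;

/* Same cut points as the AGEF format above */
proc format;
    value agef low-<65 = '<65'
               65-<75  = '65-75'
               75-high = '>=75';
run;

/* Count and percent for each formatted age group.            */
/* AGE keeps its original values; only the grouping changes.  */
proc freq data=have;
    tables age / out=agecounts;
    format age agef.;
run;
```

The FORMAT statement inside the PROC step applies only to that step, so AGE is still available unformatted for PROC MEANS or anything else afterwards.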
Paige Miller
Hi @PrinceAde
SAS Formats are the perfect solution to this.
Check this paper for an illustration: The Power of the FORMAT Procedure
Hope this helps
Thank you very much, sir. I will use PROC FORMAT.
Here is a brief summary of what I'm trying to achieve:
I'm working on an SDTMIG dataset, and I have the Age variable.
I'm supposed to generate age_group:
<65
>=65
<75
>=75
and then generate the count and percent using proc freq.
I can only reinforce what has already been said: use a format.
Ok sir. Thank you.
You have another issue, related to the length of the result variable.
If you run:
data example;
input age;
if age < 65 then age_group = "<65";
else if age >= 65 then age_group = ">=65";
else if age < 75 then age_group = "<75";
else if age >= 75 then age_group = ">=75";
datalines;
63
64
65
66
67
;
proc print data=example;
run;
You will get this result:
Obs  age  age_group
  1   63  <65
  2   64  <65
  3   65  >=6
  4   66  >=6
  5   67  >=6
Why ">=6", you may ask? You did not define a length for your age_group variable, so the first use of the variable, in the assignment
if age < 65 then age_group = "<65";
established a length of three characters, just enough to hold <65. Any longer value is truncated to fit.
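If you do want a character variable, a LENGTH statement placed before the first assignment avoids the truncation. A minimal sketch using the same five ages as the example above; since every age is either <65 or >=65, the two unreachable branches are dropped:

```sas
data example;
    /* Declare the length first: ">=65" needs 4 characters, so $4 */
    /* keeps it from being truncated to ">=6".                    */
    length age_group $4;
    input age;
    /* Every age is either <65 or >=65, so two branches suffice. */
    /* Note: a missing AGE also satisfies age < 65 in SAS.       */
    if age < 65 then age_group = "<65";
    else age_group = ">=65";
    datalines;
63
64
65
66
67
;
run;

proc print data=example;
run;
```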
This is another problem that formats avoid. The groups created by formats are honored by reporting and analysis procedures, and in most places for graphing, so you don't need to add a variable at all.
Hi, I actually added a length statement before running the code.
"<65"
">=65"
"<75"
">=75"
Is it possible to group like that at all?
I'm working on an SDTMIG dataset, and I have the Age variable.
I'm supposed to generate age_group:
<65
>=65
<75
>=75
and then generate the count and percent of the age_group using proc freq.
It is a case study.
I did as you said, using IF/ELSE conditions to create new age group variables:
<65
>=65
as agroupx, then ran proc freq, and
<75
>=75
as agegroupy, then ran proc freq.
Thank you very much. Very helpful.
@PrinceAde wrote:
I did as you said using if else condition to create a new variable agegroup;
<65
>=65
as agroupx then proc freq
<75
>=75
as agegroupy then proc freq.
Thank you very much. Very helpful.
But I did not say to create new variables. I specifically said use formats. Part of the purpose of having these discussions is not just to provide an answer, but to point you in the direction of better methods. Formats are better for a lot of reasons than creating new variables, and apparently despite formats being mentioned by many people, you chose not to use this better method.
Paige Miller
Yes, I initially used PROC FORMAT; however, I realized that I still needed to run PROC MEANS on the original age variable. That is why I created the agegroup variables, so that I could run PROC FREQ on the groups separately.
@PrinceAde formats work in PROC MEANS and most data analysis PROCs
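A sketch of how the two groupings described above (<65/>=65 and <75/>=75) could be produced from the one AGE variable without adding variables, while PROC MEANS still sees the raw ages. The sample data and the format names AGE65F and AGE75F are hypothetical:

```sas
/* Hypothetical sample data */
data have;
    input age @@;
    datalines;
55 63 64 65 70 74 75 80
;
run;

/* Two formats, one per grouping; no new variables are created */
proc format;
    value age65f low-<65 = '<65'  65-high = '>=65';
    value age75f low-<75 = '<75'  75-high = '>=75';
run;

/* Summary statistics on the original, unformatted AGE */
proc means data=have n mean std min max;
    var age;
run;

/* Counts and percents for <65 / >=65 */
proc freq data=have;
    tables age / out=freq65;
    format age age65f.;
run;

/* Counts and percents for <75 / >=75, from the same AGE variable */
proc freq data=have;
    tables age / out=freq75;
    format age age75f.;
run;
```

Each FORMAT statement is local to its PROC step, which is what lets the same variable be grouped two different ways.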
Paige Miller
@PrinceAde wrote:
I'm working on an SDTMIG dataset, and I have the Age variable.
I'm supposed to generate age_group:
<65
>=65
<75
>=75
and then generate the count and percent of the age_group using proc freq.
It is a case study.
If, as in @PaigeMiller's example, a person aged 63 is supposed to be counted in both the <65 and the <75 age groups, your choices are very limited. PROC FREQ will not do this with a single variable. In fact, the only approach I think is reasonable would not use such a variable at all, because the tools reside in a special kind of format definition (multilabel formats) and in the 4 procedures that can use them.
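If the overlapping interpretation is what is wanted (the thread never confirmed it), one possible sketch uses a multilabel format with the MLF option in PROC SUMMARY. As far as I know, PROC FREQ uses only the first matching label, which is why it is not used here. The sample data, the format name AGEMLF, and the macro variable N_TOTAL are hypothetical:

```sas
/* Hypothetical sample data */
data example;
    input age @@;
    datalines;
55 63 64 65 70 74 75 80
;
run;

/* MULTILABEL lets one age fall into more than one range */
proc format;
    value agemlf (multilabel)
        low-<65 = '<65'
        65-high = '>=65'
        low-<75 = '<75'
        75-high = '>=75';
run;

/* MLF tells PROC SUMMARY to use every matching label; with no VAR */
/* statement it only counts observations (_FREQ_) per group.      */
proc summary data=example nway;
    class age / mlf;
    format age agemlf.;
    output out=agecounts(drop=_type_ rename=(_freq_=count));
run;

/* Total number of subjects, used as the percent denominator */
proc sql noprint;
    select count(*) into :n_total trimmed from example;
quit;

data agecounts;
    set agecounts;
    percent = 100 * count / &n_total;
run;

proc print data=agecounts;
run;
```

Because each subject lands in two groups, the counts add up to twice the number of subjects and the percents add up to 200%, so each percent is computed against the number of subjects rather than the sum of the counts.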
One of the reasons we so often ask for example data and the result of your process for that given data is that it does not require pulling one fact at a time.
Here is an example data set small enough that you should be able to count the categories you want by hand. SHOW us the result you would want from PROC FREQ for this data set. Don't write any code to assign values. Count manually and show us how many are in each category, or just list which category belongs next to each value.
data example;
input age @@;
datalines;
55 57 63 64 64 64 65 66 67 67 68 68 69
70 71 71 71 72 72 73 74 75 76 76 77 77 78
;
run;