Programming the statistical procedures from SAS

creating a categorical variable with overlapping categories



I am trying to create a categorical AGE variable from a continuous variable. I'm conducting an accelerated longitudinal design, so the categories of the age variable need to overlap. Here is the code that I have written:

if age1 ge 31 and age1 le 49 then cohort = 1;
else if age1 ge 45 and age1 le 55 then cohort = 2;
else if age1 ge 47 and age1 le 59 then cohort = 3;
else if age1 ge 55 and age1 le 65 then cohort = 4;
else if age1 ge 60 and age1 le 70 then cohort = 5;
else if age1 ge 65 and age1 le 75 then cohort = 6;
else if age1 ge 70 then cohort = 7;

However, this is not working. For example, SAS puts all the 45-, 46-, 47-, 48-, and 49-year-olds into cohort 1 only, rather than into both cohort 1 and cohort 2. Therefore, my categories are not overlapping.

Any insight would be helpful on how to do this! I have tried several different ways, and all have the same result.

Re: creating a categorical variable with overlapping categories

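If you use "else", then the following IF statement won't be evaluated unless the previous one was false.

Even if you remove the "else", you would get the wrong results, because a single variable can hold only one value per observation and each later match overwrites the earlier one. Try making 7 indicator variables instead of one categorical variable.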
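
As a rough sketch of the indicator-variable idea, something like the following could work. It relies on the fact that a comparison in SAS evaluates to 1 when true and 0 when false. The data set names HAVE and COHORTS and the variable names IN_COHORT1 through IN_COHORT7 are made up for illustration; the age ranges are copied from the original post.

```sas
/* Sketch only: HAVE, COHORTS and in_cohort1-in_cohort7 are hypothetical names. */
data cohorts;
    set have;                          /* input data set containing age1 */

    /* Each comparison returns 1 (true) or 0 (false), so one person   */
    /* can be flagged in several cohorts at the same time.            */
    in_cohort1 = (31 <= age1 <= 49);
    in_cohort2 = (45 <= age1 <= 55);
    in_cohort3 = (47 <= age1 <= 59);
    in_cohort4 = (55 <= age1 <= 65);
    in_cohort5 = (60 <= age1 <= 70);
    in_cohort6 = (65 <= age1 <= 75);
    in_cohort7 = (age1 >= 70);         /* missing age1 compares as false, giving 0 */
run;
```

With this layout, each cohort can be analyzed separately by subsetting on its flag, for example WHERE in_cohort2 = 1.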

Re: creating a categorical variable with overlapping categories


Certain SAS procedures (like PROC TABULATE or PROC MEANS for example) allow the use of "multi-label" formats for the purpose of counting a single observation in more than one category.

If you look in the documentation under these topics:
The TABULATE Procedure: Using Multilabel Formats
The MEANS Procedure: Using Multilabel Value Formats with Class Variables

You will find an example for each procedure that shows how to use Multi-label formats specifically in these procedures.
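
A minimal sketch of the multilabel approach might look like this, assuming the ages are whole numbers and using the ranges from the original post. The format name AGECOH and the data set name HAVE are invented for the example; the MULTILABEL option on the VALUE statement and the MLF option on the CLASS statement are the pieces the documentation topics above describe.

```sas
/* Sketch only: AGECOH and HAVE are hypothetical names. */
proc format;
    /* MULTILABEL allows overlapping ranges in one format */
    value agecoh (multilabel)
        31 - 49   = 'Cohort 1'
        45 - 55   = 'Cohort 2'
        47 - 59   = 'Cohort 3'
        55 - 65   = 'Cohort 4'
        60 - 70   = 'Cohort 5'
        65 - 75   = 'Cohort 6'
        70 - high = 'Cohort 7';
run;

proc tabulate data=have;
    class age1 / mlf;        /* MLF tells the procedure to honor every matching label */
    format age1 agecoh.;
    table age1, n;           /* one row per cohort; an observation can be counted in several rows */
run;
```

Note that the multilabel format only affects how procedures that support MLF group the observations; it does not create a new variable in the data set.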

When you use an ELSE IF type of condition, think of the observation as being a single pinball in a pinball machine. It can only bounce into one category. As soon as the observation meets the first condition (in your case, the first cohort), the observation "exits" the entire IF/ELSE block and no other tests are done for that observation. If you change to a series of single IF statements (without an ELSE), the same observation would be tested multiple times, but you still have only a single COHORT variable, so the COHORT value would be that of the LAST condition that was met, which would still not be what you want.

If you need an external variable which will uniquely identify all the cohorts that a person "qualifies" for based on their age, then you might consider something like this, where you create a "string" variable whose value is a series of 0 or 1 based on which cohort or cohorts a person's age falls into:

1234567 <---- position of cohort group in string
0000000 <---- obs not in any cohort group
1000000 <---- only cohort 1
1100000 <---- only cohorts 1 and 2
0110000 <---- only cohorts 2 and 3
0011000 <---- only cohorts 3 and 4
etc.

Or, you could build a string of Y and N. There are probably many different ways to set categories; multilabel formats and a string of 0s and 1s are the ones I see most often.
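
One possible way to build the 0/1 string described above is to concatenate the results of the seven range tests with the CATS function, which converts each numeric 0 or 1 to a character and strips blanks. This is only a sketch: the data set names HAVE and COHORTS and the variable name COHORT_FLAGS are hypothetical, and the ranges are the ones from the original post.

```sas
/* Sketch only: HAVE, COHORTS and cohort_flags are hypothetical names. */
data cohorts;
    set have;                          /* input data set containing age1 */
    length cohort_flags $7;            /* one character position per cohort */

    /* Each comparison yields 1 or 0; CATS joins them in cohort order. */
    cohort_flags = cats(
        (31 <= age1 <= 49),            /* position 1 */
        (45 <= age1 <= 55),            /* position 2 */
        (47 <= age1 <= 59),            /* position 3 */
        (55 <= age1 <= 65),            /* position 4 */
        (60 <= age1 <= 70),            /* position 5 */
        (65 <= age1 <= 75),            /* position 6 */
        (age1 >= 70)                   /* position 7 */
    );
    /* Example: age1 = 46 falls only in cohorts 1 and 2, giving '1100000'. */
run;
```

A single cohort can then be picked out with SUBSTR, for example SUBSTR(cohort_flags, 2, 1) = '1' for everyone who qualifies for cohort 2.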
