I have an outcome with four levels:
Level 1: Starting in group A and staying in group A
Level 2: Starting in group A and switching to group B
Level 3: Starting in group B and staying in group B
Level 4: Starting in group B and switching to group A
This outcome requires multinomial logistic regression, since there are more than two levels.
My question:
Can this outcome be re-coded as a binary variable in order to conduct binary logistic regression?
For example comparing "Level 2" vs. "Not Level 2 (levels 1, 3 and 4 combined)"?
I understand how to reformat this variable in SAS so that it's technically a binary variable, but I'm concerned about whether this would be a valid statistical approach for handling a variable that conceptually has 4 distinct levels.
Are these "levels" stored in a single variable? If so, the answer is likely "yes", though depending on your exact use there may be limitations.
Create a custom format. The example below assumes the values are numeric but not much change is needed for character values.
proc format library=work;
   value level_1_others
      1     = 'Level 1'
      2,3,4 = 'Not Level 1'
   ;
run;

data have;
   input level;
   datalines;
1
2
3
4
1
1
3
;

proc print data=have;
   format level level_1_others.;
run;

proc freq data=have;
   format level level_1_others.;
run;

Formats are one of the powerful tools in SAS because a change of format is honored for reporting and for most analysis or graphing procedures. Different formats applied to the same variable produce different analyses. For some of my reports that use a person's age, I have formats to group the data into 3-, 5- and 10-year age bands, plus specific ranges such as pre-teen vs teen vs adult, or the ages at which different program qualifications apply.
The weakness with attempting to use formats is the requirement to have a single variable.
One thing you do run into is that some operations require you to specify the formatted value, for example when setting reference levels.
This sort of category collapsing is frequently done, though I would be hesitant to create a category that combined "Dead" with "in good health".
A format name (the first thing following VALUE) cannot end in a digit, because trailing digits are used to control the number of characters displayed. If the values are character, the name must start with a $, and the $ must be included when assigning the format.
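As a rough sketch of what specifying the formatted value looks like, the example below collapses the outcome to "Level 2" vs "Not Level 2" with a format and then names the event level by its formatted label in PROC LOGISTIC. The data set have2, the predictor x, the format name level_2_others and all data values are made up for illustration; they are not from the original question.

```sas
/* Hypothetical format: Level 2 vs everything else */
proc format library=work;
   value level_2_others
      2     = 'Level 2'
      1,3,4 = 'Not Level 2'
   ;
run;

/* Made-up example data: outcome level plus one illustrative predictor x */
data have2;
   input level x;
   datalines;
1 2.1
2 3.4
3 1.8
4 2.9
1 3.6
2 2.2
3 4.0
2 4.5
4 1.5
1 2.7
2 1.9
3 3.1
;

/* The format makes the response binary; EVENT= refers to the formatted value */
proc logistic data=have2;
   format level level_2_others.;
   model level(event='Level 2') = x;
run;
```

Because the response is grouped by its formatted values, the underlying variable keeps all four codes and a different format can be swapped in to try a different collapsing.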
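For character values, the same idea might look like the sketch below. The variable level_c, its coded values and the format name $switch_grp are assumptions made up for this example. Note the leading $ both in the VALUE statement and where the format is assigned.

```sas
/* Hypothetical character format: name starts with $ and does not end in a digit */
proc format library=work;
   value $switch_grp
      'A_to_B' = 'Level 2'
      other    = 'Not Level 2'
   ;
run;

/* Made-up data with a character outcome variable */
data have_char;
   input level_c $;
   datalines;
A_stay
A_to_B
B_stay
B_to_A
A_stay
A_to_B
;

/* The $ is also required when the format is assigned */
proc freq data=have_char;
   tables level_c;
   format level_c $switch_grp.;
run;
```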
In general yes.
Depending on the specific data, it is possible to create combinations that are nonsensical, or at least hard to interpret or explain.
To answer the question of the validity of collapsing categories, it is critical that we know what the research question is, and how you expect the data you have to be used in answering the research question.
The reason I pose it that way is that the only answer I can offer given the information at hand is "I don't know enough about what you are doing to come up with an appropriate answer." For some research questions, it would make sense to collapse the AB testing to a binomial; for other questions, you may need to stay with the multinomial.
SteveDenham