greesamu
Obsidian | Level 7

I have an outcome with four levels:

Level 1: Starting in group A and staying in group A

Level 2: Starting in group A and switching to group B

Level 3: Starting in group B and staying in group B

Level 4: Starting in group B and switching to group A

This outcome requires multinomial logistic regression, since there are more than two levels.

My question:

Can this outcome be re-coded as a binary variable in order to conduct binary logistic regression?

For example, comparing "Level 2" vs. "Not Level 2" (levels 1, 3 and 4 combined)?

I understand how to reformat this variable in SAS so that it's technically a binary variable, but I'm concerned about whether this is a valid statistical approach for a variable that conceptually has four distinct levels.
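A minimal SAS sketch of the recode described above, assuming the four outcome levels are stored as the numeric values 1-4 in a variable named LEVEL. The predictor X and all data values are hypothetical, made up only so the steps run. The binary model estimates the odds of Level 2 against the pooled Levels 1, 3 and 4, while the generalized-logit model keeps all four levels and compares each one with a reference level.

```sas
/* Hypothetical example data: LEVEL holds the 4-level outcome (1-4),
   X is an illustrative predictor that is not part of the original post. */
data have;
   input level x;
   datalines;
1 2.1
2 3.4
3 1.8
4 2.9
1 2.5
2 3.9
3 1.2
4 3.1
1 1.7
2 2.2
3 2.6
4 1.9
;
run;

/* Binary recode: LEVEL2 = 1 for Level 2 (start in A, switch to B),
   0 for Levels 1, 3 and 4; left missing when LEVEL is missing. */
data have_bin;
   set have;
   if not missing(level) then level2 = (level = 2);
run;

/* Binary logistic regression: models P(Level 2) vs. all other levels */
proc logistic data=have_bin;
   model level2(event='1') = x;
run;

/* Multinomial (generalized logit) model on the original four levels,
   using Level 1 as the reference category */
proc logistic data=have;
   model level(ref='1') = x / link=glogit;
run;
```

Note that the binary model treats Levels 1, 3 and 4 as a single group, so any differences among those three levels are not estimated.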

 

ballardw
Super User

Are these "levels" stored in a single variable? If so, the answer is likely "yes", though depending on your exact use there may be limitations.

Create a custom format. The example below assumes the values are numeric but not much change is needed for character values.

proc format library=work;
value level_1_others
1  = 'Level 1'
2,3,4 = 'Not Level 1'
;
run;


data have;
   input level;
datalines;
1
2
3
4
1
1
3
;
run;

proc print data=have;
   format level level_1_others.;
run;

proc freq data=have;
   tables level;
   format level level_1_others.;
run;
   

Formats are one of the most powerful tools in SAS, because a change of format is honored by reporting and by most analysis and graphing procedures. Applying different formats to the same variable produces different analyses. For some of my reports that use a person's age, I have formats that group the data into 3-, 5- and 10-year age bands, plus specific ranges such as pre-teen vs. teen vs. adult, or the ages that matter for different program qualifications.
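A sketch of the multiple-formats idea, assuming a numeric variable AGE. The format names, band boundaries and ages below are illustrative only, not the actual formats described above. The same variable is tabulated twice, and the grouping changes only because a different format is applied.

```sas
proc format library=work;
   /* 10-year age bands (illustrative boundaries) */
   value age_ten_band
      0  -< 10  = '0-9'
      10 -< 20  = '10-19'
      20 -< 30  = '20-29'
      30 - high = '30+'
   ;
   /* Broad life stages (illustrative boundaries) */
   value life_stage
      low -< 13 = 'Pre-teen'
      13  -< 20 = 'Teen'
      20 - high = 'Adult'
   ;
run;

/* Hypothetical ages for illustration */
data ages;
   input age;
   datalines;
4
11
13
17
19
25
34
52
;
run;

/* Same variable, grouped two different ways by the FORMAT statement */
proc freq data=ages;
   tables age;
   format age age_ten_band.;
run;

proc freq data=ages;
   tables age;
   format age life_stage.;
run;
```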

The weakness of using formats this way is that all of the categories must be stored in a single variable.

What you do run into with some procedures is having to specify the formatted value for certain options, such as reference levels.
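For example, PROC LOGISTIC determines the response levels from the formatted values, so applying the LEVEL_1_OTHERS format turns the four-level variable into a two-level response, and the EVENT= option must then name the formatted value rather than the stored value. A sketch, assuming a hypothetical predictor X and made-up data:

```sas
proc format library=work;
   value level_1_others
      1     = 'Level 1'
      2,3,4 = 'Not Level 1'
   ;
run;

/* Hypothetical data: X is an illustrative predictor */
data have;
   input level x;
   datalines;
1 2.1
2 3.4
3 1.8
4 2.9
1 2.5
2 3.9
3 1.2
4 3.1
;
run;

proc logistic data=have;
   /* The format collapses LEVEL into two response levels */
   format level level_1_others.;
   /* EVENT= takes the formatted value, not the stored value 1 */
   model level(event='Level 1') = x;
run;
```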

 

This sort of category collapsing is done frequently, though I would be hesitant to create a category that combined "Dead" with "In good health".

Format names (the first thing following VALUE) cannot end in a digit, because trailing digits are used to control the number of characters displayed. If the values are character, the name must start with a $, and the $ must also be included when assigning the format.
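A sketch of the character-variable case, assuming the outcome were instead stored as hypothetical two-letter codes (starting group followed by ending group, e.g. 'AB' for starting in A and switching to B). The format name starts with $ both in the VALUE statement and in the FORMAT statement.

```sas
proc format library=work;
   /* Character format: name starts with $ and does not end in a digit */
   value $switchfmt
      'AA', 'BB' = 'Stayed'
      'AB', 'BA' = 'Switched'
   ;
run;

/* Hypothetical data: PATH = starting group followed by ending group */
data have_char;
   input path $;
   datalines;
AA
AB
BB
BA
AA
BB
;
run;

proc freq data=have_char;
   tables path;
   /* The $ is also required when the format is assigned */
   format path $switchfmt.;
run;
```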

 

greesamu
Obsidian | Level 7
Thank you for this response. I've updated my question to clarify that my concern is from a statistical perspective: would this approach result in a valid inference?
ballardw
Super User

In general, yes.

Given specific data, it is possible to create combinations that are nonsensical or hard to interpret and explain.

SteveDenham
Jade | Level 19

To answer the question of the validity of collapsing categories, it is critical that we know what the research question is, and how you expect the data you have to be used in answering the research question.

The reason I pose it that way is that the only answer I can offer given the information at hand is "I don't know enough about what you are doing to come up with an appropriate answer." For some research questions, it would make sense to collapse the A/B categories to a binomial; for other questions, you may need to stay with the multinomial.

SteveDenham
