What Reference Category in Logistic regression

Posted 11-26-2020 01:18 PM
(9017 views)

Hi,

I am new to SAS and am implementing logistic regression. I would like to know what a reference category is in logistic regression and how it is useful. I have a categorical variable called "Level of pain" with the categories no pain, less pain, medium, high, and extreme. I have created dummy variables from the categories. Which of the dummy variables needs to be given as the reference category? And what options do I need to specify in PROC LOGISTIC to choose the best reference category?

Many thanks for the help!!


AFAIK, there is no such thing as a 'best reference category', and you don't need to create dummy variables for logistic regression in SAS; it does that automatically.

Have you worked through the examples in the PROC LOGISTIC documentation? It includes full code and I believe the second example is about categorical variables. The documentation uses the GLM method of parameterization for categorical variables but the usual desired option is the REF method.

Documentation examples

An example on the REF option is here:

https://stats.idre.ucla.edu/sas/dae/logit-regression/

The different types of parameterization methods are outlined in the SAS documentation, but not all are available in every procedure.

That should be enough to get you started. Feel free to post any further questions.


I assume that your Level of Pain variable is a predictor in the model rather than the response variable. In that case, you do not need to create dummy variables because that is what the CLASS statement does for you. It also allows you to pick the reference category with the REF= option. For example, if your original variable is called LevelOfPain with values 1, 2, 3, 4, or 5, and you want to use level 1 as the reference level, then specify

`class LevelOfPain(ref="1") / param=glm;`

Then include LevelOfPain in your MODEL statement. There is no "best" reference category. The choice is arbitrary and is made for convenience of interpretation. The above CLASS statement will create the conventional 0,1-coded dummy variables with level 1 as the reference level (all dummies equal 0). The parameter estimates will be interpreted as the difference in effect of each level compared to the reference level, 1.
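A minimal sketch of how that CLASS statement might sit in a complete call. The data set name `work.pain_study` and the binary response `Improved` are hypothetical placeholders, and the coding of level 1 as "no pain" is an assumption; only `LevelOfPain` and the `REF=` option come from the reply above.

```sas
/* Hypothetical data set and response variable; substitute your own. */
proc logistic data=work.pain_study;
   /* Level 1 (assumed here to mean "no pain") is the reference level. */
   /* SAS builds the 0/1 dummy variables itself.                       */
   class LevelOfPain(ref="1") / param=glm;
   /* Model the probability that the hypothetical response equals 1.   */
   model Improved(event="1") = LevelOfPain;
run;
```

With this setup, each reported estimate for LevelOfPain compares one pain level against level 1, so picking a different `REF=` value changes which comparisons are displayed but not the overall fit of the model.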


It's not clear what you did when the Pr > ChiSq values changed. Could you show us the code before and after, plus the corresponding outputs?

Could you also please clarify if this reference category you want is for an independent variable or for the dependent variable?

--

Paige Miller


Hello,

    proc logistic data=Work.Dataset desc plots(only)=roc;
        class Age Breath Blood Water Heart Stomach Heavey other UBEL water2
              Eyesight Dialysis hearing hearingdevice glasses water3
              Psychiatri pregnancy / param=glm;
        model GCPS_Binry = Age Breath Blood Water Heart Stomach Heavey other
                           UBEL water2 Eyesight Dialysis hearing hearingdevice
                           glasses water3 Psychiatri pregnancy
              / selection=stepwise;
        output out=out3 p=pred1;
    run;

Previous results, without the CLASS statement:

Analysis of Maximum Likelihood Estimates

| Parameter | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|
| Intercept | 1 | -0.9657 | 0.1175 | 67.5333 | <.0001 |
| Age | 1 | 0.3497 | 0.1139 | 9.4265 | 0.0021 |
| Breath | 1 | 0.2859 | 0.1198 | 5.6935 | 0.017 |
| Blood | 1 | 0.2656 | 0.1151 | 5.326 | 0.021 |
| Water | 1 | 0.2992 | 0.1099 | 7.4155 | 0.0065 |
| Heart | 1 | 0.2311 | 0.1054 | 4.8034 | 0.0284 |
| Stomach | 1 | 0.2595 | 0.1132 | 5.2547 | 0.0219 |
| Water3 | 1 | 0.594 | 0.1425 | 17.3817 | <.0001 |
| Glasses | 1 | 0.2656 | 0.1062 | 6.2524 | 0.0124 |
| UBEL | 1 | 0.4403 | 0.1086 | 16.4331 | <.0001 |
| Eyesight | 1 | 0.3095 | 0.115 | 7.2445 | 0.0071 |

Latest results, with the CLASS statement used to set reference categories. As you can see, the Pr > ChiSq value is greater than 0.005 for some of the categories.

Analysis of Maximum Likelihood Estimates

| Parameter | Level | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|---|
| Intercept | | 1 | 5.9508 | 1.6798 | 12.5494 | 0.0004 |
| Breath | -0.45075276 | 1 | 9.599 | 180.8 | 0.0028 | 0.9577 |
| Breath | 1.91797094 | 1 | 10.5818 | 180.8 | 0.0034 | 0.9533 |
| Age | -0.59358016 | 1 | -11.4667 | 180.8 | 0.004 | 0.9494 |
| Age | 1.39111034 | 1 | -10.8176 | 180.8 | 0.0036 | 0.9523 |
| Age | -0.76335864 | 1 | -0.348 | 1.2791 | 0.074 | 0.7856 |
| Age | 1.01428795 | 1 | 0.2879 | 1.2865 | 0.0501 | 0.8229 |
| Age | -0.70972869 | 1 | -1.1224 | 0.6951 | 2.6078 | 0.1063 |
| Age | 1.20129764 | 1 | -0.4857 | 0.7085 | 0.4699 | 0.493 |
| Breath | -0.45413734 | 1 | -2.4323 | 1.018 | 5.7091 | 0.0169 |
| Breath | 1.29755151 | 1 | -2.025 | 1.0334 | 3.8399 | 0.05 |
| Water | -2.46127958 | 1 | -2.1778 | 0.5524 | 15.5422 | <.0001 |
| Water | -0.87911213 | 1 | -0.4818 | 0.3552 | 1.8396 | 0.175 |
| Water | -0.67641088 | 1 | -0.9268 | 0.6435 | 2.0743 | 0.1498 |
| Water | 0.39799746 | 1 | 0.00767 | 0.3008 | 0.0007 | 0.9797 |
| Glasses | -0.69546905 | 1 | -1.7676 | 0.8233 | 4.6099 | 0.0318 |
| Glasses | 0.86088881 | 1 | -1.5396 | 1.0549 | 2.1302 | 0.1444 |
| Eyesight | 1.27360571 | 1 | -1.006 | 0.8363 | 1.4469 | 0.229 |
| Eyesight | -1.04615533 | 1 | -0.6488 | 0.237 | 7.4918 | 0.0062 |


Hello, please see the code above, along with the results before and after using the CLASS statement in PROC LOGISTIC.


I'm really having a lot of trouble understanding the problem. You start by talking about "level of pain" as a variable, but I don't see it in your code. And it's still not clear to me whether the "level of pain" variable is the dependent variable or an independent variable. Could you please clarify this?

As for your p-values, only the categorical variables go in the CLASS statement. The continuous variables do not go in the CLASS statement.
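As a rough illustration of that rule, the sketch below lists only categorical predictors in CLASS and leaves a continuous predictor in MODEL alone. All names here (`work.pain_study`, `Improved`, `AgeYears`, `Glasses`) are hypothetical and are not taken from the code posted in this thread.

```sas
/* Hypothetical names throughout; adapt to your own data. */
proc logistic data=work.pain_study;
   /* Only categorical predictors are listed in CLASS.            */
   /* REF=FIRST makes the first ordered level the reference.      */
   class LevelOfPain(ref="1") Glasses(ref=first) / param=ref;
   /* AgeYears is continuous: it appears in MODEL but not in      */
   /* CLASS, so it gets a single slope rather than one            */
   /* coefficient per distinct value.                             */
   model Improved(event="1") = AgeYears LevelOfPain Glasses;
run;
```

A numeric variable that is placed in CLASS by mistake is treated as categorical, so every distinct value becomes its own level with its own coefficient, which can produce many parameters with very large standard errors.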

--

Paige Miller


So is your original question about the reference category referring to the independent variables or the dependent variable (or both)?

Your p-values are not comparable across the two different models. Once you switch to categorizing Age (and other variables) by WOE, you can't expect the same answers as when Age was used as a continuous variable; they may not even be close.

--

Paige Miller


My question was about the reference category for independent variables.

Both outputs were generated after applying the WOE transformation, with Age treated as categorical in both models. The only difference is that the latest output uses a reference category and the previous one does not.

I have a question about the p-value: should we judge significance for the variable as a whole rather than for the individual categories of the variable?


I have a question about the p-value: should we judge significance for the variable as a whole rather than for the individual categories of the variable?

PROC LOGISTIC produces a coefficient for each level of the CLASS variable (where one level should have a zero coefficient). Each coefficient is tested to see whether it is zero, and a p-value is reported. PROC LOGISTIC also produces a Type III test, which tests whether the coefficients are equal across all levels of the CLASS variable. This is a different test from the one you show, and it has a different meaning and different p-values.

So, you might want to look at both the Type III test and the tests of the individual coefficients, and interpret the two together.
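One possible way to put the two sets of tests side by side is to restrict the output to those two tables with ODS SELECT. This is only a sketch: the data set, response, and predictor names are hypothetical, and `Type3` and `ParameterEstimates` are the ODS table names I would expect PROC LOGISTIC to use for the "Type 3 Analysis of Effects" and "Analysis of Maximum Likelihood Estimates" tables.

```sas
/* Show only the overall (Type 3) tests and the per-level estimates. */
/* ODS SELECT applies to the next procedure step only.               */
ods select Type3 ParameterEstimates;

/* Hypothetical data set, response, and predictors. */
proc logistic data=work.pain_study;
   class LevelOfPain(ref="1") / param=glm;
   model Improved(event="1") = LevelOfPain AgeYears;
run;
```

The Type 3 table then has one row per effect (a single p-value for LevelOfPain as a whole), while the estimates table has one row per level, each compared with the reference level.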

--

Paige Miller
