☑ This topic is solved.
FefeChen
Calcite | Level 5

Hello, 

 

I used the CLASS statement to declare LT_RL_II as a categorical variable, but the results SAS produced look strange, especially the p-values. For example:

Code:

proc logistic data=database;
class LT_RL_II(ref='0');
model Event_Variable(event='1')=LT_RL_II Gender Age PB_CII Dept_II Sleep/risklimits;
run;

The results are puzzling. For "LT_RL_II 3 vs. 0", the odds ratio is not significant according to its 95% confidence interval (OR: 3.085, 95% CI: 0.956-9.952), yet the p-value for the same level in the parameter estimates table above it is significant (p=0.0384, i.e. p<0.05).

 

However, when I used another approach and split the categorical variable into dummy variables, the problem went away.

 

Transformation code:

if LT_RL_II=0 then do; g1=0; g2=0; g3=0; end;
if LT_RL_II=1 then do; g1=1; g2=0; g3=0; end;
if LT_RL_II=2 then do; g1=0; g2=1; g3=0; end;
if LT_RL_II=3 then do; g1=0; g2=0; g3=1; end;
run;

With the dummy variables, the OR estimate is the same as with the CLASS statement, but the p-value changes (p=0.0594).
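For reference, the dummy-variable model described above might look like the following minimal sketch. It assumes that g1, g2 and g3 were added to the same data set (`database`) and that the other covariates are the same as in the CLASS version; neither detail is shown in the post.

```sas
/* Sketch only: assumes g1-g3 already exist in the data set "database"
   and that the remaining covariates match the CLASS-based model. */
proc logistic data=database;
   /* g1, g2, g3 are 0/1 indicators for LT_RL_II = 1, 2, 3 (level 0 is the baseline) */
   model Event_Variable(event='1') = g1 g2 g3 Gender Age PB_CII Dept_II Sleep / risklimits;
run;
```

Because g1 to g3 are plain 0/1 indicators, each parameter here is a direct comparison with level 0, so its p-value and the matching odds ratio confidence interval describe the same comparison.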

 

FefeChen_0-1706703452479.png

 

In SAS regression procedures, when the CLASS statement is used to declare categorical variables and generate the corresponding dummy variables automatically, why are the results in the maximum likelihood estimates table inconsistent with those from manually generated dummy variables (Figure 2: g1 (LT_RL_II level 1 vs. 0), g2 (LT_RL_II level 2 vs. 0), g3 (LT_RL_II level 3 vs. 0))?

 

Interestingly, apart from the maximum likelihood estimates table, the two approaches give the same results (for example, the OR estimates). On closer examination, the odds ratio (OR) confidence interval in Figure 1 indicates a non-significant effect, yet the same level appears as significant in the maximum likelihood estimates table. In contrast, Figure 2 shows a consistent, non-significant p-value.

 

This confuses me. Does SAS use two different methods to estimate the p-value when running this procedure?

 

My Hardware

cpu:i5-10350U

RAM:4G

SAS version: 9.4 at D5

1 ACCEPTED SOLUTION

Accepted Solutions
StatDave
SAS Super FREQ

The answers to many questions like this are addressed in the SAS Notes which you can search at https://support.sas.com/en/knowledge-base.html . For this question, you would find this note that addresses your question.

PaigeMiller
Diamond | Level 26

I am unable to read the Chinese characters, but I think this is what is happening.

 

The first test (in the parameter estimates table) tests whether the parameter for LT_RL_II level 3 is equal to zero. This is not the same test as the odds ratio test below it, which compares level 3 to level 0 (the reference level). The confidence interval for that odds ratio includes 1, indicating no significant difference between level 3 and level 0.
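To make the difference concrete, here is a sketch of the two hypotheses, assuming PROC LOGISTIC's default effect coding with level 0 as the reference. The symbols are notation introduced here, not taken from the output: $\alpha$ is the intercept, $\beta_1, \beta_2, \beta_3$ are the effect-coded parameters for levels 1 to 3, and $\eta_k$ is the linear predictor for level $k$ with the other covariates held fixed (their terms are omitted because they cancel).

$$
\begin{aligned}
\eta_k &= \alpha + \beta_k \quad (k = 1, 2, 3), \qquad \eta_0 = \alpha - (\beta_1 + \beta_2 + \beta_3) \\
\text{parameter table:} \quad & H_0 : \beta_3 = 0 \\
\log \mathrm{OR}_{3\text{ vs. }0} &= \eta_3 - \eta_0 = \beta_1 + \beta_2 + 2\beta_3 \\
\text{odds ratio table:} \quad & H_0 : \beta_1 + \beta_2 + 2\beta_3 = 0
\end{aligned}
$$

Under this coding $\alpha$ is the unweighted mean of $\eta_0, \dots, \eta_3$, so $\beta_3 = 0$ says that level 3 does not differ from the average of the four levels on the logit scale. That is a different statement from "level 3 does not differ from level 0", which is why the two p-values need not agree.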

--
Paige Miller
FefeChen
Calcite | Level 5

Thank you for your response. I'm curious about the discrepancy in p-value estimation between using the CLASS statement and manually creating dummy variables. In theory, when declaring a class variable, SAS should automatically generate corresponding dummy variables (similar to manually setting up g1/g2/g3, like figure 2), and the results generated should be consistent with manual conversion (whether it's OR estimation or p-value). However, there seems to be a discrepancy in the test results.

FreelanceReinh
Jade | Level 19

Hello @FefeChen,

 

I suspect* that you created the dummy variables like this:

g1=(LT_RL_II=1);
g2=(LT_RL_II=2);
g3=(LT_RL_II=3);

This would correspond to reference cell coding (see documentation). However, your PROC LOGISTIC code uses the default parameterization, effect coding. So, to replicate this you should define

g1=(LT_RL_II=1)-(LT_RL_II=0);
g2=(LT_RL_II=2)-(LT_RL_II=0);
g3=(LT_RL_II=3)-(LT_RL_II=0);

With this definition I would expect that all discrepancies (including that about p-values vs. odds ratio confidence intervals) will disappear.

 

* EDIT: I hadn't seen the update to your post, which confirms that you created the dummy variables that way.
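If the default effect coding is kept, the level 3 vs. level 0 comparison can still be tested directly with a CONTRAST statement. The sketch below assumes the four levels 0 to 3 from the post, level 0 as reference, and the default level ordering, so that the three design columns correspond to levels 1, 2 and 3. Under effect coding level 3 is coded (0, 0, 1) and level 0 is coded (-1, -1, -1), so their difference gives the coefficients (1, 1, 2).

```sas
/* Sketch only: assumes LT_RL_II has exactly the levels 0, 1, 2, 3
   and that the design columns are ordered as levels 1, 2, 3. */
proc logistic data=database;
   class LT_RL_II(ref='0');   /* default PARAM=EFFECT */
   model Event_Variable(event='1') = LT_RL_II Gender Age PB_CII Dept_II Sleep / risklimits;
   /* (0,0,1) - (-1,-1,-1) = (1,1,2): log odds of level 3 minus level 0 */
   contrast 'LT_RL_II 3 vs 0' LT_RL_II 1 1 2 / estimate=exp;
run;
```

With ESTIMATE=EXP the contrast is reported on the odds ratio scale, so it should line up with the "3 vs 0" row of the odds ratio table rather than with the level 3 row of the parameter estimates table.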

PaigeMiller
Diamond | Level 26

I believe @FreelanceReinh is correct. Just to add one more point to his statement: when you have class variables, the solution (the set of regression coefficients) is not unique. There are infinitely many solutions for the regression coefficients that are all equivalent and produce the identical model. I wrote an article about this (it wasn't about logistic regression, but that shouldn't matter; the same idea applies to class variables in any model).

--
Paige Miller
FefeChen
Calcite | Level 5

Thank you everyone for your responses, I have found a solution to the problem.

Because I didn't specify the PARAM= option in the CLASS statement, PROC LOGISTIC used its default effect coding instead of reference coding, so the parameters in the maximum likelihood estimates table were not comparisons against the reference group. The corrected procedure is as follows.

proc logistic data=database;
class LT_RL_II(param=ref ref='0');
model Event_Variable(event='1')=LT_RL_II Gender Age PB_CII Dept_II Sleep/risklimits;
run;

After rerunning the procedure, the results are consistent with those obtained by manually setting up dummy variables in Figure 2.

 

Thank you everyone for your guidance and suggestions.
