Hello,
I used the CLASS statement to declare LT_RL_II as a categorical variable, but the results SAS produced look strange, especially the p-value. For example:
Code:
proc logistic data=database;
class LT_RL_II(ref='0');
model Event_Variable(event='1')=LT_RL_II Gender Age PB_CII Dept_II Sleep/risklimits;
run;
But the results are puzzling. For "LT_RL_II 3 vs. 0", the odds ratio is not significant according to its 95% confidence interval (OR: 3.085, 95% CI: 0.956-9.952), yet the p-value in the maximum likelihood estimates table above it (p=0.0384) is significant (p<0.05).
However, when I took another approach and converted the categorical variable into dummy variables, the problem went away.
Transformation code:
if LT_RL_II= 0 then do ; g1=0; g2=0; g3=0; end;
if LT_RL_II= 1 then do ; g1=1; g2=0; g3=0; end;
if LT_RL_II= 2 then do ; g1=0; g2=1; g3=0; end;
if LT_RL_II= 3 then do ; g1=0; g2=0; g3=1; end;
run;
With the dummy variables, the OR estimate is the same as with the CLASS statement, but the p-value changes (p=0.0594).
When the CLASS statement is used to declare a categorical variable and SAS generates the corresponding dummy variables automatically, why do the results in the maximum likelihood estimates table differ from those obtained with manually created dummy variables (Figure 2: g1 (LT_RL_II level 1 vs. 0), g2 (LT_RL_II level 2 vs. 0), g3 (LT_RL_II level 3 vs. 0))?
Interestingly, apart from the maximum likelihood estimates table, the two approaches give the same results (for example, the OR estimates). Looking more closely, the odds ratio in Figure 1 is non-significant according to its confidence interval, yet the same variable appears significant in the maximum likelihood estimates. The results in Figure 2, by contrast, give a p-value that agrees with the confidence interval (non-significant).
This confuses me. Does SAS use two different methods to estimate the p-value in this procedure?
My Hardware
cpu:i5-10350U
RAM:4G
SAS version: 9.4 at D5
The answers to many questions like this are addressed in the SAS Notes which you can search at https://support.sas.com/en/knowledge-base.html . For this question, you would find this note that addresses your question.
I am unable to read the Chinese characters, but I think this is what is happening.
The first table tests whether the parameter for LT_RL_II level 3 is equal to zero. This is not the same test as the one behind the odds ratio below it, which compares level 3 to level 0 (the reference level). That confidence interval includes 1, indicating no significant difference between level 3 and level 0.
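To make the difference between the two tests concrete, here is a sketch of the two parameterizations. It assumes the CLASS statement used effect coding, which is the PROC LOGISTIC default when PARAM= is not specified. The symbols \alpha, \beta_k, \alpha' and \gamma_k are notation introduced here for illustration, and the other covariates are held fixed and left out.

```latex
\begin{aligned}
\text{Effect coding:}\quad
  & \operatorname{logit} P(\text{event} \mid \text{level } k) = \alpha + \beta_k, \quad k = 1, 2, 3 \\
  & \operatorname{logit} P(\text{event} \mid \text{level } 0) = \alpha - \beta_1 - \beta_2 - \beta_3 \\
  & \log \mathrm{OR}_{3 \text{ vs } 0} = \beta_1 + \beta_2 + 2\beta_3 \\[4pt]
\text{Reference coding:}\quad
  & \operatorname{logit} P(\text{event} \mid \text{level } k) = \alpha' + \gamma_k, \quad k = 1, 2, 3 \\
  & \operatorname{logit} P(\text{event} \mid \text{level } 0) = \alpha' \\
  & \log \mathrm{OR}_{3 \text{ vs } 0} = \gamma_3
\end{aligned}
```

Under effect coding, \alpha is the average of the four level logits, so the test of \beta_3 = 0 asks whether level 3 differs from that average, while the odds ratio interval concerns \beta_1 + \beta_2 + 2\beta_3. Under reference coding both refer to \gamma_3, which is why the p-value and the interval agree there.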
Thank you for your response. I'm curious about the discrepancy in p-value estimation between using the CLASS statement and manually creating dummy variables. In theory, when declaring a class variable, SAS should automatically generate corresponding dummy variables (similar to manually setting up g1/g2/g3, like figure 2), and the results generated should be consistent with manual conversion (whether it's OR estimation or p-value). However, there seems to be a discrepancy in the test results.
Hello @FefeChen,
I suspect* that you created the dummy variables like this:
g1=(LT_RL_II=1);
g2=(LT_RL_II=2);
g3=(LT_RL_II=3);
This would correspond to reference cell coding (see documentation). However, your PROC LOGISTIC code uses the default parameterization, effect coding. So, to replicate this you should define
g1=(LT_RL_II=1)-(LT_RL_II=0);
g2=(LT_RL_II=2)-(LT_RL_II=0);
g3=(LT_RL_II=3)-(LT_RL_II=0);
With this definition I would expect that all discrepancies (including that about p-values vs. odds ratio confidence intervals) will disappear.
* EDIT: I hadn't seen the update to your post, which confirms that you created the dummy variables that way.
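For anyone who wants to try the effect-coded definition, here is a minimal sketch as a complete step. The data set name database and the variable names come from the original post; the output name database_ec is hypothetical, and the sketch assumes LT_RL_II has no missing values (a missing value would get 0 in all three columns instead of being excluded, as CLASS would do).

```sas
/* Sketch only: database_ec is a hypothetical output data set name. */
data database_ec;
  set database;
  /* Effect coding with level 0 as reference:
     the reference level gets -1 in every column. */
  g1 = (LT_RL_II = 1) - (LT_RL_II = 0);
  g2 = (LT_RL_II = 2) - (LT_RL_II = 0);
  g3 = (LT_RL_II = 3) - (LT_RL_II = 0);
run;

/* Same model as in the original post, with g1-g3 in place of the CLASS variable. */
proc logistic data=database_ec;
  model Event_Variable(event='1') = g1 g2 g3 Gender Age PB_CII Dept_II Sleep;
run;
```

The RISKLIMITS option is left out on purpose: with hand-made effect-coded columns, an odds ratio for g3 would describe a one-unit change in g3 and not the level 3 vs. level 0 comparison.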
I believe @FreelanceReinh is correct. Just to add one more point to his statement: when you have class variables, the solution (the set of regression coefficients) is not unique. There are infinitely many sets of regression coefficients that are all equivalent and produce the identical model. I wrote an article about this (it wasn't about logistic regression, but that shouldn't matter, since SAS handles class variables in models the same way everywhere).
Thank you everyone for your responses, I have found a solution to the problem.
Because I didn't specify the PARAM= option in the CLASS statement, SAS used its default effect coding, so the parameters in the maximum likelihood estimates table were not comparisons against the reference group. The corrected procedure is as follows.
proc logistic data=database;
class LT_RL_II(param=ref ref='0');
model Event_Variable(event='1')=LT_RL_II Gender Age PB_CII Dept_II Sleep/risklimits;
run;
After rerunning the procedure, the results are consistent with those obtained by manually setting up dummy variables in Figure 2.
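As a rough cross-check, the p-value can be rebuilt from the odds ratio and confidence limits reported earlier in the thread. This is only a sketch: it assumes the limits are Wald limits at the 95% level, and the variable names are made up for illustration.

```sas
/* Sketch: recover the Wald p-value from the reported OR and 95% limits. */
data _null_;
  odds_ratio = 3.085;   /* reported OR, level 3 vs. 0 */
  lower      = 0.956;   /* reported lower 95% limit   */
  upper      = 9.952;   /* reported upper 95% limit   */

  /* On the log scale the Wald interval is estimate +/- z(0.975) * SE. */
  log_or = log(odds_ratio);
  se     = (log(upper) - log(lower)) / (2 * quantile('NORMAL', 0.975));

  /* Two-sided Wald test of log(OR) = 0. */
  z = log_or / se;
  p = 2 * (1 - probnorm(abs(z)));

  put log_or= se= z= p=;
run;
```

The resulting p should land close to the 0.0594 obtained with reference coding, and not the 0.0384 from the effect-coded parameter, which fits the explanation that the odds ratio interval and the reference-coded p-value test the same comparison.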
Thank you everyone for your guidance and suggestions.