Solved: Logistic regression: p-value estimate error

FefeChen · Posted 01-31-2024 02:29 AM

Hello,

I used the CLASS statement to declare a variable as categorical, but the results SAS produced look quite strange, especially the estimated p-values. For example:

Code:

proc logistic data=database;
class LT_RL_II(ref='0');
model Event_Variable(event='1')=LT_RL_II Gender Age PB_CII Dept_II Sleep/risklimits;
run;

But I got puzzling results. For "LT_RL_II 3 vs. 0", the odds ratio is not significant according to its 95% confidence interval (OR: 3.085, 95% CI: 0.956-9.952), yet the p-value in the maximum likelihood estimates table above it (p=0.0384) shows it as significant (p<0.05).

However, when I used another approach and split the categorical variable into dummy variables, the problem went away.

Transformation code:

if LT_RL_II=0 then do; g1=0; g2=0; g3=0; end;
if LT_RL_II=1 then do; g1=1; g2=0; g3=0; end;
if LT_RL_II=2 then do; g1=0; g2=1; g3=0; end;
if LT_RL_II=3 then do; g1=0; g2=0; g3=1; end;
run;

With the dummy variables, the OR estimate is consistent with the one from the CLASS statement approach, but the p-value changes (p=0.0594).

In SAS regression syntax, when the CLASS statement is used to declare categorical variables and automatically generate the corresponding dummy variables, why are the results in the maximum likelihood estimates table inconsistent with those from manually generated dummy variables (Figure 2: g1 (LT_RL_II level 1 vs. 0), g2 (LT_RL_II level 2 vs. 0), g3 (LT_RL_II level 3 vs. 0))?

Interestingly, apart from the maximum likelihood estimates table, the results of the two approaches are consistent (for example, the OR estimates). On closer examination, the odds ratio (OR) estimates in Figure 1 indicate a non-significant effect, yet the same variable appears significant in the maximum likelihood estimates table. By contrast, the results in Figure 2 give reasonable (non-significant) p-values.

This confuses me. Does SAS use two different methods to estimate p-values when running this procedure?

My Hardware

cpu:i5-10350U

RAM:4G

SAS version: 9.4 at D5

StatDave · Posted 01-31-2024 11:27 AM

The answers to many questions like this are addressed in the SAS Notes which you can search at https://support.sas.com/en/knowledge-base.html . For this question, you would find this note that addresses your question.

PaigeMiller · Posted 01-31-2024 05:32 AM

I am unable to read the Chinese characters, but I think this is what is happening.

The first table tests whether the parameter for LT_RL_II level 3 is equal to zero. This is not the same test as the one in the odds ratio table below it, which compares level 3 to level 0 (the reference level); that confidence interval includes 1, indicating no significant difference between level 3 and level 0.
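
To make the difference between the two tests concrete, here is a short sketch of what each one is testing when the CLASS variable uses effect coding (the PROC LOGISTIC default) with level 0 as the reference. The symbols \alpha, \beta_k, \alpha^{*} and \gamma_k are notation introduced here for illustration and do not appear in the SAS output; all other covariates are assumed to be held fixed.

```latex
% Effect coding, reference level 0; other covariates held fixed.
% \alpha: intercept, \beta_k: CLASS parameter for level k = 1, 2, 3.
\begin{align*}
\text{level } k \in \{1,2,3\}: &\quad \operatorname{logit}(p_k) = \alpha + \beta_k \\
\text{level } 0: &\quad \operatorname{logit}(p_0) = \alpha - (\beta_1 + \beta_2 + \beta_3) \\[4pt]
H_0 \text{ in the estimates table}: &\quad \beta_3 = 0
  \iff \operatorname{logit}(p_3) = \tfrac{1}{4}\sum_{j=0}^{3} \operatorname{logit}(p_j) \\
\text{odds ratio, 3 vs. 0}: &\quad \log \mathrm{OR}_{3,0}
  = \operatorname{logit}(p_3) - \operatorname{logit}(p_0)
  = \beta_1 + \beta_2 + 2\beta_3
\end{align*}
% Reference coding (param=ref): \operatorname{logit}(p_k) = \alpha^{*} + \gamma_k, \ \gamma_0 = 0,
% so \log \mathrm{OR}_{3,0} = \gamma_3, and the Wald test of \gamma_3 = 0
% agrees with the Wald confidence interval for the odds ratio.
```

So under effect coding the p-value for level 3 compares that level with the average over all four levels, while the odds ratio compares level 3 with level 0 directly. The two need not agree on significance.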

--
Paige Miller

FefeChen · Posted 01-31-2024 06:59 AM

Thank you for your response. I'm curious about the discrepancy in the p-values between using the CLASS statement and manually creating dummy variables. In theory, when a variable is declared in the CLASS statement, SAS should automatically generate the corresponding dummy variables (similar to manually setting up g1/g2/g3, as in Figure 2), and the results should match those from the manual conversion (for both the OR estimates and the p-values). However, the test results differ.

FreelanceReinh · Posted 01-31-2024 08:14 AM

Hello @FefeChen,

I suspect* that you created the dummy variables like this:

g1=(LT_RL_II=1);
g2=(LT_RL_II=2);
g3=(LT_RL_II=3);

This would correspond to reference cell coding (see documentation). However, your PROC LOGISTIC code uses the default parameterization, effect coding. So, to replicate this you should define

g1=(LT_RL_II=1)-(LT_RL_II=0);
g2=(LT_RL_II=2)-(LT_RL_II=0);
g3=(LT_RL_II=3)-(LT_RL_II=0);

With this definition I would expect that all discrepancies (including that about p-values vs. odds ratio confidence intervals) will disappear.

* EDIT: I hadn't seen the update to your post, which confirms that you created the dummy variables that way.
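
For completeness, here is a minimal sketch of how the effect-coded dummy variables defined above might be built and fed to PROC LOGISTIC. It assumes the input data set is named database, as in the original post; the output data set name database_eff is hypothetical, and the handling of missing values is an added assumption (without it, a missing LT_RL_II would be coded as 0/0/0 instead of being dropped).

```sas
/* Sketch only: database_eff is a hypothetical output data set name. */
data database_eff;
  set database;
  /* Effect coding with level 0 as reference: level 0 gets -1 in every column. */
  g1 = (LT_RL_II=1) - (LT_RL_II=0);
  g2 = (LT_RL_II=2) - (LT_RL_II=0);
  g3 = (LT_RL_II=3) - (LT_RL_II=0);
  /* Assumption: keep missing categories missing so the row is excluded. */
  if missing(LT_RL_II) then call missing(g1, g2, g3);
run;

proc logistic data=database_eff;
  /* No CLASS statement: g1-g3 enter as ordinary numeric covariates. */
  model Event_Variable(event='1') = g1 g2 g3 Gender Age PB_CII Dept_II Sleep;
run;
```

The parameter estimates and p-values for g1, g2 and g3 should then line up with those from the CLASS statement under its default coding. Note that the odds ratios SAS reports for g1-g3 in this setup are for a one-unit change in each dummy, not for a level-versus-reference comparison.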

PaigeMiller · Posted 01-31-2024 08:20 AM

I believe @FreelanceReinh is correct. To add one more point to his statement: when you have class variables, the solution (the set of regression coefficients) is not unique. There are infinitely many solutions for the regression coefficients, all of which are equivalent and produce the identical model. I wrote an article about this (it wasn't about logistic regression, but that shouldn't matter; SAS handles class variables in models the same way everywhere).

--
Paige Miller

FefeChen · Posted 02-01-2024 02:41 AM

Thank you everyone for your responses, I have found a solution to the problem.

Because I didn't specify the PARAM= option in the CLASS statement, SAS used its default effect coding instead of reference coding, so the parameters in the maximum likelihood estimates table were not comparisons with the reference group. The corrected procedure is as follows.

proc logistic data=database;
class LT_RL_II(param=ref ref='0');
model Event_Variable(event='1')=LT_RL_II Gender Age PB_CII Dept_II Sleep/risklimits;
run;

After rerunning the procedure, the results are consistent with those obtained by manually setting up dummy variables in Figure 2.

Thank you everyone for your guidance and suggestions.
