I am fitting a logistic regression: Y = Treatment group + Covariate1 + Covariate2 + ... + Covariate4.
Y is a binary outcome: patient response or patient non-response.
Treatment group is the factor I am interested in; it has two levels, Group A and the control group.
The other 4 covariates are all categorical; some have two levels and some have more. Covariates with more than two levels are represented by dummy variables.
Each patient has only one observation, that is, there are no repeated measures.
From the model I want the odds ratio for treatment group with its 95% CI and p-value, and the response rate in each treatment group adjusted for the covariates.
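Written out on the logit scale, the model above is shown below. The symbols are introduced here for illustration. They assume reference (0/1) coding with the control group as the reference arm. β_A is the Group A effect, and γ_jl is the effect of level l of covariate j, which has k_j levels. The model has no ARM-by-covariate interactions, so the treatment odds ratio is the same for every covariate pattern. The predicted response rate, by contrast, changes with the covariates. That is why an adjusted response rate requires choosing the covariate values at which to evaluate it.

```latex
\log\frac{p_i}{1 - p_i} = \beta_0 + \beta_A\,\mathbf{1}[\mathrm{ARM}_i = A]
  + \sum_{j=1}^{4}\sum_{l=2}^{k_j}\gamma_{jl}\,\mathbf{1}[\mathrm{COV}_{ji} = l]

\widehat{\mathrm{OR}}_{A\ \mathrm{vs\ control}} = e^{\hat\beta_A}, \qquad
95\%\ \text{Wald CI} = \exp\bigl(\hat\beta_A \pm 1.96\,\widehat{\mathrm{SE}}(\hat\beta_A)\bigr)
```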
For the OR, CI, and p-value, I get the same results from PROC LOGISTIC and PROC GENMOD.
The problem is the covariate-adjusted response rate in each treatment group.
My questions are:
1. I can't find an option in either PROC that gives such an adjusted response rate for both treatment groups. Does anybody know of one?
2. I calculated the adjusted response rate as follows:
first, I get the predicted probability (p) for each patient from the model;
second, I calculate each patient's logit, log(p/(1-p));
third, I calculate the mean logit in each treatment group;
fourth, I derive the adjusted response rate of each treatment group as exp(MeanLogit) / (1 + exp(MeanLogit)).
I am not sure whether these steps are correct. Can anyone comment?
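Steps 2 to 4 can be checked algebraically. In a logistic model, the logit of each predicted probability is exactly that patient's linear predictor. The mean logit in an arm is therefore the linear predictor evaluated at the arm's average design row. The notation below is introduced for illustration:
- x_i is patient i's design-matrix row.
- β̂ is the vector of estimated coefficients.
- n_g is the number of patients in arm g.
- x̄_g is the mean row in arm g; its dummy columns are the proportions of that arm at each covariate level.

```latex
\log\frac{\hat p_i}{1 - \hat p_i} = x_i^{\top}\hat\beta
\;\Longrightarrow\;
\frac{1}{n_g}\sum_{i \in g}\log\frac{\hat p_i}{1 - \hat p_i} = \bar x_g^{\top}\hat\beta,
\qquad
\hat r_g = \frac{\exp(\bar x_g^{\top}\hat\beta)}{1 + \exp(\bar x_g^{\top}\hat\beta)}
```

Each arm's rate is thus evaluated at that arm's own covariate mix (x̄_A versus x̄_control). On the logit scale, the two rates therefore differ by β̂_A plus (x̄_A − x̄_control)ᵀγ̂, where γ̂ are the covariate coefficients, rather than by β̂_A alone. When the arms are imbalanced on the covariates, evaluating both arms at common covariate values avoids mixing that imbalance into the comparison.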
3. The predicted probabilities from PROC LOGISTIC and PROC GENMOD are completely different, not even of the same order of magnitude. I don't understand why. Can anyone explain?
As a result, the adjusted response rates I calculated from the two models are very different, and I don't know which one to trust.
Many thanks in advance for any help!!!
1. Why are you calculating the predicted probabilities by hand? Are the predicted values output by the procs different?
2. Are you sure that's the correct way to calculate the response rate? Have you looked at the EFFECTPLOT and/or ESTIMATE statements?
Hi Reeza, thanks for your reply!
1. I don't calculate the predicted probabilities by hand; I take them from the output. They differ between PROC LOGISTIC and PROC GENMOD, although the estimated odds ratios, CIs, and p-values from both procs are the same.
2. I am not sure whether my method is correct; that's why I am asking here. What do you mean by EFFECTPLOT? Could you explain in more detail?
It would be helpful to show some code.
I suspect that the problem stems from your coding of categorical covariates as dummy variables. You should use CLASS variables instead. CLASS effects and dummy (continuous) variables are not treated the same way in LSMEANS calculations.
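This is how LSMEANS treats CLASS covariates. The GLM parameterization uses one coefficient per level, with the redundant ones set to zero. Under it, the LS-mean for arm g is the linear predictor with arm g switched on and every other CLASS effect averaged with equal weight 1/k_j over its k_j levels. A covariate entered as a numeric dummy is instead held at its overall mean. The ILINK option of LSMEANS applies the inverse link, reporting that value on the probability scale. PROC GENMOD uses the GLM parameterization for CLASS variables by default. PROC LOGISTIC requires PARAM=GLM on the CLASS statement, which is what the LOGISTIC warning about LSMEANS later in this thread refers to. The symbols below are introduced for illustration.

```latex
\widehat{\mathrm{LSM}}_g = \hat\beta_0 + \hat\beta_g
  + \sum_{j=1}^{4}\frac{1}{k_j}\sum_{l=1}^{k_j}\hat\gamma_{jl},
\qquad
\hat r_g^{\mathrm{LSM}} = \frac{\exp(\widehat{\mathrm{LSM}}_g)}{1 + \exp(\widehat{\mathrm{LSM}}_g)}
```

Both arms are evaluated at the same balanced covariate weights. Their LS-means on the logit scale therefore differ by exactly the ARM log odds ratio.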
Hi PG, thank you very much for your reply. I am not coding the covariates as dummy variables myself; I put them in the CLASS statement, so I think the PROCs create the dummy variables internally.
Here is my code:
proc logistic data = DATAIN descending ;
class ARM COV1 COV2 COV3 ;
model AVAL = ARM COV1 COV2 COV3;
oddsratio ARM ;
lsmeans ARM / e diff oddsratio cl ;
ods output ParameterEstimates = ESTIMSTE_
Type3 = TYPE3_
OddsRatiosWald = OR_
;
output out = PRED predicted = phat ;
run ;
proc genmod data = DATAIN descending ;
class ARM COV1 COV2 COV3 ;
model AVAL = ARM COV1 COV2 COV3 / dist = bin link = logit type3 lrci ;
lsmeans ARM / cl oddsratio diff ;
ods output
Type3 = TYPE3_
DIFFS = DIFF_
;
output out = PRED1 predicted = phat1 ;
run ;
The estimates (OR, p-value, confidence interval of the OR) are the same from the two PROCs. But the predicted probabilities (variables phat and phat1 in the output datasets PRED and PRED1) are very different.
Do you know how I can get the adjusted response rate for both treatment groups (ARM)?
Xueping
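A small numeric sketch makes concrete the question of which covariate values to adjust to. It is written in Python only to show the arithmetic; it is not SAS code. The coefficients and patients are hypothetical, chosen so that the arms are imbalanced on one binary covariate. The sketch compares three rates:
- the mean-logit method from the first post;
- marginal standardization, which predicts every patient's probability as if assigned to each arm in turn and then averages (a swapped-in technique, not something either PROC reported in this thread);
- an LS-means-style rate at equal covariate weights.

```python
import math

# Hypothetical fitted coefficients on the logit scale (NOT from the thread):
# intercept, Group A vs control, and one 0/1 covariate COV1.
COEF = {"intercept": -1.0, "arm_A": 1.0, "cov1": 2.0}

# Hypothetical patients as (arm, cov1); the arms are deliberately
# imbalanced on COV1 (3 of 4 in A vs 1 of 4 in control).
PATIENTS = [("A", 1), ("A", 1), ("A", 1), ("A", 0),
            ("control", 0), ("control", 0), ("control", 0), ("control", 1)]


def expit(x):
    """Inverse logit: exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + math.exp(-x))


def linpred(arm, cov1):
    """Linear predictor, i.e. the logit of the predicted probability."""
    return (COEF["intercept"]
            + (COEF["arm_A"] if arm == "A" else 0.0)
            + COEF["cov1"] * cov1)


def mean_logit_rate(patients, arm):
    """Steps 1-4 from the question: mean logit within the arm, back-transformed."""
    logits = [linpred(a, c) for a, c in patients if a == arm]
    return expit(sum(logits) / len(logits))


def standardized_rate(patients, arm):
    """Marginal standardization: assign EVERY patient to this arm, average probabilities."""
    probs = [expit(linpred(arm, c)) for _, c in patients]
    return sum(probs) / len(probs)


def balanced_rate(arm):
    """LS-means style: COV1 levels weighted 1/2 each on the logit scale, then inverse logit."""
    return expit(linpred(arm, 0.5))


for arm in ("A", "control"):
    print(f"{arm:8s} mean-logit={mean_logit_rate(PATIENTS, arm):.3f} "
          f"standardized={standardized_rate(PATIENTS, arm):.3f} "
          f"balanced={balanced_rate(arm):.3f}")
```

With these made-up numbers, Group A comes out at about 0.818 (mean-logit), 0.690 (standardized), and 0.731 (balanced). Control comes out at about 0.378, 0.500, and 0.500. Only the last two put both arms on the same covariate distribution. Those two still differ from each other, because averaging probabilities is not the same as back-transforming an averaged logit.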
You've verified that the design matrix is the same for both procedures?
And the log is clean for both procs?
Hi Reeza, many thanks!
The design matrix is the same in both PROCs, but the log is not clean, since my dataset is not large enough. I think that with more data the convergence will be OK. Do you think that is the reason? Does it mean that with a larger dataset the predicted probabilities from both PROCs will be the same? It is strange, though, that the OR estimates are the same in both PROCs even though both give warnings.
log from PROC LOGISTIC:
NOTE: PROC LOGISTIC is modeling the probability that AVAL='Response'.
WARNING: There is possibly a quasi-complete separation of data points. The maximum likelihood estimate may not exist.
WARNING: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood iteration. Validity of the model fit is questionable.
WARNING: The model does not have a GLM parameterization. This parameterization is required for the LSMEANS, LSMESTIMATE, and SLICE statement. These statements are ignored.
log from PROC GENMOD:
NOTE: PROC GENMOD is modeling the probability that AVAL='Response'.
WARNING: The negative of the Hessian is not positive definite. The convergence is questionable.
WARNING: The procedure is continuing but the validity of the model fit is questionable.
WARNING: The specified model did not converge.
NOTE: The Pearson chi-square and deviance are not computed since the AGGREGATE option is not specified.
WARNING: Negative of Hessian not positive definite.
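Both logs warn about quasi-complete separation or non-convergence. One explanation consistent with these warnings is that some maximum likelihood estimates do not exist. Such estimates drift toward plus or minus infinity, so the reported values depend on where each procedure stopped iterating. Predicted probabilities for the affected patients can then differ greatly between procedures, even though the ARM estimates agree. A common first check is to cross-tabulate each CLASS variable against the outcome and look for levels with no responders or no non-responders. The sketch below does that in Python (not SAS); the records are hypothetical. An empty cell at a single level is only one symptom. Separation can also arise from combinations of covariates, so a clean result does not rule it out.

```python
from collections import defaultdict

# Hypothetical records: outcome AVAL (1 = response, 0 = non-response)
# plus CLASS variables named as in the thread's MODEL statement.
RECORDS = [
    {"AVAL": 1, "ARM": "A", "COV1": "x", "COV2": "lo"},
    {"AVAL": 0, "ARM": "A", "COV1": "y", "COV2": "hi"},
    {"AVAL": 1, "ARM": "control", "COV1": "z", "COV2": "lo"},
    {"AVAL": 0, "ARM": "control", "COV1": "x", "COV2": "lo"},
    {"AVAL": 1, "ARM": "control", "COV1": "y", "COV2": "hi"},
    {"AVAL": 1, "ARM": "A", "COV1": "z", "COV2": "hi"},
]


def empty_cells(records, outcome, class_vars):
    """Return (variable, level, problem) for each level of each CLASS
    variable in which one of the two outcome values never occurs."""
    flagged = []
    for var in class_vars:
        counts = defaultdict(lambda: [0, 0])  # level -> [non-responders, responders]
        for rec in records:
            counts[rec[var]][rec[outcome]] += 1
        for level in sorted(counts):
            nonresp, resp = counts[level]
            if resp == 0:
                flagged.append((var, level, "no responders"))
            elif nonresp == 0:
                flagged.append((var, level, "no non-responders"))
    return flagged


for var, level, problem in empty_cells(RECORDS, "AVAL", ["ARM", "COV1", "COV2"]):
    print(f"{var}={level}: {problem}")
```

Here only COV1=z is flagged, because every patient at that level responded. Under reference coding, the coefficient for such a level has no finite maximum likelihood estimate.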