Hi,
I'm analyzing a dataset that includes biomarker data. Because of missing values, the dataset was multiply imputed (number of imputations = 20).
First, I ran a logistic regression on just one of the imputations. To get odds ratios (ORs) per 1 SD increase, I had to include the UNITS statement (otherwise the ORs took extreme values such as >999.99). With that, everything worked fine (see the code below).
%macro biom_assoc (biom_trans=);
proc logistic data=mi_olink_single outest=outcoxreg1 covout;
   model i_dem (event='1') = age_cat0 age_cat1 p02sex educ_cat0 educ_cat1 active_cat0 active_cat1 bmi_cat0 bmi_cat1
                             p_cvd p_diab depr_cat0 depr_cat1 apoe_cat0 apoe_cat1 apoe_cat2 apoe_cat4 apoe_cat5
                             &biom_trans.;
   oddsratio &biom_trans.;
   UNITS &biom_trans.=SD;
run;
%mend;
After this, I tried to run the analysis on all 20 imputations in the dataset. However, this again produced extreme values for the OR (95% CI): 2.95E-12 (2.68E-56 to 2.246E32).
%macro biom_assoc (biom_trans=);
proc logistic data=mi_olink outest=outcoxreg1 covout;
   by _imputation_;
   model i_dem (event='1') = age_cat0 age_cat1 p02sex educ_cat0 educ_cat1 active_cat0 active_cat1 bmi_cat0 bmi_cat1
                             p_cvd p_diab depr_cat0 depr_cat1 apoe_cat0 apoe_cat1 apoe_cat2 apoe_cat4 apoe_cat5
                             &biom_trans.;
   UNITS &biom_trans.=SD;
run;
ods output Mianalyze.ParameterEstimates = tab32.ps_&biom_trans._all;
proc MIANALYZE data=outcoxreg1;
   modeleffects age_cat0 age_cat1 p02sex educ_cat0 educ_cat1 active_cat0 active_cat1 bmi_cat0 bmi_cat1
                p_cvd p_diab depr_cat0 depr_cat1 apoe_cat0 apoe_cat1 apoe_cat2 apoe_cat4 apoe_cat5
                &biom_trans.;
   ods output ParameterEstimates=parmsdat;
run;
ods output SQL.SQL_Results = tab32.ORs_&biom_trans._all;
proc sql;
   select parm as name, exp(estimate) as OR,
          exp(LCLMean) as LCI_OR,
          exp(UCLMean) as UCI_OR
   from parmsdat;
quit;
%mend biom_assoc;
Can anyone tell me what's going wrong here?
Thanks!
If I understand correctly, one (or more) of the BY groups is generating an extreme OR. I suggest you determine which BY group is responsible and then look at the imputed values to see what is going on.
Other comments:
1. There is nothing intrinsically wrong with an extreme OR. It just means that the odds of the event are much greater in one group than in another. For example, the odds of breast cancer are much greater in women than in men.
2. I notice that you have generated dummy variables instead of using a CLASS variable. Is there a reason for that?
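As a rough sketch of how to see which BY group is responsible, one could print the biomarker coefficient for each imputation from the OUTEST= data set. This assumes the data set name outcoxreg1 from the code above and that the macro variable &biom_trans. holds the biomarker name (run it inside the macro, or set it with %let first).

```sas
/* With COVOUT, the OUTEST= data set holds the estimates (_TYPE_='PARMS') */
/* and the covariance rows (_TYPE_='COV'); keep only the estimates.       */
proc print data=outcoxreg1 noobs;
   where _type_ = 'PARMS';
   var _imputation_ &biom_trans.;   /* log-odds per 1 unit of the biomarker */
run;
```

If the coefficient is similar and very large in absolute value in every imputation, the imputed values are probably not the cause, and the scale of the biomarker is worth a look.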
Hi Rick,
thanks for your answer!
I checked the BY groups again. I get the same kind of result in all 20 imputations: for every variable except the biomarker, the ORs look "normal". Only the biomarker gives these extreme ORs (see below).
Effect | Point estimate | Lower 95% Wald confidence limit | Upper 95% Wald confidence limit |
age_cat0 | 3.928 | 2.707 | 5.699 |
age_cat1 | 8.567 | 5.816 | 12.619 |
P02SEX | 1.383 | 0.998 | 1.918 |
educ_cat0 | 0.848 | 0.521 | 1.381 |
educ_cat1 | 0.828 | 0.499 | 1.376 |
active_cat0 | 0.509 | 0.346 | 0.749 |
active_cat1 | 0.502 | 0.325 | 0.774 |
bmi_cat0 | 0.763 | 0.525 | 1.109 |
bmi_cat1 | 0.87 | 0.564 | 1.341 |
p_cvd | 1.154 | 0.812 | 1.639 |
p_diab | 1.609 | 1.09 | 2.376 |
depr_cat0 | 0.995 | 0.581 | 1.707 |
depr_cat1 | 1.503 | 0.674 | 3.35 |
apoe_cat0 | 0.54 | 0.097 | 3.012 |
apoe_cat1 | 1.524 | 0.974 | 2.386 |
apoe_cat2 | 2.831 | 1.222 | 6.56 |
apoe_cat4 | 2.05 | 1.429 | 2.941 |
apoe_cat5 | 14.017 | 5.24 | 37.497 |
mi2_uPA | <0.001 | <0.001 | >999.999 |
One comment on the multiple imputation: the biomarker data had no missing values! The dataset was imputed because of missing values in other variables.
To your other comments:
I agree that an extreme OR is not intrinsically wrong. But I got "normal" ORs when analyzing a dataset containing only one of the 20 imputations, and I get such extreme values for the whole dataset, which sets off my alarm bells.
There was no special reason for using dummy variables instead of the CLASS statement.
It would be helpful to see the LOG from both the Proc MI and Proc LOGISTIC steps as well. I suspect that there may be an issue with separation related to the biomarker variable. This usage note will help to explain separation if you are not sure what it is and what to do about it.
Thanks for your answer, Rob! I'm not sure I fully understood it, but here's the log produced by running the code:
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
Imputationsnummer=1
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
Imputationsnummer=2
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
Imputationsnummer=3
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
Imputationsnummer=4
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
Imputationsnummer=5
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
Imputationsnummer=6
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
Imputationsnummer=7
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
Imputationsnummer=8
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
Imputationsnummer=9
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
Imputationsnummer=10
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
Imputationsnummer=11
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
Imputationsnummer=12
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
Imputationsnummer=13
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
Imputationsnummer=14
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
Imputationsnummer=15
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
Imputationsnummer=16
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
Imputationsnummer=17
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
Imputationsnummer=18
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
Imputationsnummer=19
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
Imputationsnummer=20
NOTE: There were 25160 observations read from the data set WORK.MI_OLINK.
NOTE: The data set WORK.OUTCOXREG1 has 420 observations and 27 variables.
NOTE: PROZEDUR LOGISTIC used (Total process time):
real time 0.82 seconds
cpu time 0.81 seconds
NOTE: The data set WORK.PARMSDAT has 19 observations and 11 variables.
NOTE: PROZEDUR MIANALYZE used (Total process time):
real time 0.04 seconds
cpu time 0.07 seconds
NOTE: PROZEDUR SQL used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
Let me put the question differently:
How would one set up PROC MIANALYZE for this piece of code?
proc logistic data=mi_olink outest=outcoxreg1 covout;
   by _imputation_;
   model i_dem (event='1') = age_cat0 age_cat1 p02sex educ_cat0 educ_cat1 active_cat0 active_cat1 bmi_cat0 bmi_cat1
                             p_cvd p_diab depr_cat0 depr_cat1 apoe_cat0 apoe_cat1 apoe_cat2 apoe_cat4 apoe_cat5
                             &biom_trans.;
   UNITS &biom_trans.=SD;
run;
Since you have the OUTEST= data set, you would use the DATA= option in MIANALYZE.
proc mianalyze data=outcoxreg1;
   modeleffects age_cat0 age_cat1 p02sex educ_cat0 educ_cat1 active_cat0 active_cat1 bmi_cat0 bmi_cat1
                p_cvd p_diab depr_cat0 depr_cat1 apoe_cat0 apoe_cat1 apoe_cat2 apoe_cat4 apoe_cat5
                &biom_trans.;
run;
If you are still interested in figuring out why the estimates are so large then I would suggest you check the imputation models to make sure nothing strange was going on with them.
After thinking about this a little more, I am curious about your comment which I initially missed regarding the odds ratio only being reasonable when you report it in standard deviation units. I am wondering about the distribution of that particular variable. Are the values really large or really small and how big exactly is the standard deviation?
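One possibility worth checking here (not confirmed in this thread): the UNITS statement only rescales the odds ratios that PROC LOGISTIC displays. The coefficients written to the OUTEST= data set stay on the "per 1 unit" scale, so exponentiating the pooled MIANALYZE estimate gives a per-unit OR, which can be extreme when one unit is many standard deviations. A minimal sketch of converting the pooled estimate to a per-SD OR follows. It assumes the biomarker is mi2_uPA (taken from the results table), that parmsdat is the ParameterEstimates data set from PROC MIANALYZE, and it introduces the hypothetical names biom_sd and or_per_sd.

```sas
/* SD of the biomarker. It has no missing values, so it is the same in  */
/* every imputation and one imputation is enough to compute it.         */
proc sql noprint;
   select std(mi2_uPA) format=best32. into :biom_sd trimmed
   from mi_olink
   where _imputation_ = 1;
quit;

/* Rescale the pooled log-odds (per 1 unit) to "per 1 SD" before exponentiating. */
data or_per_sd;
   set parmsdat;
   where upcase(parm) = 'MI2_UPA';
   or_per_sd  = exp(estimate * &biom_sd.);
   lci_per_sd = exp(lclmean  * &biom_sd.);
   uci_per_sd = exp(uclmean  * &biom_sd.);
run;

proc print data=or_per_sd noobs;
   var parm or_per_sd lci_per_sd uci_per_sd;
run;
```

If the rescaled OR is close to the per-SD value from the single-imputation run, the extreme pooled value was a matter of units rather than a problem with the imputations.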
Take a look at the summary statistics for that variable after the imputation (maybe a Proc MEANS with a BY statement) and make sure they look correct. Again I would check the convergence of your Proc MI code (you can post the LOG if you have any questions).
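A minimal sketch of that check, assuming the imputed data set mi_olink from the code above (already sorted by _imputation_, since the BY statement in PROC LOGISTIC ran) and the biomarker mi2_uPA from the results table:

```sas
/* Summary statistics of the biomarker within each imputation. */
proc means data=mi_olink n nmiss mean std min max;
   by _imputation_;
   var mi2_uPA;
run;
```

Because the biomarker had no missing values, the statistics should be identical in all 20 imputations; the size of the standard deviation relative to one unit is the number to look at.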
You could also try standardizing that variable, especially if it has extreme values or extreme variation and see if you get more meaningful results.
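One way to standardize could look like the sketch below. It uses PROC STDIZE with METHOD=STD (subtract the mean, divide by the standard deviation), so that one unit of the biomarker equals one SD and the UNITS statement is no longer needed. The biomarker name mi2_uPA is taken from the results table, and the data set names mi_olink_std, outest_std, parms_std and or_std are hypothetical.

```sas
/* Standardize the biomarker within each imputation; the standardized */
/* values replace the original ones in the output data set.           */
proc stdize data=mi_olink out=mi_olink_std method=std;
   by _imputation_;
   var mi2_uPA;
run;

/* Same model as before; the biomarker coefficient is now per 1 SD. */
proc logistic data=mi_olink_std outest=outest_std covout;
   by _imputation_;
   model i_dem (event='1') = age_cat0 age_cat1 p02sex educ_cat0 educ_cat1 active_cat0 active_cat1 bmi_cat0 bmi_cat1
                             p_cvd p_diab depr_cat0 depr_cat1 apoe_cat0 apoe_cat1 apoe_cat2 apoe_cat4 apoe_cat5
                             mi2_uPA;
run;

/* Pool the 20 sets of estimates. */
proc mianalyze data=outest_std;
   modeleffects age_cat0 age_cat1 p02sex educ_cat0 educ_cat1 active_cat0 active_cat1 bmi_cat0 bmi_cat1
                p_cvd p_diab depr_cat0 depr_cat1 apoe_cat0 apoe_cat1 apoe_cat2 apoe_cat4 apoe_cat5
                mi2_uPA;
   ods output ParameterEstimates=parms_std;
run;

/* Exponentiate the pooled log-odds and confidence limits to get ORs. */
data or_std;
   set parms_std;
   or     = exp(estimate);
   lci_or = exp(lclmean);
   uci_or = exp(uclmean);
run;
```

The other coefficients are unaffected by the rescaling; only the biomarker's estimate and its confidence limits change scale.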