Re: PROC MIANALYZE after PROC LOGISTIC results in extreme Odds Ratios

Arik · Posted 03-20-2020 05:10 AM

Hi,

I'm analyzing a dataset that includes biomarker data. Because of missing values, the dataset was multiply imputed (20 imputations).

First, I ran a logistic regression on just one of the imputations. I had to include the UNITS statement to get odds ratios (ORs) per 1 SD increase; otherwise the ORs took extreme values such as >999.99. With that, everything worked fine (see the code below).

%macro biom_assoc (biom_trans=);

/* Single imputation: adjusted model plus the biomarker, OR reported per 1 SD */
proc logistic data=mi_olink_single outest=outcoxreg1 covout;
  model i_dem (event='1') = age_cat0 age_cat1 p02sex educ_cat0 educ_cat1 active_cat0 active_cat1 bmi_cat0 bmi_cat1
        p_cvd p_diab depr_cat0 depr_cat1 apoe_cat0 apoe_cat1 apoe_cat2 apoe_cat4 apoe_cat5
        &biom_trans.;
  oddsratio &biom_trans.;
  UNITS &biom_trans.=SD;
run;

%mend biom_assoc;

Next, I ran the analysis on all 20 imputations in the dataset. This again produced extreme values for the ORs: 2.95E-12 (95% CI: 2.68E-56 to 2.246E32).

%macro biom_assoc (biom_trans=);

/* Fit the model separately within each imputation */
proc logistic data=mi_olink outest=outcoxreg1 covout;
  by _imputation_;
  model i_dem (event='1') = age_cat0 age_cat1 p02sex educ_cat0 educ_cat1 active_cat0 active_cat1 bmi_cat0 bmi_cat1
        p_cvd p_diab depr_cat0 depr_cat1 apoe_cat0 apoe_cat1 apoe_cat2 apoe_cat4 apoe_cat5
        &biom_trans.;
  UNITS &biom_trans.=SD;
run;

/* Pool the coefficients across imputations */
ods output Mianalyze.ParameterEstimates = tab32.ps_&biom_trans._all;
proc MIANALYZE data=outcoxreg1;
  modeleffects age_cat0 age_cat1 p02sex educ_cat0 educ_cat1 active_cat0 active_cat1 bmi_cat0 bmi_cat1
        p_cvd p_diab depr_cat0 depr_cat1 apoe_cat0 apoe_cat1 apoe_cat2 apoe_cat4 apoe_cat5
        &biom_trans.;
  ods output ParameterEstimates=parmsdat;
run;

/* Back-transform the pooled log-odds and confidence limits to ORs */
ods output SQL.SQL_Results = tab32.ORs_&biom_trans._all;
proc sql;
  select parm as name, exp(estimate) as OR,
         exp(LCLMean) as LCI_OR,
         exp(UCLMean) as UCI_OR
  from parmsdat;
quit;

%mend biom_assoc;

Can anyone tell me what's going wrong here?

Thanks!

Rick_SAS · Posted 03-20-2020 08:50 AM

If I understand correctly, one (or more) of the BY groups is generating an extreme OR. I suggest you determine which BY group is responsible and then look at the imputed values to see what is going on.
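
A minimal sketch of that check, assuming the OUTEST= data set from the original post (outcoxreg1, created with COVOUT and a BY statement) and assuming the biomarker is mi2_uPA. In that data set the coefficient rows have _TYPE_='PARMS', so listing them shows the biomarker's log-odds estimate and the convergence status for each imputation:

```sas
/* Assumption: the macro variable holds the biomarker name, e.g. mi2_uPA */
%let biom_trans = mi2_uPA;

/* One row per imputation: keep only the coefficient rows, not the covariance rows */
proc print data=outcoxreg1 noobs;
  where _type_ = 'PARMS';
  var _imputation_ _status_ &biom_trans.;
run;
```

If the listed coefficient is similar in every imputation, no single BY group is to blame and the scale of the variable is the more likely place to look.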

Other comments:

1. There is nothing intrinsically wrong with having an extreme OR. It just means that the odds of the event occurring in one group are much, much greater than in another group. For example, the odds of breast cancer in women are much, much greater than in men.

2. I notice that you have generated dummy variables instead of using a CLASS variable. Is there a reason for that?

Arik · Posted 03-20-2020 09:52 AM

Hi Rick,

thanks for your answer!

I checked the BY groups again. All 20 imputations give the same kind of result: the ORs for every variable except the biomarker look "normal". Only the biomarker gets these extreme ORs (see below).


Effect	Point estimate	95% Wald lower limit	95% Wald upper limit

age_cat0	3.928	2.707	5.699
age_cat1	8.567	5.816	12.619
P02SEX	1.383	0.998	1.918
educ_cat0	0.848	0.521	1.381
educ_cat1	0.828	0.499	1.376
active_cat0	0.509	0.346	0.749
active_cat1	0.502	0.325	0.774
bmi_cat0	0.763	0.525	1.109
bmi_cat1	0.87	0.564	1.341
p_cvd	1.154	0.812	1.639
p_diab	1.609	1.09	2.376
depr_cat0	0.995	0.581	1.707
depr_cat1	1.503	0.674	3.35
apoe_cat0	0.54	0.097	3.012
apoe_cat1	1.524	0.974	2.386
apoe_cat2	2.831	1.222	6.56
apoe_cat4	2.05	1.429	2.941
apoe_cat5	14.017	5.24	37.497
mi2_uPA	<0.001	<0.001	>999.999

One comment on the multiple imputation: the biomarker data had no missing values! The dataset was imputed because of missing values in other variables.

To your other comments:
I agree with you that an extreme OR is not intrinsically wrong. But I got "normal" ORs in the analysis based on only one of the 20 imputations, and I'm now getting such extreme values for the whole dataset, so this sets off my alarm bells.
There was no special reason for using dummy variables instead of the CLASS statement.

SAS_Rob · Posted 03-20-2020 10:04 AM

It would be helpful to see the LOG from both the Proc MI and Proc LOGISTIC steps as well. I suspect that there may be an issue with separation related to the biomarker variable. This usage note will help to explain separation if you are not sure what it is and what to do about it.

https://support.sas.com/kb/22/599.html
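
As a rough first look (a sketch only, not a formal test for separation), one could compare the range of the biomarker between cases and non-cases within a single imputation. Ranges that do not overlap at all would point toward separation. The variable name mi2_uPA and the choice of imputation 1 are assumptions made for illustration:

```sas
/* Descriptive statistics of the biomarker by outcome status, one imputation only */
proc means data=mi_olink n min max mean std maxdec=4;
  where _imputation_ = 1;
  class i_dem;          /* outcome: 1 = event, as in the MODEL statement */
  var mi2_uPA;          /* assumed biomarker name */
run;
```

This only looks at the biomarker on its own; separation can also arise from a combination of predictors, so the LOGISTIC log and output remain the more reliable place to look for separation warnings.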

Arik · Posted 03-20-2020 10:23 AM

Thanks for your answer, Rob! I'm not sure I fully understood it, but here's the log produced by running the code:

NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=1
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=2
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=3
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=4
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=5
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=6
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=7
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=8
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=9
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=10
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=11
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=12
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=13
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=14
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=15
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=16
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=17
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=18
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=19
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=20
NOTE: There were 25160 observations read from the data set WORK.MI_OLINK.
NOTE: The data set WORK.OUTCOXREG1 has 420 observations and 27 variables.
NOTE: PROZEDUR LOGISTIC used (Total process time):
      real time           0.82 seconds
      cpu time            0.81 seconds


NOTE: The data set WORK.PARMSDAT has 19 observations and 11 variables.
NOTE: PROZEDUR MIANALYZE used (Total process time):
      real time           0.04 seconds
      cpu time            0.07 seconds


NOTE: PROZEDUR SQL used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds

Arik · Posted 03-27-2020 10:57 AM

Let me put the question differently:

How would one set up PROC MIANALYZE for this piece of code?

proc logistic data=mi_olink outest=outcoxreg1 covout;
by _imputation_;
model i_dem (event='1') = age_cat0 age_cat1 p02sex educ_cat0 educ_cat1 active_cat0 active_cat1 bmi_cat0 bmi_cat1
p_cvd p_diab depr_cat0 depr_cat1 apoe_cat0 apoe_cat1 apoe_cat2 apoe_cat4 apoe_cat5
&biom_trans.;
UNITS &biom_trans.=SD;
run;

SAS_Rob · Posted 03-27-2020 11:53 AM

Since you have the OUTEST= data set, you would use the DATA= option in MIANALYZE.

proc mianalyze data=outcoxreg1;
modeleffects age_cat0 age_cat1 p02sex educ_cat0 educ_cat1 active_cat0 active_cat1 bmi_cat0 bmi_cat1
p_cvd p_diab depr_cat0 depr_cat1 apoe_cat0 apoe_cat1 apoe_cat2 apoe_cat4 apoe_cat5
&biom_trans.;
run;

If you are still interested in figuring out why the estimates are so large then I would suggest you check the imputation models to make sure nothing strange was going on with them.

SAS_Rob · Posted 03-27-2020 12:14 PM

After thinking about this a little more, I am curious about your comment, which I initially missed, regarding the odds ratio only being reasonable when you report it in standard deviation units. I am wondering about the distribution of that particular variable. Are the values really large or really small, and how big exactly is the standard deviation?
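
One possible reading, offered as an assumption rather than a diagnosis: the UNITS statement only rescales the odds ratios that PROC LOGISTIC prints, while the coefficients written to OUTEST= stay on the per-one-unit scale. If so, exp(estimate) from the MIANALYZE output is a per-unit OR, which can look extreme when the SD of the variable is very small. A sketch of converting the pooled estimate to a per-SD OR, using the data set names from the original post (biom_sd and or_per_sd are made-up names, and mi2_uPA is assumed to be the biomarker):

```sas
%let biom_trans = mi2_uPA;   /* assumed biomarker name */

/* Sample SD of the biomarker; it had no missing values, so any single
   imputation should give the same number */
proc sql noprint;
  select std(&biom_trans.) format=best32. into :biom_sd trimmed
  from mi_olink
  where _imputation_ = 1;
quit;

/* Rescale the pooled log-odds and its limits to one SD, then exponentiate */
data or_per_sd;
  set parmsdat;
  where upcase(parm) = upcase("&biom_trans.");
  OR_per_SD  = exp(estimate * &biom_sd.);
  LCI_per_SD = exp(LCLMean  * &biom_sd.);
  UCI_per_SD = exp(UCLMean  * &biom_sd.);
run;

proc print data=or_per_sd noobs;
  var parm OR_per_SD LCI_per_SD UCI_per_SD;
run;
```

Multiplying by a positive SD keeps the lower and upper limits in order, so no swapping is needed.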

Take a look at the summary statistics for that variable after the imputation (maybe a Proc MEANS with a BY statement) and make sure they look correct. Again, I would check the convergence of your Proc MI code (you can post the LOG if you have any questions).
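
Something along these lines, assuming the imputed data set is mi_olink, is already sorted by _imputation_, and the biomarker is mi2_uPA:

```sas
/* Summary statistics of the biomarker within each imputation */
proc means data=mi_olink n nmiss mean std min max;
  by _imputation_;
  var mi2_uPA;      /* assumed biomarker name */
run;
```

Since the biomarker reportedly had no missing values, the rows should be identical across imputations; the interesting part is how small or large the standard deviation is relative to one unit of the variable.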

You could also try standardizing that variable, especially if it has extreme values or extreme variation, and see if you get more meaningful results.
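
A sketch of how the standardization could look, assuming the data set and variable names from the original post (mi_olink_std, outest_std and parms_std are made-up names, and mi2_uPA is assumed to be the biomarker). After PROC STDIZE with METHOD=STD the biomarker has mean 0 and SD 1 within each imputation, so the coefficient is already per 1 SD and no UNITS statement is needed:

```sas
%let biom_trans = mi2_uPA;   /* assumed biomarker name */

/* Standardize the biomarker to mean 0, SD 1 within each imputation */
proc stdize data=mi_olink out=mi_olink_std method=std;
  by _imputation_;
  var &biom_trans.;
run;

/* Refit the model on the standardized biomarker */
proc logistic data=mi_olink_std outest=outest_std covout;
  by _imputation_;
  model i_dem (event='1') = age_cat0 age_cat1 p02sex educ_cat0 educ_cat1
        active_cat0 active_cat1 bmi_cat0 bmi_cat1
        p_cvd p_diab depr_cat0 depr_cat1
        apoe_cat0 apoe_cat1 apoe_cat2 apoe_cat4 apoe_cat5
        &biom_trans.;
run;

/* Pool across imputations */
proc mianalyze data=outest_std;
  modeleffects age_cat0 age_cat1 p02sex educ_cat0 educ_cat1
        active_cat0 active_cat1 bmi_cat0 bmi_cat1
        p_cvd p_diab depr_cat0 depr_cat1
        apoe_cat0 apoe_cat1 apoe_cat2 apoe_cat4 apoe_cat5
        &biom_trans.;
  ods output ParameterEstimates=parms_std;
run;

/* Pooled ORs; for the biomarker this is now the OR per 1 SD increase */
proc sql;
  select parm as name,
         exp(estimate) as OR,
         exp(LCLMean)  as LCI_OR,
         exp(UCLMean)  as UCI_OR
  from parms_std;
quit;
```

The ORs for the other covariates should be unaffected by this change, since only the scale of the biomarker differs.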