Arik
Fluorite | Level 6

Hi,

I'm analyzing a dataset including biomarker data. Due to missing values the dataset was imputed (No. of imputations=20).

First, I ran a logistic regression on only one of the imputations. For that, I had to include the UNITS statement to get odds ratios (ORs) per 1 SD increase; otherwise the ORs took extreme values (e.g. >999.99). With that in place, everything worked fine (see the code below).

 

%macro biom_assoc (biom_trans=);

/* Logistic regression on a single imputation; biomarker OR reported per 1 SD */
proc logistic data=mi_olink_single outest=outcoxreg1 covout;
  model i_dem (event='1') = age_cat0 age_cat1 p02sex educ_cat0 educ_cat1 active_cat0 active_cat1 bmi_cat0 bmi_cat1
        p_cvd p_diab depr_cat0 depr_cat1 apoe_cat0 apoe_cat1 apoe_cat2 apoe_cat4 apoe_cat5
        &biom_trans.;
  oddsratio &biom_trans.;
  units &biom_trans.=SD;
run;

%mend;

 

After this, I tried to run the analysis on all 20 imputations in the dataset. However, this again produced extreme values for the biomarker OR: 2.95E-12 (95% CI: 2.68E-56 to 2.246E32).

 

%macro biom_assoc (biom_trans=);

/* Fit the same model separately within each imputation */
proc logistic data=mi_olink outest=outcoxreg1 covout;
  by _imputation_;
  model i_dem (event='1') = age_cat0 age_cat1 p02sex educ_cat0 educ_cat1 active_cat0 active_cat1 bmi_cat0 bmi_cat1
        p_cvd p_diab depr_cat0 depr_cat1 apoe_cat0 apoe_cat1 apoe_cat2 apoe_cat4 apoe_cat5
        &biom_trans.;
  units &biom_trans.=SD;
run;


/* Pool the per-imputation estimates */
ods output Mianalyze.ParameterEstimates = tab32.ps_&biom_trans._all;
proc mianalyze data=outcoxreg1;
  modeleffects age_cat0 age_cat1 p02sex educ_cat0 educ_cat1 active_cat0 active_cat1 bmi_cat0 bmi_cat1
               p_cvd p_diab depr_cat0 depr_cat1 apoe_cat0 apoe_cat1 apoe_cat2 apoe_cat4 apoe_cat5
               &biom_trans.;
  ods output ParameterEstimates=parmsdat;
run;


/* Exponentiate the pooled estimates and confidence limits to get ORs */
ods output SQL.SQL_Results = tab32.ORs_&biom_trans._all;
proc sql;
  select parm as name, exp(estimate) as OR,
         exp(LCLMean) as LCI_OR,
         exp(UCLMean) as UCI_OR
  from parmsdat;
quit;

%mend biom_assoc;

Can anyone tell me what's going wrong here?

Thanks!

 
 
Rick_SAS
SAS Super FREQ

If I understand correctly, one (or more) of the BY groups is generating an extreme OR. I suggest you determine which BY group is responsible and then look at the imputed values to see what is going on.
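One way to check the BY groups, sketched here under the assumption that the OUTEST= data set from the question (outcoxreg1, created with COVOUT and BY _imputation_) is available and that the biomarker is mi2_uPA as in the output table posted later in the thread, is to list the biomarker coefficient for each imputation:

```sas
/* With COVOUT, OUTEST= holds both parameter rows (_TYPE_='PARMS')
   and covariance rows (_TYPE_='COV'); keep only the parameter rows. */
proc print data=outcoxreg1 noobs;
  where _type_ = 'PARMS';
  var _imputation_ mi2_uPA;   /* biomarker name taken from the thread */
run;
```

These values are log odds per one unit of the biomarker, so a large magnitude in every imputation would point to the scale of the variable rather than to a single odd BY group.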

 

Other comments:

1. There is nothing intrinsically wrong with having an extreme OR. It just means that the odds of the event are much greater in one group than in another. For example, the odds of breast cancer are much greater in women than in men.

2. I notice that you have generated dummy variables instead of using a CLASS variable. Is there a reason for that?

Arik
Fluorite | Level 6

Hi Rick,

thanks for your answer!

I checked the BY groups again. All 20 imputations give the same kind of result: for every variable except the biomarker, the ORs look "normal". Only for the biomarker do I get these extreme ORs (see below).

 
Effect         Point estimate   95% Wald confidence limits
                                Lower        Upper
age_cat0       3.928            2.707        5.699
age_cat1       8.567            5.816        12.619
P02SEX         1.383            0.998        1.918
educ_cat0      0.848            0.521        1.381
educ_cat1      0.828            0.499        1.376
active_cat0    0.509            0.346        0.749
active_cat1    0.502            0.325        0.774
bmi_cat0       0.763            0.525        1.109
bmi_cat1       0.87             0.564        1.341
p_cvd          1.154            0.812        1.639
p_diab         1.609            1.09         2.376
depr_cat0      0.995            0.581        1.707
depr_cat1      1.503            0.674        3.35
apoe_cat0      0.54             0.097        3.012
apoe_cat1      1.524            0.974        2.386
apoe_cat2      2.831            1.222        6.56
apoe_cat4      2.05             1.429        2.941
apoe_cat5      14.017           5.24         37.497
mi2_uPA        <0.001           <0.001       >999.999

 

One comment on the multiple imputation: the biomarker data had no missing values! The dataset was imputed because of missing values in other variables.

 

On your other comments:
I agree that an extreme OR is not intrinsically wrong. But I got "normal" ORs when analyzing a dataset containing only one of the 20 imputations, and I get such extreme values for the whole dataset, which sets off my alarm bells.
There was no special reason for using dummy variables instead of the CLASS statement.

 
SAS_Rob
SAS Employee

It would be helpful to see the LOG from both the Proc MI and Proc LOGISTIC steps as well.  I suspect that there may be an issue with separation related to the biomarker variable.  This usage note will help to explain separation if you are not sure what it is and what to do about it.

https://support.sas.com/kb/22/599.html

Arik
Fluorite | Level 6

Thanks for your answer, Rob! I'm not sure I totally understood it, but here's the log produced by running the code:

NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=1
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=2
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=3
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=4
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=5
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=6
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=7
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=8
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=9
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=10
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=11
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=12
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=13
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=14
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=15
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=16
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=17
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=18
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=19
NOTE: PROC LOGISTIC is modeling the probability that i_dem=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      Imputationsnummer=20
NOTE: There were 25160 observations read from the data set WORK.MI_OLINK.
NOTE: The data set WORK.OUTCOXREG1 has 420 observations and 27 variables.
NOTE: PROZEDUR LOGISTIC used (Total process time):
      real time           0.82 seconds
      cpu time            0.81 seconds


NOTE: The data set WORK.PARMSDAT has 19 observations and 11 variables.
NOTE: PROZEDUR MIANALYZE used (Total process time):
      real time           0.04 seconds
      cpu time            0.07 seconds


NOTE: PROZEDUR SQL used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds

Arik
Fluorite | Level 6

Let me put the question differently:

How would one set up PROC MIANALYZE for this piece of code?

 

proc logistic data=mi_olink outest=outcoxreg1 covout;
by _imputation_;
model i_dem (event='1') = age_cat0 age_cat1 p02sex educ_cat0 educ_cat1 active_cat0 active_cat1 bmi_cat0 bmi_cat1
p_cvd p_diab depr_cat0 depr_cat1 apoe_cat0 apoe_cat1 apoe_cat2 apoe_cat4 apoe_cat5
&biom_trans.;
UNITS &biom_trans.=SD;
run;
SAS_Rob
SAS Employee

Since you have the OUTEST= data set, you would use the DATA= option in MIANALYZE.

 

proc mianalyze data=outcoxreg1;
modeleffects age_cat0 age_cat1 p02sex educ_cat0 educ_cat1 active_cat0 active_cat1 bmi_cat0 bmi_cat1
p_cvd p_diab depr_cat0 depr_cat1 apoe_cat0 apoe_cat1 apoe_cat2 apoe_cat4 apoe_cat5
&biom_trans.;
run;

 

If you are still interested in figuring out why the estimates are so large then I would suggest you check the imputation models to make sure nothing strange was going on with them.

SAS_Rob
SAS Employee

After thinking about this a little more, I am curious about your comment, which I initially missed, that the odds ratio is only reasonable when you report it in standard deviation units. I am wondering about the distribution of that particular variable. Are the values really large or really small, and how big exactly is the standard deviation?

Take a look at the summary statistics for that variable after the imputation (maybe a Proc MEANS with a BY statement) and make sure they look correct.  Again I would check the convergence of your Proc MI code (you can post the LOG if you have any questions).  
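A minimal sketch of that check, assuming the imputed data set mi_olink from the question and the biomarker mi2_uPA shown in the posted output table:

```sas
/* Summary statistics for the biomarker within each imputation.
   The data are assumed to be sorted by _imputation_, as the BY
   statement in the original PROC LOGISTIC step already requires. */
proc means data=mi_olink n nmiss mean std min max;
  by _imputation_;
  var mi2_uPA;
run;
```

Because the biomarker itself had no missing values, the statistics should be identical in all 20 imputations. A very small standard deviation relative to one unit of the variable would make an extreme per-unit odds ratio unsurprising.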

 

You could also try standardizing that variable, especially if it has extreme values or extreme variation and see if you get more meaningful results.
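Standardizing could be sketched as follows. One point that may matter here, offered as something to verify rather than a confirmed diagnosis: the UNITS statement only changes the odds ratios that PROC LOGISTIC itself reports, while the coefficients written to OUTEST= stay on the original per-unit scale, so exponentiating the pooled MIANALYZE estimate gives an OR per one raw unit, not per SD. If the variable is standardized first, the pooled coefficient is already per 1 SD. The data set names mi_olink_std, outest_std, parms_std and or_std are hypothetical; mi2_uPA is the biomarker from the posted output.

```sas
/* Rescale the biomarker to mean 0 and SD 1 within each imputation */
proc standard data=mi_olink mean=0 std=1 out=mi_olink_std;
  by _imputation_;
  var mi2_uPA;
run;

/* Same model as in the question; no UNITS statement is needed now */
proc logistic data=mi_olink_std outest=outest_std covout;
  by _imputation_;
  model i_dem (event='1') = age_cat0 age_cat1 p02sex educ_cat0 educ_cat1
        active_cat0 active_cat1 bmi_cat0 bmi_cat1 p_cvd p_diab
        depr_cat0 depr_cat1 apoe_cat0 apoe_cat1 apoe_cat2 apoe_cat4 apoe_cat5
        mi2_uPA;
run;

/* Pool across imputations */
proc mianalyze data=outest_std;
  modeleffects age_cat0 age_cat1 p02sex educ_cat0 educ_cat1
               active_cat0 active_cat1 bmi_cat0 bmi_cat1 p_cvd p_diab
               depr_cat0 depr_cat1 apoe_cat0 apoe_cat1 apoe_cat2 apoe_cat4 apoe_cat5
               mi2_uPA;
  ods output ParameterEstimates=parms_std;
run;

/* Pooled ORs; for mi2_uPA this is now the OR per 1 SD increase */
data or_std;
  set parms_std;
  OR     = exp(Estimate);
  LCI_OR = exp(LCLMean);
  UCI_OR = exp(UCLMean);
  keep Parm OR LCI_OR UCI_OR;
run;
```

An equivalent route without refitting would be to multiply the pooled estimate and its confidence limits for the biomarker by its standard deviation before exponentiating.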
