Hi there -
I am running multiple imputation and multiple logistic regression for the first time since grad school (about 10 years ago). I'm not able to provide a sample dataset, but I was hoping someone could take a quick look at the code below and let me know if I'm on the right track. The code runs with no errors, and the results make sense. For background: I'm comparing maternal risk factors between syphilis-exposed infants who became congenital syphilis (CS) cases and those who became averted CS cases. My sample size is 2315, and the data are fairly evenly split between averted and CS cases. Mother_county and momcounty_cat are different variables measuring different things.
/* PART 2 */
/**************************************** Multiple Imputation and Logistic Regression - Added 4/27/2026*******************************************************************/
/*---------------------------------------------------------
Step 1: Recode Unknown/Missing values for variables that WILL be imputed
---------------------------------------------------------*/
data analysis_mi;
set days_treat_cat;
/* Character variables */
array mi_charvars {*}
Mother_education
PrimaryPayment
Mother_employment
Mother_Housing
Mother_WIC
IncarceratedWithinLast12Months
binationalmom
PriorSTI
Char_DrugUseAll
mother_county
Mother_HIV_Status_18;
do i = 1 to dim(mi_charvars);
if strip(mi_charvars{i}) in
('Unknown', 'Missing', 'Unknown/Missing', 'UNK', '', 'U')
then mi_charvars{i} = '';
end;
/* Numeric variable: IndexSum already uses . for missing, so no recode is needed */
drop i;
run;
/*---------------------------------------------------------
Step 2: Recode Unknown to missing for low-missing variables
that are NOT being imputed, then drop missing rows
These are not imputed because missingness is low.
---------------------------------------------------------*/
data analysis_mi2;
set analysis_mi;
if mom_race_eth_simp = 'Unknown' then mom_race_eth_simp = '';
if no_livebirths_cat = 'Unknown' then no_livebirths_cat = '';
if maxtiter_group = 'Unknown' then maxtiter_group = '';
if missing(mom_race_eth_simp) then delete;
if missing(no_livebirths_cat) then delete;
if missing(maxtiter_group) then delete;
run;
/*Check analytic sample size before MI - Sample size is n=2315 on 4/27/2026*/
proc freq data=analysis_mi2;
tables BabyDisease;
run;
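/* Optional check (a sketch, not part of the original program): tabulate the
   variables that will be imputed with the MISSING option so that blank or .
   values appear as their own level. This shows how much missingness each
   variable carries in the analytic sample before PROC MI runs. */
proc freq data=analysis_mi2;
tables Mother_education PrimaryPayment Mother_employment Mother_Housing
Mother_WIC IncarceratedWithinLast12Months binationalmom PriorSTI
Char_DrugUseAll mother_county Mother_HIV_Status_18 IndexSum / missing;
run;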
/*---------------------------------------------------------
Step 3: Multiple imputation
FCS discriminant function and FCS logistic methods are used for the
categorical variables listed in the FCS statement below.
BabyDisease is included as a predictor in the imputation model.
---------------------------------------------------------*/
proc mi data=analysis_mi2
out=mi_data
nimpute=20
seed=12345;
class
BabyDisease
agegroup
mom_race_eth_simp
mother_county
Momcounty_cat
mom_simp_stage
maxtiter_group
Mother_HIV_Status_18
IndexSum
no_livebirths_cat
char_druguseall
PriorSTI
binationalmom
Mother_education
PrimaryPayment
Mother_employment
Mother_Housing
Mother_WIC
IncarceratedWithinLast12Months;
fcs
discrim(
Mother_education
PrimaryPayment
Mother_employment
Mother_Housing
mother_county
Mother_HIV_Status_18
IndexSum
/ classeffects=include
)
logistic(binationalmom / likelihood=augment)
logistic(Mother_WIC / likelihood=augment)
logistic(IncarceratedWithinLast12Months / likelihood=augment)
logistic(PriorSTI / likelihood=augment)
logistic(char_druguseall / likelihood=augment);
var
BabyDisease
agegroup
mom_race_eth_simp
mother_county
Momcounty_cat
mom_simp_stage
maxtiter_group
Mother_HIV_Status_18
IndexSum
no_livebirths_cat
char_druguseall
PriorSTI
binationalmom
Mother_education
PrimaryPayment
Mother_employment
Mother_Housing
Mother_WIC
IncarceratedWithinLast12Months;
run;
/*---------------------------------------------------------
Step 4: Prepare logistic regression parameter estimates for MIANALYZE
---------------------------------------------------------*/
proc logistic data=mi_data descending;
by _Imputation_;
class
agegroup (ref='c_23-27')
mom_race_eth_simp (ref='White')
mother_county (ref='United States')
Momcounty_cat (ref='Maricopa, Pima or Pinal Counties')
mom_simp_stage (ref='Late-Latent')
maxtiter_group (ref='a_Low Maximum Titer')
Mother_HIV_Status_18 (ref='Negative')
IndexSum (ref='4')
no_livebirths_cat (ref='0-2 Live births')
char_druguseall (ref='No')
PriorSTI (ref='N')
binationalmom (ref='No')
Mother_education (ref='High school graduate or GED completed')
PrimaryPayment (ref='AHCCSS')
Mother_employment (ref='Employed full/part t')
Mother_Housing (ref='Stably Housed (e.g. lives in consistent housing)')
Mother_WIC (ref='No')
IncarceratedWithinLast12Months (ref='N')
/ param=ref;
ods output ParameterEstimates=logistic_parms;
model BabyDisease =
agegroup
mom_race_eth_simp
mother_county
Momcounty_cat
mom_simp_stage
maxtiter_group
Mother_HIV_Status_18
IndexSum
no_livebirths_cat
char_druguseall
PriorSTI
binationalmom
Mother_education
PrimaryPayment
Mother_employment
Mother_Housing
Mother_WIC
IncarceratedWithinLast12Months;
run;
data parms_for_mianalyze;
set logistic_parms;
length Effect $200;
if Variable = "Intercept" then Effect = "Intercept";
else if ClassVal0 ne "" then Effect = catx(" = ", Variable, ClassVal0);
else Effect = Variable;
keep _Imputation_ Effect Estimate StdErr;
run;
proc sort data=parms_for_mianalyze;
by Effect _Imputation_;
run;
/* Pool each regression coefficient across imputations */
proc mianalyze data=parms_for_mianalyze;
by Effect;
modeleffects Estimate;
stderr StdErr;
ods output ParameterEstimates=pooled_results;
run;
/* Convert pooled log odds to adjusted odds ratios */
data final_aor;
set pooled_results;
AOR = exp(Estimate);
LowerCL_AOR = exp(LCLMean);
UpperCL_AOR = exp(UCLMean);
run;
proc print data=final_aor noobs;
var Effect Estimate StdErr AOR LowerCL_AOR UpperCL_AOR Probt;
run;
It seems like the code is correct. The last data step is necessary in order to get the combined odds ratios and confidence limits. What is not needed is the PARMS_FOR_MIANALYZE step. I suspect it will still give you the right answers, but you can take the data set as it comes from LOGISTIC and read it directly into MIANALYZE.
proc mianalyze parms(classvar=classval)=logistic_parms;
class agegroup
mom_race_eth_simp
mother_county
Momcounty_cat
mom_simp_stage
maxtiter_group
Mother_HIV_Status_18
IndexSum
no_livebirths_cat
char_druguseall
PriorSTI
binationalmom
Mother_education
PrimaryPayment
Mother_employment
Mother_Housing
Mother_WIC
IncarceratedWithinLast12Months;
modeleffects intercept agegroup
mom_race_eth_simp
mother_county
Momcounty_cat
mom_simp_stage
maxtiter_group
Mother_HIV_Status_18
IndexSum
no_livebirths_cat
char_druguseall
PriorSTI
binationalmom
Mother_education
PrimaryPayment
Mother_employment
Mother_Housing
Mother_WIC
IncarceratedWithinLast12Months;
ods output ParameterEstimates=pooled_results;
run;
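With this direct approach, the final exponentiation step might look like the sketch below. It assumes that pooled_results from the call above holds the pooled log odds in Estimate and the confidence limits in LCLMean and UCLMean, the same names the original final_aor step uses. The data set name final_aor_direct is hypothetical. The pooled table here will not contain the Effect variable built in PARMS_FOR_MIANALYZE, so the PROC PRINT below has no VAR statement and simply lists every column; running PROC CONTENTS on pooled_results first is a quick way to confirm the names.

```sas
/* Sketch: convert pooled log odds to adjusted odds ratios.
   Assumes pooled_results comes from the PROC MIANALYZE call above. */
data final_aor_direct;              /* hypothetical output data set name */
   set pooled_results;
   AOR         = exp(Estimate);     /* pooled log odds -> odds ratio */
   LowerCL_AOR = exp(LCLMean);      /* lower confidence limit on the OR scale */
   UpperCL_AOR = exp(UCLMean);      /* upper confidence limit on the OR scale */
run;

/* No VAR statement, so all columns print, including the parameter labels */
proc print data=final_aor_direct noobs;
run;
```

The row for the intercept is exponentiated along with everything else; it is not an odds ratio and can be ignored when reading the table.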
Thank you!