☑ This topic is solved.
palacenorm
Calcite | Level 5

Hi there - 

 

I am running a multiple imputation and multiple logistic regression for the first time since grad school (about 10 years ago). I'm not able to provide a sample dataset, but I was hoping someone could take a quick look at the code below and let me know if I'm on the right track. The code runs with no errors, and the results make sense. For background, I'm comparing maternal risk factors for syphilis-exposed infants who became congenital syphilis (CS) cases versus those whose CS case was averted. My sample size is 2,315, and the data are fairly evenly split between averted and CS cases. Mother_county and Momcounty_cat are distinct variables that measure different things.

 

/*   PART 2 */
/**************************************** Multiple Imputation and Logistic Regression - Added 4/27/2026*******************************************************************/

/*---------------------------------------------------------
Step 1: Recode Unknown/Missing values for variables that WILL be imputed
---------------------------------------------------------*/

data analysis_mi;
    set days_treat_cat;

    /* Character variables */
    array mi_charvars {*} 
        Mother_education
        PrimaryPayment
        Mother_employment
        Mother_Housing
        Mother_WIC
        IncarceratedWithinLast12Months
        binationalmom
        PriorSTI
        Char_DrugUseAll
        mother_county
        Mother_HIV_Status_18;

    do i = 1 to dim(mi_charvars);
        if strip(mi_charvars{i}) in 
            ('Unknown', 'Missing', 'Unknown/Missing', 'UNK', '', 'U') 
        then mi_charvars{i} = '';
    end;

    /* Numeric variable: IndexSum needs no recode if unknowns are already
       stored as SAS missing (.); recode any numeric "unknown" code to . here */

    drop i;
run;
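A quick way to confirm the Step 1 recode worked is to tabulate each recoded variable with the MISSING option, so the blank (now missing) level is counted next to the valid levels. This is a sketch that assumes it runs right after the ANALYSIS_MI step; the variable list simply repeats the array.

```sas
/* Tabulate each recoded variable, counting missing values as their own row.
   Any leftover unknown-type strings (for example, a different capitalization
   such as 'unknown') would show up here as a separate level. */
proc freq data=analysis_mi;
    tables Mother_education
           PrimaryPayment
           Mother_employment
           Mother_Housing
           Mother_WIC
           IncarceratedWithinLast12Months
           binationalmom
           PriorSTI
           Char_DrugUseAll
           mother_county
           Mother_HIV_Status_18
           IndexSum
           / missing nocum;
run;
```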



/*---------------------------------------------------------
Step 2: Recode Unknown to missing for the low-missingness variables
that are NOT being imputed, then drop records missing any of them

These variables are not imputed because few of their values are missing.
---------------------------------------------------------*/

data analysis_mi2;
    set analysis_mi;

    if mom_race_eth_simp = 'Unknown' then mom_race_eth_simp = '';
    if no_livebirths_cat = 'Unknown' then no_livebirths_cat = '';
    if maxtiter_group = 'Unknown' then maxtiter_group = '';

    if missing(mom_race_eth_simp) then delete;
    if missing(no_livebirths_cat) then delete;
    if missing(maxtiter_group) then delete;
run;
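The three DELETE statements in Step 2 can also be collapsed into one test with the CMISS function, which counts missing values across character or numeric arguments. This is a sketch that should be equivalent to Step 2; the output name ANALYSIS_MI2_ALT is hypothetical, chosen so it does not overwrite ANALYSIS_MI2.

```sas
data analysis_mi2_alt;
    set analysis_mi;

    /* Same recodes as Step 2: treat 'Unknown' as missing */
    if mom_race_eth_simp = 'Unknown' then mom_race_eth_simp = '';
    if no_livebirths_cat = 'Unknown' then no_livebirths_cat = '';
    if maxtiter_group    = 'Unknown' then maxtiter_group    = '';

    /* CMISS returns the number of missing arguments; drop the record
       if any of the three non-imputed variables is missing */
    if cmiss(mom_race_eth_simp, no_livebirths_cat, maxtiter_group) > 0 then delete;
run;
```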


/*Check analytic sample size before MI - Sample size is n=2315 on 4/27/2026*/

proc freq data=analysis_mi2;
    tables BabyDisease;
run;
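Before imputing, it can help to see how many of the to-be-imputed variables each record is missing, split by outcome. This is a sketch assuming the ANALYSIS_MI2 data set from Step 2; MISS_CHECK and N_MISSING_MI are hypothetical names.

```sas
data miss_check;
    set analysis_mi2;
    /* CMISS counts missing values among the listed character and numeric variables */
    n_missing_mi = cmiss(of Mother_education PrimaryPayment Mother_employment
                            Mother_Housing Mother_WIC IncarceratedWithinLast12Months
                            binationalmom PriorSTI Char_DrugUseAll mother_county
                            Mother_HIV_Status_18 IndexSum);
run;

/* Row percentages: distribution of the missing count within each BabyDisease level */
proc freq data=miss_check;
    tables BabyDisease * n_missing_mi / nocol nopercent;
run;
```

If the missing count differs sharply between CS and averted cases, that is a reason to keep BabyDisease in the imputation model, as Step 3 already does.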



/*---------------------------------------------------------
Step 3: Multiple imputation
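FCS imputes each variable from the others in turn: the variables listed under
DISCRIM use the discriminant function method, and the LOGISTIC variables use
logistic regression with LIKELIHOOD=AUGMENT to guard against perfect prediction.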
BabyDisease is included as a predictor in the imputation model.
---------------------------------------------------------*/

proc mi data=analysis_mi2
        out=mi_data
        nimpute=20
        seed=12345;

    class
        BabyDisease
        agegroup
        mom_race_eth_simp
        mother_county
        Momcounty_cat
        mom_simp_stage
        maxtiter_group
        Mother_HIV_Status_18
        IndexSum
        no_livebirths_cat
        char_druguseall
        PriorSTI
        binationalmom
        Mother_education
        PrimaryPayment
        Mother_employment
        Mother_Housing
        Mother_WIC
        IncarceratedWithinLast12Months;

    fcs
        discrim(
            Mother_education
            PrimaryPayment
            Mother_employment
            Mother_Housing
            mother_county
            Mother_HIV_Status_18
            IndexSum
            / classeffects=include
        )

        logistic(binationalmom / likelihood=augment)
        logistic(Mother_WIC / likelihood=augment)
        logistic(IncarceratedWithinLast12Months / likelihood=augment)
        logistic(PriorSTI / likelihood=augment)
        logistic(char_druguseall / likelihood=augment);

    var
        BabyDisease
        agegroup
        mom_race_eth_simp
        mother_county
        Momcounty_cat
        mom_simp_stage
        maxtiter_group
        Mother_HIV_Status_18
        IndexSum
        no_livebirths_cat
        char_druguseall
        PriorSTI
        binationalmom
        Mother_education
        PrimaryPayment
        Mother_employment
        Mother_Housing
        Mother_WIC
        IncarceratedWithinLast12Months;
run;
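A simple plausibility check after PROC MI is to compare the observed distribution of an imputed variable with its distribution in each completed data set. This sketch uses Mother_education as the example; any imputed variable can be swapped in.

```sas
/* Observed (pre-imputation) distribution; missing values are excluded by default */
proc freq data=analysis_mi2;
    tables Mother_education / nocum;
run;

/* Row percentages within each of the 20 completed data sets */
proc freq data=mi_data;
    tables _Imputation_ * Mother_education / nocol nopercent;
run;
```

Because only the missing records are imputed, the completed-data percentages should stay close to the observed ones unless missingness is strongly related to other variables in the model.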

/*---------------------------------------------------------
Step 4: Fit the logistic model in each imputed data set and prepare
the parameter estimates for MIANALYZE
---------------------------------------------------------*/

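/* DESCENDING models the probability of the higher-sorted BabyDisease level;
   check the Response Profile table to confirm that level is the CS case */
proc logistic data=mi_data descending;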
    by _Imputation_;

    class
        agegroup (ref='c_23-27')
        mom_race_eth_simp (ref='White')
        mother_county (ref='United States')
        Momcounty_cat (ref='Maricopa, Pima or Pinal Counties')
        mom_simp_stage (ref='Late-Latent')
        maxtiter_group (ref='a_Low Maximum Titer')
        Mother_HIV_Status_18 (ref='Negative')
        IndexSum (ref='4')
        no_livebirths_cat (ref='0-2 Live births')
        char_druguseall (ref='No')
        PriorSTI (ref='N')
        binationalmom (ref='No')
        Mother_education (ref='High school graduate or GED completed')
        PrimaryPayment (ref='AHCCSS')
        Mother_employment (ref='Employed full/part t')
        Mother_Housing (ref='Stably Housed (e.g. lives in consistent housing)')
        Mother_WIC (ref='No')
        IncarceratedWithinLast12Months (ref='N')
        / param=ref;

    ods output ParameterEstimates=logistic_parms;

    model BabyDisease = 
        agegroup
        mom_race_eth_simp
        mother_county
        Momcounty_cat
        mom_simp_stage
        maxtiter_group
        Mother_HIV_Status_18
        IndexSum
        no_livebirths_cat
        char_druguseall
        PriorSTI
        binationalmom
        Mother_education
        PrimaryPayment
        Mother_employment
        Mother_Housing
        Mother_WIC
        IncarceratedWithinLast12Months;

run;
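Before pooling, it is worth scanning how each coefficient varies across the 20 imputations; a very large standard error in some imputations can point to quasi-complete separation in a sparse category. This is a sketch over the LOGISTIC_PARMS data set written by the ODS OUTPUT statement in Step 4.

```sas
/* One row per parameter (Variable x level), summarized over imputations.
   MISSING keeps the Intercept row, whose ClassVal0 is blank. */
proc means data=logistic_parms n min max maxdec=3 missing;
    class Variable ClassVal0;
    var Estimate StdErr;
run;
```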

data parms_for_mianalyze;
    set logistic_parms;

    length Effect $200;

    if Variable = "Intercept" then Effect = "Intercept";
    else if ClassVal0 ne "" then Effect = catx(" = ", Variable, ClassVal0);
    else Effect = Variable;

    keep _Imputation_ Effect Estimate StdErr;
run;

proc sort data=parms_for_mianalyze;
    by Effect _Imputation_;
run;


/* Pool each regression coefficient across imputations */

proc mianalyze data=parms_for_mianalyze;
    by Effect;

    modeleffects Estimate;
    stderr StdErr;

    ods output ParameterEstimates=pooled_results;
run;
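For reference, PROC MIANALYZE combines each coefficient with Rubin's rules. With m imputations, and estimate \(\hat{Q}_i\) and squared standard error \(U_i\) from imputation i, the pooled estimate, total variance, and degrees of freedom are as below (this \(\nu\) is the large-sample form used when no EDF= option is given; with EDF=, MIANALYZE applies the Barnard-Rubin small-sample adjustment). Because the rules are applied coefficient by coefficient, pooling with BY Effect and pooling with PARMS= should give the same estimates.

```latex
$$
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i, \qquad
\bar{U} = \frac{1}{m}\sum_{i=1}^{m}U_i, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m}\left(\hat{Q}_i-\bar{Q}\right)^2
$$
$$
T = \bar{U} + \left(1+\frac{1}{m}\right)B, \qquad
\nu = (m-1)\left[1+\frac{\bar{U}}{\left(1+\frac{1}{m}\right)B}\right]^2
$$
$$
\widehat{\mathrm{AOR}} = \exp(\bar{Q}), \qquad
\text{95\% CI} = \exp\!\left(\bar{Q} \pm t_{\nu,\,0.975}\sqrt{T}\right)
$$
```

In the MIANALYZE output, Estimate is \(\bar{Q}\), StdErr is \(\sqrt{T}\), and LCLMean/UCLMean are \(\bar{Q} \mp t_{\nu,0.975}\sqrt{T}\). Exponentiating the limits is valid because exp is monotone, which is what the final DATA step does.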


/* Convert pooled log odds to adjusted odds ratios */

data final_aor;
    set pooled_results;

    AOR = exp(Estimate);
    LowerCL_AOR = exp(LCLMean);
    UpperCL_AOR = exp(UCLMean);
run;

proc print data=final_aor noobs;
    var Effect Estimate StdErr AOR LowerCL_AOR UpperCL_AOR Probt;
run;

 

 

Accepted Solution
SAS_Rob
SAS Employee

It seems like the code is correct. The last DATA step is necessary to get the combined odds ratios and confidence limits. What is not needed is the PARMS_FOR_MIANALYZE step. I suspect it will still give you the right answers, but you can read the data set coming from LOGISTIC directly into MIANALYZE as is:

proc mianalyze parms(classvar=classval)=logistic_parms;
    class agegroup
          mom_race_eth_simp
          mother_county
          Momcounty_cat
          mom_simp_stage
          maxtiter_group
          Mother_HIV_Status_18
          IndexSum
          no_livebirths_cat
          char_druguseall
          PriorSTI
          binationalmom
          Mother_education
          PrimaryPayment
          Mother_employment
          Mother_Housing
          Mother_WIC
          IncarceratedWithinLast12Months;
    modeleffects intercept
          agegroup
          mom_race_eth_simp
          mother_county
          Momcounty_cat
          mom_simp_stage
          maxtiter_group
          Mother_HIV_Status_18
          IndexSum
          no_livebirths_cat
          char_druguseall
          PriorSTI
          binationalmom
          Mother_education
          PrimaryPayment
          Mother_employment
          Mother_Housing
          Mother_WIC
          IncarceratedWithinLast12Months;
    ods output ParameterEstimates=pooled_results;
run;
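On the PARMS= route, POOLED_RESULTS has no Effect column (that name was created in the PARMS_FOR_MIANALYZE step), so the original PROC PRINT VAR list needs adjusting. This is a sketch that assumes MIANALYZE identifies each row with Parm plus one column per CLASS variable; the PROC CONTENTS step is there to confirm the actual column names before relying on them.

```sas
/* Check the column names MIANALYZE wrote (expected, but verify: Parm,
   one column per CLASS variable, Estimate, StdErr, LCLMean, UCLMean, Probt) */
proc contents data=pooled_results;
run;

/* Same conversion as before: exponentiate the pooled log odds and limits */
data final_aor;
    set pooled_results;
    AOR         = exp(Estimate);
    LowerCL_AOR = exp(LCLMean);
    UpperCL_AOR = exp(UCLMean);
run;

/* No Effect column exists here, so print every column; Parm plus the
   CLASS-variable columns identify each level */
proc print data=final_aor noobs;
run;
```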

 
