LisavH
Calcite | Level 5

Hi,

 

I have imputed data with multiple imputation using PROC MI in SAS, generating n imputed datasets. Now I would like to report a baseline table based on the imputed values, but I cannot find the right SAS code to do so. I used PROC FREQ with a BY _Imputation_ statement, which gives n baseline tables. I want to combine these n tables into one, with pooled frequencies for every category of the covariates.

Apparently this is fairly easy in SPSS but not in SAS. I found some topics suggesting PROC MIANALYZE, which might be a solution, but the examples I found only pooled estimates from, for example, regression models. Does anyone have a suggestion for how to combine the frequencies from n imputed datasets into one pooled baseline table?

 

Many thanks!

1 ACCEPTED SOLUTION

SAS_Rob
SAS Employee

To combine point estimates for any statistic in a multiple imputation setting, you need a standard error for each point estimate. The problem with PROC FREQ is that it does not report standard errors for either the percentages or the frequencies. My suggestion is to use PROC SURVEYFREQ instead and combine the results with PROC MIANALYZE, as in the example below. Note that you must specify the WTFREQ option in the TABLES statement, even if you do not have a WEIGHT statement, in order to get the weighted frequencies and their standard errors.
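For reference, the reason a standard error is required is that the pooling follows Rubin's rules, which combine a within-imputation and a between-imputation variance. The block below is a summary of those standard formulas, not output from the thread; the symbols are introduced here for illustration, with n the number of imputations as in the question, \hat{Q}_i the estimate (for example a cell frequency) from imputed dataset i, and SE_i its standard error.

```latex
\begin{aligned}
\bar{Q} &= \frac{1}{n}\sum_{i=1}^{n}\hat{Q}_i
  && \text{(pooled point estimate)}\\
W &= \frac{1}{n}\sum_{i=1}^{n} SE_i^{2}
  && \text{(within-imputation variance)}\\
B &= \frac{1}{n-1}\sum_{i=1}^{n}\bigl(\hat{Q}_i-\bar{Q}\bigr)^{2}
  && \text{(between-imputation variance)}\\
T &= W + \Bigl(1+\frac{1}{n}\Bigr)B,
  \qquad SE_{\text{pooled}} = \sqrt{T}
  && \text{(total variance)}
\end{aligned}
```

The pooled point estimate is just the average over the imputations; the standard errors only enter through W, which is why a procedure that reports none cannot feed PROC MIANALYZE.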

 

/* Generate Data */

proc format;
   value ResponseCode 1 = 'Very Unsatisfied'
                      2 = 'Unsatisfied'
                      3 = 'Neutral'
                      4 = 'Satisfied'
                      5 = 'Very Satisfied';
run;

proc format;
   value UserCode 1 = 'New Customer'
                  0 = 'Renewal Customer';
run;

proc format;
   value SchoolCode 1 = 'Middle School'
                    2 = 'High School';
run;

proc format;
   value DeptCode 0 = 'Faculty'
                  1 = 'Admin/Guidance';
run;

data SIS_Survey;
   format Response ResponseCode.;
   format NewUser UserCode.;
   format SchoolType SchoolCode.;
   format Department DeptCode.;
   /* Two simulated copies of the data stand in for two imputed datasets */
   do _imputation_=1 to 2;
      drop j;
      retain seed1 111;
      retain seed2 222;
      retain seed3 333;

      State = 'GA';

      NewUser = 1;
      do School=1 to 71;

         call rantbl( seed1, .45, .55, SchoolType );

         Department = 0;
         call rannor( seed3, x );
         SamplingWeight = 25 + x * 2;
         do j=1 to 2;
            if ( SchoolType = 1 ) then
               call rantbl( seed2, .16, .21, .30, .24, .09, Response);
            else
               call rantbl( seed2, .18, .23, .30, .22, .07, Response);
            output;
         end;
         output;

         Department = 1;
         call rannor( seed3, x );
         SamplingWeight = 15 + x * 1.5;
         do j=1 to 2;
            if ( SchoolType = 1 ) then
               call rantbl( seed2, .10, .15, .33, .28, .14, Response );
            else
               call rantbl( seed2, .13, .20, .30, .26, .11, Response);
            output;
         end;
      end;

      NewUser = 0;
      do School=72 to 134;

         call rantbl( seed1, .45, .55, SchoolType );

         Department = 0;
         call rannor( seed3, x );
         SamplingWeight = 25 + x * 2;
         do j=1 to 2;
            if ( SchoolType = 1 ) then
               call rantbl( seed2, .16, .21, .30, .24, .09, Response);
            else
               call rantbl( seed2, .18, .23, .30, .22, .07, Response);
            output;
         end;
         output;

         Department = 1;
         call rannor( seed3, x );
         SamplingWeight = 15 + x * 1.5;
         do j=1 to 2;
            if ( SchoolType = 1 ) then
               call rantbl( seed2, .10, .15, .33, .28, .14, Response );
            else
               call rantbl( seed2, .13, .20, .30, .26, .11, Response);
            output;
         end;
      end;

      State = 'NC';

      NewUser = 1;
      do School = 135 to 218;

         call rantbl( seed1, .45, .55, SchoolType );

         Department = 0;
         call rannor( seed3, x );
         SamplingWeight = 25 + x * 2;

         if ( SchoolType = 1 ) then
            call rantbl( seed2, .16, .21, .30, .24, .09, Response);
         else
            call rantbl( seed2, .18, .23, .30, .22, .07, Response);
         output; output;
         output;

         Department = 1;
         call rannor( seed3, x );
         SamplingWeight = 15 + x * 1.5;

         if ( SchoolType = 1 ) then
            call rantbl( seed2, .10, .15, .33, .28, .14, Response );
         else
            call rantbl( seed2, .13, .20, .30, .26, .11, Response);
         output; output;
      end;

      NewUser = 0;
      do School = 219 to 274;

         call rantbl( seed1, .45, .55, SchoolType );

         Department = 0;
         call rannor( seed3, x );
         SamplingWeight = 25 + x * 2;
         do j=1 to 2;
            if ( SchoolType = 1 ) then
               call rantbl( seed2, .16, .21, .30, .24, .09, Response);
            else
               call rantbl( seed2, .18, .23, .30, .22, .07, Response);
            output;
         end;
         output;

         Department = 1;
         call rannor( seed3, x );
         SamplingWeight = 15 + x * 1.5;

         if ( SchoolType = 1 ) then
            call rantbl( seed2, .10, .15, .33, .28, .14, Response );
         else
            call rantbl( seed2, .13, .20, .30, .26, .11, Response);
         output; output;
      end;

      State = 'SC';

      NewUser = 1;
      do School = 275 to 328;

         call rantbl( seed1, .45, .55, SchoolType );

         Department = 0;
         call rannor( seed3, x );
         SamplingWeight = 25 + x * 2;
         do j=1 to 2;
            if ( SchoolType = 1 ) then
               call rantbl( seed2, .16, .21, .30, .24, .09, Response);
            else
               call rantbl( seed2, .18, .23, .30, .22, .07, Response);
            output;
         end;
         output;

         Department = 1;
         call rannor( seed3, x );
         SamplingWeight = 15 + x * 1.5;

         if ( SchoolType = 1 ) then
            call rantbl( seed2, .10, .15, .33, .28, .14, Response );
         else
            call rantbl( seed2, .13, .20, .30, .26, .11, Response);
         output; output;
      end;

      NewUser = 0;
      do School = 329 to 370;

         call rantbl( seed1, .45, .55, SchoolType );

         Department = 0;
         call rannor( seed3, x );
         SamplingWeight = 25 + x * 2;
         do j=1 to 2;
            if ( SchoolType = 1 ) then
               call rantbl( seed2, .16, .21, .30, .24, .09, Response);
            else
               call rantbl( seed2, .18, .23, .30, .22, .07, Response);
            output;
         end;
         output;

         Department = 1;
         call rannor( seed3, x );
         SamplingWeight = 15 + x * 1.5;

         if ( SchoolType = 1 ) then
            call rantbl( seed2, .10, .15, .33, .28, .14, Response );
         else
            call rantbl( seed2, .13, .20, .30, .26, .11, Response);
         output; output;
      end;
   end;
run;

/* Run SURVEYFREQ by _IMPUTATION_, assuming the MI step is already done */
proc surveyfreq data=SIS_Survey;
   by _imputation_;
   tables Response*SchoolType / wtfreq;
   ods output CrossTabs=mi_ctab;
run;

/* Sort by the TABLES variables (Response and SchoolType here), then by _IMPUTATION_ */
proc sort data=mi_ctab;
   by Response SchoolType _imputation_;
run;

/* Run MIANALYZE with the STDERR statement */
proc mianalyze data=mi_ctab;
   by Response SchoolType;   /* these are the TABLES variables */
   modeleffects WgtFreq;     /* point estimate: weighted frequency */
   stderr StdDev;            /* its standard error from SURVEYFREQ */
   title 'Results for Weighted Frequency';
run;
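
Since a baseline table usually reports percentages alongside counts, the same pattern can be applied to the percentage columns. The following is a minimal sketch, assuming mi_ctab was created and sorted by the steps above and that the CrossTabs table carries the cell percentage and its standard error in the Percent and StdErr columns; pooled_pct is a hypothetical dataset name chosen for illustration.

```sas
/* Sketch only: pools the cell percentages instead of the weighted frequencies. */
/* Assumes mi_ctab is still sorted by Response SchoolType _imputation_.         */
proc mianalyze data=mi_ctab;
   by Response SchoolType;
   modeleffects Percent;                       /* cell percentage per imputation */
   stderr StdErr;                              /* standard error of that percentage */
   ods output ParameterEstimates=pooled_pct;   /* hypothetical output dataset name */
   title 'Results for Percent';
run;
```

Saving ParameterEstimates to a dataset makes it easier to merge the pooled percentages with the pooled frequencies into a single table afterwards.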

 

 

 

LisavH
Calcite | Level 5

Hi SAS_Rob,

 

Thank you for your reply. I had found this SAS code earlier in my search, but somehow it did not work then. I tried it again and now it works, so thank you very much!

 

I realized I was close to another solution as well: the pooled weighted frequencies from PROC MIANALYZE are the same as those obtained by running PROC UNIVARIATE on the stacked imputed data and dividing the frequency counts by the number of imputations, n. Some example code, with treatment being binary (no, yes) and gender also being binary (male, female):

 

* To obtain frequencies per category of the outcome variable;
proc univariate data=mi_data freq;
   class treatment; * To obtain the cumulative frequency omit this line;
   var gender;
run;
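
The division by the number of imputations can also be done in code rather than by hand. The sketch below is one possible way, not the poster's own code: it uses PROC FREQ instead of PROC UNIVARIATE, assumes mi_data is the stacked output of PROC MI with an _Imputation_ variable, and uses the hypothetical names n_imp, stacked_counts and pooled_count. It gives pooled point estimates only, without standard errors.

```sas
/* Count the imputations in the stacked dataset (n_imp is a hypothetical macro variable) */
proc sql noprint;
   select count(distinct _imputation_) into :n_imp
   from mi_data;
quit;

/* Cell counts over all imputations stacked together */
proc freq data=mi_data noprint;
   tables treatment*gender / out=stacked_counts;   /* OUT= holds COUNT and PERCENT */
run;

/* The average count per imputation is the pooled frequency */
data pooled_counts;
   set stacked_counts;
   pooled_count = count / &n_imp;
run;

proc print data=pooled_counts noobs;
   var treatment gender pooled_count percent;
run;
```

Because every imputed dataset has the same number of rows, the overall PERCENT from the stacked data already equals the average of the per-imputation percentages, so only the counts need rescaling.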

 
