I have a dataset with only a few continuous variables and a large number of ordinal categorical variables.
I have successfully run PROC MI with predictive mean matching for continuous variables and discriminant functions for ordinal categorical variables.
Is there a native way to get frequency and percent estimates for the imputed data for the categorical variables without turning them into binary dummies?
I have seen multiple questions for this online, but no answers.
Bonus question: You can use trace plots to look at convergence for continuous variables. Is there any output you can use to examine convergence for the categorical variables (again, short of converting them to dummies, I guess)?
Thanks.
I think there is an in-between step here where you analyze the imputed data sets one imputation at a time. What procedure are you using for this analysis? That may determine what gets passed to MIANALYZE. From there, you may have to do some post-processing to get percent estimates.
SteveDenham
Thanks so much for your reply, Steve.
Yes, I think I'm doing this in the standard way.
1. Run MI
2. Run some PROC by imputation
3. Combine imputations with MIANALYZE.
The question is: What PROC goes in Step 2, and what output of this PROC is fed to MIANALYZE? I have read that PROC FREQ does not work for this, though I know PROC MEANS does. What I am doing now is converting each ordinal to binary dummies, running PROC MEANS by imputation, then combining the means with MIANALYZE, giving me the proportion of each value and a standard error.... I think 🤔
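For readers following along, the dummy-variable workaround described above might look roughly like the sketch below. The names mi_out (the stacked PROC MI output), q1 (one ordinal item coded 1 to 3), and the derived variables are hypothetical and only for illustration; this is one way to set it up, not necessarily the exact code used here.

```sas
/* Hypothetical names: mi_out = stacked PROC MI output, q1 = ordinal item coded 1-3 */
data mi_dummies;
   set mi_out;
   /* A comparison evaluates to 1 (true) or 0 (false) */
   q1_1 = (q1 = 1);
   q1_2 = (q1 = 2);
   q1_3 = (q1 = 3);
run;

proc sort data=mi_dummies;
   by _imputation_;
run;

/* The mean of a 0/1 dummy is the proportion in that category; */
/* STDERR= stores the standard error of that mean              */
proc means data=mi_dummies noprint;
   by _imputation_;
   var q1_1 q1_2 q1_3;
   output out=props(drop=_type_ _freq_)
          mean=p1 p2 p3
          stderr=se1 se2 se3;
run;

/* Combine the per-imputation proportions and standard errors */
proc mianalyze data=props;
   modeleffects p1 p2 p3;
   stderr se1 se2 se3;
run;
```

Each p variable is paired positionally with its se variable, so the two lists need to stay in the same order. Multiply the combined estimates by 100 if percentages are wanted.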
I would love to know your assessment of this approach, and a more direct way--leaving the variables in their ordinal form--if it exists.
Also, any thoughts about judging the convergence when using the imputation methods for categorical data?
Thank you!
Using PROC MEANS is a good approach. If you had only two categorical variables, another option would be PROC UNIVARIATE. You can then post-process the output to get percentages based on the counts. So then the question becomes "Could you use the CLASS statement in PROC MEANS and avoid the need to code up a lot of binary variables?" I think you should try, as it ought to reduce the amount of post-processing.
As far as judging convergence goes, the best I can come up with is to look at the relative efficiencies. If one is less than 0.99, you should probably look at a different method, but that doesn't say anything about convergence. There might be a way to use OUTITER=<dsn> to look at various values across iterations. The documentation says that the data set type is COV, but I am not sure what that implies in this case. If it is a square matrix, you could look at whether the eigenvalues stabilize from iteration to iteration. What would probably be better is to use SGPLOT to plot those values against iteration so you can look for trends graphically.
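For context on that 0.99 figure, the relative efficiency usually reported for multiple imputation is Rubin's formula, where m is the number of imputations and lambda is the fraction of missing information for the parameter. The value lambda = 0.3 below is an assumed number chosen only to illustrate the arithmetic; it does not come from this thread.

```latex
\[
\mathrm{RE} = \left(1 + \frac{\lambda}{m}\right)^{-1}
\]
\[
\lambda = 0.3,\; m = 5:\quad
\mathrm{RE} = \frac{1}{1 + 0.3/5} = \frac{1}{1.06} \approx 0.943
\]
\[
\lambda = 0.3,\; m = 30:\quad
\mathrm{RE} = \frac{1}{1 + 0.3/30} = \frac{1}{1.01} \approx 0.990
\]
```

Under this formula the relative efficiency depends only on lambda and m, so a low value can also be raised by running more imputations, not only by changing the imputation method.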
SteveDenham
Thanks so much, Steve! It's good to think I chose reasonable workarounds.
@SteveDenham wrote:
So then the question becomes "Could you use the CLASS statement in PROC MEANS and avoid the need to code up a lot of binary variables?" I think you should try, as it ought to reduce the amount of post-processing.
Oops. I just wrote and am executing 5,000 lines of code. 😄
I appreciate your ideas about judging the MI results. I'll experiment with them. Thanks again.
It shouldn't be necessary to make the conversion and use PROC MEANS. You could use PROC SURVEYFREQ instead, which gives standard errors for both the percentages and the frequencies. You could do something similar to the example below.
/* Getting Started Example
Generate Data */
proc format;
value ResponseCode 1 = 'Very Unsatisfied'
2 = 'Unsatisfied'
3 = 'Neutral'
4 = 'Satisfied'
5 = 'Very Satisfied';
run;
proc format;
value UserCode 1 = 'New Customer'
0 = 'Renewal Customer';
run;
proc format;
value SchoolCode 1 = 'Middle School'
2 = 'High School';
run;
proc format;
value DeptCode 0 = 'Faculty'
1 = 'Admin/Guidance';
run;
data SIS_Survey;
format Response ResponseCode.;
format NewUser UserCode.;
format SchoolType SchoolCode.;
format Department DeptCode.;
do _imputation_=1 to 2;
drop j;
retain seed1 111;
retain seed2 222;
retain seed3 333;
State = 'GA';
NewUser = 1;
do School=1 to 71;
call rantbl( seed1, .45, .55, SchoolType );
Department = 0;
call rannor( seed3, x );
SamplingWeight = 25 + x * 2;
do j=1 to 2;
if ( SchoolType = 1 ) then
call rantbl( seed2, .16, .21, .30, .24, .09, Response);
else
call rantbl( seed2, .18, .23, .30, .22, .07, Response);
output; end;
output;
Department = 1;
call rannor( seed3, x );
SamplingWeight = 15 + x * 1.5;
do j=1 to 2;
if ( SchoolType = 1 ) then
call rantbl( seed2, .10, .15, .33, .28, .14, Response );
else
call rantbl( seed2, .13, .20, .30, .26, .11, Response);
output; end;
end;
NewUser = 0;
do School=72 to 134;
call rantbl( seed1, .45, .55, SchoolType );
Department = 0;
call rannor( seed3, x );
SamplingWeight = 25 + x * 2;
do j=1 to 2;
if ( SchoolType = 1 ) then
call rantbl( seed2, .16, .21, .30, .24, .09, Response);
else
call rantbl( seed2, .18, .23, .30, .22, .07, Response);
output; end;
output;
Department = 1;
call rannor( seed3, x );
SamplingWeight = 15 + x * 1.5;
do j=1 to 2;
if ( SchoolType = 1 ) then
call rantbl( seed2, .10, .15, .33, .28, .14, Response );
else
call rantbl( seed2, .13, .20, .30, .26, .11, Response);
output; end;
end;
State = 'NC';
NewUser = 1;
do School = 135 to 218;
call rantbl( seed1, .45, .55, SchoolType );
Department = 0;
call rannor( seed3, x );
SamplingWeight = 25 + x * 2;
if ( SchoolType = 1 ) then
call rantbl( seed2, .16, .21, .30, .24, .09, Response);
else
call rantbl( seed2, .18, .23, .30, .22, .07, Response);
output; output;
output;
Department = 1;
call rannor( seed3, x );
SamplingWeight = 15 + x * 1.5;
if ( SchoolType = 1 ) then
call rantbl( seed2, .10, .15, .33, .28, .14, Response );
else
call rantbl( seed2, .13, .20, .30, .26, .11, Response);
output; output;
end;
NewUser = 0;
do School = 219 to 274;
call rantbl( seed1, .45, .55, SchoolType );
Department = 0;
call rannor( seed3, x );
SamplingWeight = 25 + x * 2;
do j=1 to 2;
if ( SchoolType = 1 ) then
call rantbl( seed2, .16, .21, .30, .24, .09, Response);
else
call rantbl( seed2, .18, .23, .30, .22, .07, Response);
output; end;
output;
Department = 1;
call rannor( seed3, x );
SamplingWeight = 15 + x * 1.5;
if ( SchoolType = 1 ) then
call rantbl( seed2, .10, .15, .33, .28, .14, Response );
else
call rantbl( seed2, .13, .20, .30, .26, .11, Response);
output; output;
end;
State = 'SC';
NewUser = 1;
do School = 275 to 328;
call rantbl( seed1, .45, .55, SchoolType );
Department = 0;
call rannor( seed3, x );
SamplingWeight = 25 + x * 2;
do j=1 to 2;
if ( SchoolType = 1 ) then
call rantbl( seed2, .16, .21, .30, .24, .09, Response);
else
call rantbl( seed2, .18, .23, .30, .22, .07, Response);
output; end;
output;
Department = 1;
call rannor( seed3, x );
SamplingWeight = 15 + x * 1.5;
if ( SchoolType = 1 ) then
call rantbl( seed2, .10, .15, .33, .28, .14, Response );
else
call rantbl( seed2, .13, .20, .30, .26, .11, Response);
output; output;
end;
NewUser = 0;
do School = 329 to 370;
call rantbl( seed1, .45, .55, SchoolType );
Department = 0;
call rannor( seed3, x );
SamplingWeight = 25 + x * 2;
do j=1 to 2;
if ( SchoolType = 1 ) then
call rantbl( seed2, .16, .21, .30, .24, .09, Response);
else
call rantbl( seed2, .18, .23, .30, .22, .07, Response);
output; end;
output;
Department = 1;
call rannor( seed3, x );
SamplingWeight = 15 + x * 1.5;
if ( SchoolType = 1 ) then
call rantbl( seed2, .10, .15, .33, .28, .14, Response );
else
call rantbl( seed2, .13, .20, .30, .26, .11, Response);
output; output;
end;
end;
run;
title 'School Information System Survey';
proc sort data=SIS_Survey;
   by _imputation_;
run;

/* Run SURVEYFREQ by _IMPUTATION_, assuming the MI step is already done */
ods trace on;   /* optional: writes ODS table names such as CrossTabs to the log */
proc surveyfreq data=SIS_Survey;
   by _imputation_;
   tables State*Response / wtfreq;
   ods output CrossTabs=ctab;
run;
ods trace off;

proc print data=ctab;
run;

/* Sort by the TABLES variables (State and Response here), then by imputation */
proc sort data=ctab;
   by State Response _imputation_;
run;

/* Run MIANALYZE with the STDERR statement for percentages */
proc mianalyze data=ctab;
   by State Response;   /* the TABLES variables */
   modeleffects Percent;
   stderr StdErr;
   title 'Results for Proportions';
run;

/* Run MIANALYZE with the STDERR statement for frequencies */
proc mianalyze data=ctab;
   by State Response;   /* the TABLES variables */
   modeleffects WgtFreq;
   stderr StdDev;
   title 'Results for Frequencies';
run;