Hi Everyone,
I am trying to save my missing data patterns as a variable (i.e. if there are 20 patterns, saving the pattern number as an additional variable).
Current my code is:
ods select misspatterns;
proc mi data=workdata nimpute=0;
var x y z;
run;
Is there something I can add to save the pattern # as a variable?
Thanks in advance!
Jillian halladje@mcmaster.ca
You could detect the patterns and add the 'group' variable, but I think a better way if to just encode the missing value patterns directly by using a string variable that contains a binary sequence. The i_th character of the string will be '1' if the i_th variable has a missing value; otherwise, the i_th character is '0'. Then PROC FREQ gives the same summary of the missing value patterns as PROC MI, except the group values are meaningful.
%let vars = AgeAtStart Height Weight Diastolic
Systolic MRW Smoking Cholesterol;
data CodeMissing; /* or make a view with / view=CodeMissing; */
set Sashelp.Heart;
array V[*] &vars;
MissInd = subpad('0', 1, dim(V));
do i = 1 to dim(V);
substr(MissInd, i, 1) = put(cmiss(V[i]), 1.); /* indicate missing */
/* or use ^cmiss(V[i]) iF you want to indicate nonmissing values */
end;
run;
proc freq data=CodeMissing;
tables MissInd;
run;
What do you mean by "saving the pattern number as an additional variable"? Do you want to create a new variable from the existing pattern matrix?
You can use the ODS OUTPUT statement to save any SAS table:
ods output MissPattern=Pattern;
proc mi data=Sashelp.Heart nimpute=0 displaypattern=nomeans;
var AgeAtStart Height Weight Diastolic
Systolic MRW Smoking Cholesterol;
run;
proc print data=Pattern;
run;
If you want to create a new variable from those columns, specify what you are looking for.
If your goal is to ultimately assign each observation to its respective missing data pattern then the easiest way I can think of to do this is to use Proc IML in conjunction with the missing data pattern ODS table. Below is what I have in mind.
data Fitness1;
input Oxygen RunTime RunPulse @@;
datalines;
44.609 11.37 178 45.313 10.07 185
54.297 8.65 156 59.571 . .
49.874 9.22 . 44.811 11.63 176
. 11.95 176 . 10.85 .
39.442 13.08 174 60.055 8.63 170
50.541 . . 37.388 14.03 186
44.754 11.12 176 47.273 . .
51.855 10.33 166 49.156 8.95 180
40.836 10.95 168 46.672 10.00 .
46.774 10.25 . 50.388 10.08 168
39.407 12.63 174 46.080 11.17 156
45.441 9.63 164 . 8.92 .
45.118 11.08 . 39.203 12.88 168
45.790 10.47 186 50.545 9.93 148
48.673 9.40 186 47.920 11.50 170
47.467 10.50 170
;
proc mi data=Fitness1 seed=1518971 simple nimpute=0;
var Oxygen RunTime RunPulse;
ods output MissPattern=mdpattern;
run;
proc iml;
/*Read in original data set*/
use fitness1;
read all var{Oxygen RunTime RunPulse} into dat[colname=cname];
/*Generate the observation level missing data pattern*/
misspattern=missing(dat);
/*Read in the Summarized form from Proc MI*/
use mdpattern;
read all var {Oxygen_miss RunTime_miss RunPulse_miss} into sumpattern;
/*Read in the group number*/
read all var{group} into grp;
/*Reformat the MI data set so that it has 0s and 1s instead of Xs and .s to make the comparison easier*/
sumpattern=(sumpattern='.');
/*Compare each of the rows to the patterns*/
x=j(nrow(dat),1,.);
do j=1 to nrow(dat);
do i=1 to nrow(sumpattern);
if misspattern[j,]=sumpattern[i,] then x[j]=grp[i];
end;
end;
/*Save the pattern as a variable in the original data set*/
datx=dat||x;
create fitness2 from datx [colname= (cname||'patnum')];
append from datx;
quit;
proc print data=fitness2;
run;
Add
ods output MissPattern=MyMissPattern;
to your code which will save the "Missing Data Patterns" table to a SAS dataset named MyMissPattern. The group variable in this dataset is what you want.
Edit: Oops, Rick beat me to the answer 🙂
You could detect the patterns and add the 'group' variable, but I think a better way if to just encode the missing value patterns directly by using a string variable that contains a binary sequence. The i_th character of the string will be '1' if the i_th variable has a missing value; otherwise, the i_th character is '0'. Then PROC FREQ gives the same summary of the missing value patterns as PROC MI, except the group values are meaningful.
%let vars = AgeAtStart Height Weight Diastolic
Systolic MRW Smoking Cholesterol;
data CodeMissing; /* or make a view with / view=CodeMissing; */
set Sashelp.Heart;
array V[*] &vars;
MissInd = subpad('0', 1, dim(V));
do i = 1 to dim(V);
substr(MissInd, i, 1) = put(cmiss(V[i]), 1.); /* indicate missing */
/* or use ^cmiss(V[i]) iF you want to indicate nonmissing values */
end;
run;
proc freq data=CodeMissing;
tables MissInd;
run;
It turns out that I wrote a blog on the topic of creating a string that encapsulates the missing value pattern back in... See the section "Shorten the labels for the missing data patterns."
Hi Rick,
This ended up working for me today! I am not sure what I have been doing wrong for the past bit, but I am happy it worked. I also reviewed your blog post which was also very helpful.
I have a lot of variables in my dataset - so even working on a smaller number of the variables than I plan to in the final analysis, I have 650 patterns. Most of these patterns only have 1 or 2 individuals with the particular pattern. Is there a way to graphically visualize the missing data patterns (from your blog post) but only for missing data patterns that have a minimum % of observations?
Also, given the string variables are very long due to the number of variables in the missing analysis, is there a way to save the "group" of missingness alongside the string of missing? Or is there a way to create a new variable based on the string to synthesize the patterns into "groups"?
My goal is to create propensity scores for the most common missing data patterns and then use this propensity to be missing to create a dataset with missing variables for simulations (where missingness is MAR, not MCAR and resembles the full dataset).
If it is helpful to connect via email my email is halladje@mcmaster.ca Thanks so much for all your help.
I'm pretty sure the answer is "yes" to all your questions, although I have not done it and have no code to contribute.
I suggest you attempt to program a solution to these questions on a small data set such as Sashelp.Heart that we all can access. If you run into problems, open a new thread and show us your initial attempts.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.