Solved: Saving missing data patterns

halladje · Posted 06-17-2019 11:29 AM

Hi Everyone,

I am trying to save my missing data patterns as a variable (i.e. if there are 20 patterns, saving the pattern number as an additional variable).

Current my code is:

ods select misspatterns;

proc mi data=workdata nimpute=0;

var x y z;

run;

Is there something I can add to save the pattern # as a variable?

Thanks in advance!

Jillian halladje@mcmaster.ca

Rick_SAS · Posted 06-17-2019 02:22 PM

You could detect the patterns and add the 'group' variable, but I think a better way if to just encode the missing value patterns directly by using a string variable that contains a binary sequence. The i_th character of the string will be '1' if the i_th variable has a missing value; otherwise, the i_th character is '0'. Then PROC FREQ gives the same summary of the missing value patterns as PROC MI, except the group values are meaningful.

%let vars = AgeAtStart Height Weight Diastolic 
           Systolic MRW Smoking Cholesterol;

data CodeMissing;  /* or make a view with  / view=CodeMissing; */
set Sashelp.Heart;
array V[*] &vars;
MissInd = subpad('0', 1, dim(V));
do i = 1 to dim(V);
   substr(MissInd, i, 1) = put(cmiss(V[i]), 1.); /* indicate missing */
   /* or use ^cmiss(V[i]) iF you want to indicate nonmissing values */
end;
run;

proc freq data=CodeMissing;
   tables MissInd;
run;

View solution in original post

Rick_SAS · Posted 06-17-2019 01:27 PM

What do you mean by "saving the pattern number as an additional variable"? Do you want to create a new variable from the existing pattern matrix?

You can use the ODS OUTPUT statement to save any SAS table:

ods output MissPattern=Pattern;
proc mi data=Sashelp.Heart nimpute=0 displaypattern=nomeans;
var AgeAtStart Height Weight Diastolic
Systolic MRW Smoking Cholesterol;
run;

proc print data=Pattern;
run;

If you want to create a new variable from those columns, specify what you are looking for.

halladje · Posted 06-17-2019 02:01 PM

Hi Rick,

Thanks so much for providing this code! This worked. I am hoping to add the "group" variable now in "pattern" to my main dataset. Any ideas? Thanks

SAS_Rob · Posted 06-17-2019 02:05 PM

If your goal is to ultimately assign each observation to its respective missing data pattern then the easiest way I can think of to do this is to use Proc IML in conjunction with the missing data pattern ODS table. Below is what I have in mind.

data Fitness1;

input Oxygen RunTime RunPulse @@;

datalines;

44.609 11.37 178 45.313 10.07 185

54.297 8.65 156 59.571 . .

49.874 9.22 . 44.811 11.63 176

. 11.95 176 . 10.85 .

39.442 13.08 174 60.055 8.63 170

50.541 . . 37.388 14.03 186

44.754 11.12 176 47.273 . .

51.855 10.33 166 49.156 8.95 180

40.836 10.95 168 46.672 10.00 .

46.774 10.25 . 50.388 10.08 168

39.407 12.63 174 46.080 11.17 156

45.441 9.63 164 . 8.92 .

45.118 11.08 . 39.203 12.88 168

45.790 10.47 186 50.545 9.93 148

48.673 9.40 186 47.920 11.50 170

47.467 10.50 170

;

proc mi data=Fitness1 seed=1518971 simple nimpute=0;

var Oxygen RunTime RunPulse;

ods output MissPattern=mdpattern;

run;

proc iml;

/*Read in original data set*/

use fitness1;

read all var{Oxygen RunTime RunPulse} into dat[colname=cname];

/*Generate the observation level missing data pattern*/

misspattern=missing(dat);

/*Read in the Summarized form from Proc MI*/

use mdpattern;

read all var {Oxygen_miss RunTime_miss RunPulse_miss} into sumpattern;

/*Read in the group number*/

read all var{group} into grp;

/*Reformat the MI data set so that it has 0s and 1s instead of Xs and .s to make the comparison easier*/

sumpattern=(sumpattern='.');

/*Compare each of the rows to the patterns*/

x=j(nrow(dat),1,.);

do j=1 to nrow(dat);

do i=1 to nrow(sumpattern);

if misspattern[j,]=sumpattern[i,] then x[j]=grp[i];

end;

/*Save the pattern as a variable in the original data set*/

datx=dat||x;

create fitness2 from datx [colname= (cname||'patnum')];

append from datx;

quit;

proc print data=fitness2;

run;

sld · Posted 06-17-2019 01:32 PM

Add

ods output MissPattern=MyMissPattern;

to your code which will save the "Missing Data Patterns" table to a SAS dataset named MyMissPattern. The group variable in this dataset is what you want.

Edit: Oops, Rick beat me to the answer 🙂

Rick_SAS · Posted 06-17-2019 02:22 PM

You could detect the patterns and add the 'group' variable, but I think a better way if to just encode the missing value patterns directly by using a string variable that contains a binary sequence. The i_th character of the string will be '1' if the i_th variable has a missing value; otherwise, the i_th character is '0'. Then PROC FREQ gives the same summary of the missing value patterns as PROC MI, except the group values are meaningful.

%let vars = AgeAtStart Height Weight Diastolic 
           Systolic MRW Smoking Cholesterol;

data CodeMissing;  /* or make a view with  / view=CodeMissing; */
set Sashelp.Heart;
array V[*] &vars;
MissInd = subpad('0', 1, dim(V));
do i = 1 to dim(V);
   substr(MissInd, i, 1) = put(cmiss(V[i]), 1.); /* indicate missing */
   /* or use ^cmiss(V[i]) iF you want to indicate nonmissing values */
end;
run;

proc freq data=CodeMissing;
   tables MissInd;
run;

Rick_SAS · Posted 06-19-2019 05:58 AM

It turns out that I wrote a blog on the topic of creating a string that encapsulates the missing value pattern back in... See the section "Shorten the labels for the missing data patterns."

halladje · Posted 07-05-2019 08:58 AM

Hi Rick,

This ended up working for me today! I am not sure what I have been doing wrong for the past bit, but I am happy it worked. I also reviewed your blog post which was also very helpful.

I have a lot of variables in my dataset - so even working on a smaller number of the variables than I plan to in the final analysis, I have 650 patterns. Most of these patterns only have 1 or 2 individuals with the particular pattern. Is there a way to graphically visualize the missing data patterns (from your blog post) but only for missing data patterns that have a minimum % of observations?

Also, given the string variables are very long due to the number of variables in the missing analysis, is there a way to save the "group" of missingness alongside the string of missing? Or is there a way to create a new variable based on the string to synthesize the patterns into "groups"?

My goal is to create propensity scores for the most common missing data patterns and then use this propensity to be missing to create a dataset with missing variables for simulations (where missingness is MAR, not MCAR and resembles the full dataset).

If it is helpful to connect via email my email is halladje@mcmaster.ca Thanks so much for all your help.

Rick_SAS · Posted 07-05-2019 09:13 AM

I'm pretty sure the answer is "yes" to all your questions, although I have not done it and have no code to contribute.

I suggest you attempt to program a solution to these questions on a small data set such as Sashelp.Heart that we all can access. If you run into problems, open a new thread and show us your initial attempts.

Saving missing data patterns

Re: Saving missing data patterns

Re: Saving missing data patterns

Re: Saving missing data patterns

Re: Saving missing data patterns

Re: Saving missing data patterns

Re: Saving missing data patterns

Re: Saving missing data patterns

Re: Saving missing data patterns

Re: Saving missing data patterns

Catch up on SAS Innovate 2026