BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
halladje
Fluorite | Level 6

Hi Everyone, 

 

I am trying to save my missing data patterns as a variable (i.e. if there are 20 patterns, saving the pattern number as an additional variable). 

 

Current my code is:

 

ods select misspatterns;

proc mi data=workdata nimpute=0;

var x y z;

run;

 

Is there something I can add to save the pattern # as a variable? 

 

Thanks in advance! 

Jillian halladje@mcmaster.ca 

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

You could detect the patterns and add the 'group' variable, but I think a better way if to just encode the missing value patterns directly by using a string variable that contains a binary sequence. The i_th character of the string will be '1' if the i_th variable has a missing value; otherwise, the i_th character is '0'. Then PROC FREQ gives the same summary of the missing value patterns as PROC MI, except the group values are meaningful.

 

%let vars = AgeAtStart Height Weight Diastolic 
           Systolic MRW Smoking Cholesterol;

data CodeMissing;  /* or make a view with  / view=CodeMissing; */
set Sashelp.Heart;
array V[*] &vars;
MissInd = subpad('0', 1, dim(V));
do i = 1 to dim(V);
   substr(MissInd, i, 1) = put(cmiss(V[i]), 1.); /* indicate missing */
   /* or use ^cmiss(V[i]) iF you want to indicate nonmissing values */
end;
run;

proc freq data=CodeMissing;
   tables MissInd;
run;

View solution in original post

8 REPLIES 8
Rick_SAS
SAS Super FREQ

What do you mean by "saving the pattern number as an additional variable"? Do you want to create a new variable from the existing pattern matrix?

 

You can use the ODS OUTPUT statement to save any SAS table:

 

ods output MissPattern=Pattern;
proc mi data=Sashelp.Heart nimpute=0 displaypattern=nomeans;
var AgeAtStart Height Weight Diastolic
Systolic MRW Smoking Cholesterol;
run;

 

proc print data=Pattern;
run;

 

If you want to create a new variable from those columns, specify what you are looking for.

halladje
Fluorite | Level 6
Hi Rick,

Thanks so much for providing this code! This worked. I am hoping to add the "group" variable now in "pattern" to my main dataset. Any ideas? Thanks
SAS_Rob
SAS Employee

If your goal is to ultimately assign each observation to its respective missing data pattern then the easiest way I can think of to do this is to use Proc IML in conjunction with the missing data pattern ODS table.  Below is what I have in mind.

 

data Fitness1;

   input Oxygen RunTime RunPulse @@;

   datalines;

44.609  11.37  178     45.313  10.07  185

54.297   8.65  156     59.571    .      .

49.874   9.22    .     44.811  11.63  176

  .     11.95  176          .  10.85    .

39.442  13.08  174     60.055   8.63  170

50.541    .      .     37.388  14.03  186

44.754  11.12  176     47.273    .      .

51.855  10.33  166     49.156   8.95  180

40.836  10.95  168     46.672  10.00    .

46.774  10.25    .     50.388  10.08  168

39.407  12.63  174     46.080  11.17  156

45.441   9.63  164       .      8.92    .

45.118  11.08    .     39.203  12.88  168

45.790  10.47  186     50.545   9.93  148

48.673   9.40  186     47.920  11.50  170

47.467  10.50  170

;

proc mi data=Fitness1 seed=1518971 simple nimpute=0;

   var Oxygen RunTime RunPulse;

   ods output MissPattern=mdpattern;

run;

proc iml;

/*Read in original data set*/

use fitness1;

read all var{Oxygen RunTime RunPulse} into dat[colname=cname];

 

/*Generate the observation level missing data pattern*/

misspattern=missing(dat);

 

/*Read in the Summarized form from Proc MI*/

use mdpattern;

read all var {Oxygen_miss RunTime_miss RunPulse_miss} into sumpattern;

 

/*Read in the group number*/

read all var{group} into grp;

 

 

/*Reformat the MI data set so that it has 0s and 1s instead of Xs and .s to make the comparison easier*/

sumpattern=(sumpattern='.');

 

 

/*Compare each of the rows to the patterns*/

x=j(nrow(dat),1,.);

do j=1 to nrow(dat);

do i=1 to nrow(sumpattern);

if misspattern[j,]=sumpattern[i,] then x[j]=grp[i];

end;

end;

 

/*Save the pattern as a variable in the original data set*/

datx=dat||x;

create fitness2 from datx [colname= (cname||'patnum')];

append from datx;

quit;

 

proc print data=fitness2;

run;

sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

Add

 

ods output MissPattern=MyMissPattern;

to your code which will save the "Missing Data Patterns" table to a SAS dataset named MyMissPattern. The group variable in this dataset is what you want.

 

Edit: Oops, Rick beat me to the answer 🙂

Rick_SAS
SAS Super FREQ

You could detect the patterns and add the 'group' variable, but I think a better way if to just encode the missing value patterns directly by using a string variable that contains a binary sequence. The i_th character of the string will be '1' if the i_th variable has a missing value; otherwise, the i_th character is '0'. Then PROC FREQ gives the same summary of the missing value patterns as PROC MI, except the group values are meaningful.

 

%let vars = AgeAtStart Height Weight Diastolic 
           Systolic MRW Smoking Cholesterol;

data CodeMissing;  /* or make a view with  / view=CodeMissing; */
set Sashelp.Heart;
array V[*] &vars;
MissInd = subpad('0', 1, dim(V));
do i = 1 to dim(V);
   substr(MissInd, i, 1) = put(cmiss(V[i]), 1.); /* indicate missing */
   /* or use ^cmiss(V[i]) iF you want to indicate nonmissing values */
end;
run;

proc freq data=CodeMissing;
   tables MissInd;
run;
Rick_SAS
SAS Super FREQ

It turns out that I wrote a blog on the topic of creating a string that encapsulates the missing value pattern back in... See the section "Shorten the labels for the missing data patterns."

halladje
Fluorite | Level 6

Hi Rick, 

 

This ended up working for me today! I am not sure what I have been doing wrong for the past bit, but I am happy it worked. I also reviewed your blog post which was also very helpful. 

 

I have a lot of variables in my dataset - so even working on a smaller number of the variables than I plan to in the final analysis, I have 650 patterns. Most of these patterns only have 1 or 2 individuals with the particular pattern. Is there a way to graphically visualize the missing data patterns (from your blog post) but only for missing data patterns that have a minimum % of observations? 

 

Also, given the string variables are very long due to the number of variables in the missing analysis, is there a way to save the "group" of missingness alongside the string of missing? Or is there a way to create a new variable based on the string to synthesize the patterns into "groups"?

 

My goal is to create propensity scores for the most common missing data patterns and then use this propensity to be missing to create a dataset with missing variables for simulations (where missingness is MAR, not MCAR and resembles the full dataset). 

 

If it is helpful to connect via email my email is halladje@mcmaster.ca Thanks so much for all your help. 

 

Rick_SAS
SAS Super FREQ

I'm pretty sure the answer is "yes" to all your questions, although I have not done it and have no code to contribute.

 

I suggest you attempt to program a solution to these questions on a small data set such as Sashelp.Heart that we all can access. If you run into problems, open a new thread and show us your initial attempts.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 1284 views
  • 0 likes
  • 4 in conversation