DATA Step, Macro, Functions and more

Array processing to find missing data

Reply
Frequent Contributor
Posts: 78

Array processing to find missing data

I am cleaning a fairly large dataset and need to be able to count the number of missing data fields for each record.

I've been able to do this for most variables, but I have a couple of groups of related variables that I need to be able to define that nothing should be missing if the first variable is not missing. I can't seem to figure out a way to do this in one array. Is this possible?

array d{40} d1start--d1stop d2start--d2stop d3start--d3stop d4start--d4stop;

dflag=0;
do i=1 to 40;
if d1start ne . and d{i}=. then dflag=dflag+1;
end;
(I know this code isn't doing what I want, but I wanted to provide some code to explain what I'm trying to do)

d1start--d1stop contains 10 related variables and I want to count missing data in these variables only if d1start is not missing or d2start is not missing, etc. I actually just need one total missing count variable for all 40 variables, but I was hoping to do this with one array rather than doing a separate array for each of the 10 groups.

I actually have a lot more data than this, but this is just an example of how it is set up.

Thanks and hope this is enough information. Message was edited by: statadm
Regular Contributor
Posts: 241

Re: Array processing to find missing data

Why bother with arrays at all?
[pre]
/* test data */
data one;
retain d1start d12-d19 d1stop
d2start d22-d29 d2stop
d3start d32-d39 d3stop
d4start d42-d49 d4stop 1;
call missing(d1start, d12, d24, d25, d32, d43);
run;

/* count missings by "group" */
data two;
set one;
m1 = nmiss(of d1start--d1stop);
m2 = nmiss(of d2start--d2stop);
m3 = nmiss(of d3start--d3stop);
m4 = nmiss(of d4start--d4stop);
mall = sum(of m1-m4);
run;

/* check */
proc print data=two noobs;
var m:;
run;
/* on lst
m1 m2 m3 m4 mall
2 2 1 1 6
*/
[/pre]
Frequent Contributor
Posts: 78

Re: Array processing to find missing data

Posted in reply to chang_y_chung_hotmail_com
Thanks, that looks like something that will work. I have over 100 groups of 10, so I was trying to avoid writing separate statements for each group, but if this is the best way I can certainly do it.
Regular Contributor
Posts: 241

Re: Array processing to find missing data

If you have 100 groups of 10, then you can do below, too. Just assign 100 to the macro variable, g, instead of 4. This assumes that in your input data the d varaibles are correctly ordered, without gaps, like: d1start ... d1stop d2start ... d2stop ... d100start ... d100stop.

BTW, 1K vars per obs seems quite a strange way to organize information to me, but you may have your reasons... :-)
[pre]
/* count missings by "group" */
%let g = 4;
data two;
set one;
array d{1:&g,1:10} d1start--d&g.stop;
array m{1:&g} m1-m&g;
drop i;
do i = 1 to &g;
m{i} = nmiss(of d{i,1},d{i,2},d{i,3},
d{i,4}, d{i,5}, d{i,6}, d{i,7},
d{i,8}, d{i,9}, d{i,10});
end;
mall = sum(of m1-m&g);
run;
[/pre]
Frequent Contributor
Posts: 78

Re: Array processing to find missing data

Posted in reply to chang_y_chung_hotmail_com
Great, this looks even better.

The dataset actually has over 3000 variables. I didn't create the dataset it is output from a web based data collection tool. I already reduced it from 5000 to 3000.

Thanks for your help!
Frequent Contributor
Posts: 78

Re: Array processing to find missing data

Ok, the variables are correctly ordered, but I do have gaps between the groups of 10 so the array won't work this way.

I'm probably just better off using the previous code.

Ideally, this would be better in long format, but I have to send this data to be cleaned in wide format to be read across for each observation. The data is time based and literally reads across for 120 minutes.
Ask a Question
Discussion stats
  • 5 replies
  • 143 views
  • 0 likes
  • 2 in conversation