I am trying to count the number of missing values across a set of variables (var1-var7). Then, if two or more of them are missing, I want to set all of them to missing. The following works:
data want (drop = i j k);
  set have;
  ct_pre = 0;
  ct_post = 0;
  array vars (7) var1-var7;
  do i = 1 to 7;
    if vars{i} = . then ct_pre + 1;
  end;
  do j = 1 to 7;
    if ct_pre >= 2 then vars{j} = .;
  end;
  do k = 1 to 7;
    if vars{k} = . then ct_post + 1;
  end;
run;
Thus I end up with ct_post having only values 0, 1, or 7, as desired.
However, I can't wrap my head around why this other, seemingly simpler approach doesn't work. In some cases variables do get set to missing, but in other cases they do not. As a result, ct_post ends up with values from 0 to 7, just like ct_pre, although a handful of cases change value. There seems to be no method to the madness: I don't see any pattern involving specific variables. I assume it has something to do with doing everything inside a single DO/END loop, but I feel like I'm not understanding something fundamental, so if anyone can explain, I would appreciate it! Thanks!
data want (drop = i);
  set have;
  ct_pre = 0;
  ct_post = 0;
  array vars (7) var1-var7;
  do i = 1 to 7;
    if vars{i} = . then ct_pre + 1;
    if ct_pre >= 2 then vars{i} = .;
    if vars{i} = . then ct_post + 1;
  end;
run;
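Because the second IF statement
if ct_pre >= 2 then vars{i} = .;
is testing the value of CT_PRE at a different point in its development.
In your first data step you wait until you have counted ALL of the missing elements in the array before testing CT_PRE. In the second one you are testing CT_PRE inside the same loop, before it has finished being calculated. On each iteration CT_PRE only reflects the variables examined so far, so only the variables from the second missing one onward get set to missing; any non-missing values before that position are left alone. That is why the result seems random: it depends on where in the array the missing values fall.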
So get the count before the DO loop.
data want (drop = i);
  set have;
  array vars {7} var1-var7;
  ct_pre = nmiss(of vars{*});
  if ct_pre >= 2 then do i = 1 to 7;
    vars{i} = .;
  end;
  ct_post = nmiss(of vars{*});
run;
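To see the difference concretely, here is a small sketch using hypothetical test data (the dataset names have_demo, single_loop and nmiss_first are made up for illustration). It runs the single-loop version from the question and the count-first version above on three rows that each have exactly two missing values, in different positions.

```sas
/* Hypothetical test data: each row has exactly two missing values */
data have_demo;
  input var1-var7;
  datalines;
. . 3 4 5 6 7
1 2 3 4 5 . .
1 . 3 . 5 6 7
;
run;

/* Single-loop version from the question: CT_PRE is tested while still counting */
data single_loop (drop = i);
  set have_demo;
  ct_pre = 0;
  ct_post = 0;
  array vars {7} var1-var7;
  do i = 1 to 7;
    if vars{i} = . then ct_pre + 1;
    if ct_pre >= 2 then vars{i} = .;
    if vars{i} = . then ct_post + 1;
  end;
run;

/* Count-first version: CT_PRE is complete before any value is changed */
data nmiss_first (drop = i);
  set have_demo;
  array vars {7} var1-var7;
  ct_pre = nmiss(of vars{*});
  if ct_pre >= 2 then do i = 1 to 7;
    vars{i} = .;
  end;
  ct_post = nmiss(of vars{*});
run;

proc print data = single_loop;
run;

proc print data = nmiss_first;
run;
```

With the single loop, the first row ends up all missing (ct_post = 7) because the second missing value is var2. In the second row var1-var5 keep their values (ct_post = 2), and in the third row var1 and var3 survive (ct_post = 5). The count-first version sets all seven variables to missing in every row (ct_post = 7).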
@awesome_opossum wrote:
I am trying to count the number of missing values on a set of vars. Then, if there are two or more vars with missing values, I want to set all the vars to have missing values.
How about this:
data want;
  set have;
  n_missing = nmiss(of var1-var7);
  if n_missing >= 2 then call missing(of var1-var7);
run;
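This can be simplified as follows. No DO loops are needed, so there are no index variables to drop (listing i, j and k in DROP= would only produce a WARNING that they were never referenced), and the counters do not need to be initialized because NMISS assigns them directly:
data want;
  set have;
  array vars (7) var1-var7;
  *number of missing values in the array;
  ct_pre = nmiss(of vars(*));
  *set all to missing if 2 or more are missing;
  if ct_pre >= 2 then call missing(of vars(*));
  *number of missing values after setting values to missing;
  ct_post = nmiss(of vars(*));
run;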
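One way to sanity-check this approach is to run it on a few hypothetical rows with 0, 1, 2 and 7 missing values and tabulate CT_PRE against CT_POST. This is only a sketch: have_check and want_check are made-up names, and the step simply mirrors the NMISS / CALL MISSING logic above.

```sas
/* Hypothetical test rows with 0, 1, 2 and 7 missing values */
data have_check;
  input var1-var7;
  datalines;
1 2 3 4 5 6 7
1 . 3 4 5 6 7
1 2 3 4 5 . .
. . . . . . .
;
run;

/* Same logic as the simplified step: count, then blank out all if 2+ missing */
data want_check;
  set have_check;
  array vars (7) var1-var7;
  ct_pre = nmiss(of vars(*));
  if ct_pre >= 2 then call missing(of vars(*));
  ct_post = nmiss(of vars(*));
run;

/* CT_POST should only take the values 0, 1 and 7 */
proc freq data = want_check;
  tables ct_pre * ct_post / list;
run;
```

For these four rows, CT_POST should come out as 0, 1, 7 and 7, matching the behavior of the original three-loop version.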