What's the most efficient way to exclude observations w/ missing values for multiple variables in a dataset?
I want to do something like this in my DATA step, but I get errors:
WHERE NOT MISSING (SEQN); WHERE SAME AND NOT MISSING (DUQ:);
There are 11 variables w/ a prefix of DUQ. I only want to include the observations that do NOT have missing values.
Thanks.
Yes, NM is a new variable that counts the number of missing values for all those variables combined: all the DUQ: variables, as well as SEQN.
Your later use of the WHERE statement is correct. To get only observations with no missing values in that key set of variables:
where NM=0;
try nmiss for multiple variables with missing values
MISSING! - Understanding and Making the Most of Missing Data
Thanks but I think the nmiss function "returns the number of missing values within a list of variables defined." Am I mistaken?
I don't want to count them, I want to exclude them. I want the final dataset to include only those observations with complete data...
It must be documented somewhere, but I'm finding in limited testing that variable lists cannot be used in a WHERE statement.
You can easily run this in a DATA step:
nm = nmiss(of duq: , seqn);
But you can't use:
where nmiss(of duq: , seqn) = 0;
For what it's worth, you could add the variable NM to your data set (using the formula above), and later use:
where nm = 0;
Or you could spell out all the names:
where nmiss(duq_01, duquess, duq_abc, seqn) = 0;
I like this approach, b/c it will allow me to run PROCs on datasets w/ complete information as well as on datasets w/o complete information. But I don't totally understand it.
<nm = nmiss(of duq: , seqn);>
Is this supposed to create a new variable - nm - that has a value of 0 when the value of duq: or seqn is NOT missing? All the values of nm are 0 when I run this...
Data want;
Set have;
where nm = 0 (i.e. Not missing);
Run;
Yes, NM is a new variable that counts the number of missing values for all those variables combined: all the DUQ: variables, as well as SEQN.
Your later use of the WHERE statement is correct. To get only observations with no missing values in that key set of variables:
where NM=0;
You cannot use variable lists in WHERE statements. But you can use them in IF statement or a subsetting IF statement.
data want ;
set have ;
if nmiss(of SEQN DUQ:) then delete;
...
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.