DATA Step, Macro, Functions and more

Exclude observations w/ missing values

Accepted Solution Solved
Reply
Regular Contributor
Posts: 199
Accepted Solution

Exclude observations w/ missing values

What's the most efficient way to exclude observations w/ missing values for multiple variables in a dataset?

 

I want to do something like this in my DATA step, but I get errors: 

		WHERE NOT MISSING (SEQN);
		WHERE SAME AND NOT MISSING (DUQ:);

There are 11 variables w/ a prefix of DUQ. I only want to include the observations that do NOT have missing values.

 

Thanks. 


Accepted Solutions
Solution
‎07-19-2017 06:51 PM
Super User
Posts: 5,495

Re: Exclude observations w/ missing values

Yes, NM is a new variable that counts the number of missing values for all those variables combined:  all the DUQ: variables, as well as SEQN.

 

Your later use of the WHERE statement is correct.  To get only observations with no missing values in that key set of variables:

 

where NM=0;

View solution in original post


All Replies
Regular Contributor
Posts: 228

Re: Exclude observations w/ missing values

try nmiss for multiple variables with missing values
MISSING! - Understanding and Making the Most of Missing Data

Regular Contributor
Posts: 199

Re: Exclude observations w/ missing values

Thanks but I think the nmiss function "returns the number of missing values within a list of variables defined." Am I mistaken?

 

I don't want to count them, I want to exclude them. I want the final dataset to include only those observations with complete data...

Super User
Posts: 5,495

Re: Exclude observations w/ missing values

It must be documented somewhere, but I'm finding in limited testing that variable lists cannot be used in a WHERE statement.

 

You can easily run this in a DATA step:

 

nm = nmiss(of duq: , seqn);

 

But you can't use:

 

where nmiss(of duq: , seqn) = 0;

 

For what it's worth, you could add the variable NM to your data set (using the formula above), and later use:

 

where nm = 0;

 

Or you could spell out all the names:

 

where nmiss(duq_01, duquess, duq_abc, seqn) = 0;

Regular Contributor
Posts: 199

Re: Exclude observations w/ missing values

Posted in reply to Astounding

@Astounding 

 

I like this approach, b/c it will allow me to run PROCs on datasets w/ complete information as well as on datasets w/o complete information. But I don't totally understand it.

 

<nm = nmiss(of duq: , seqn);>

 

Is this supposed to create a new variable - nm - that has a value of 0 when the value of duq: or seqn is NOT missing? All the values of nm are 0 when I run this...

 

Data want;

Set have;

where nm = 0 (i.e. Not missing);

Run;

Solution
‎07-19-2017 06:51 PM
Super User
Posts: 5,495

Re: Exclude observations w/ missing values

Yes, NM is a new variable that counts the number of missing values for all those variables combined:  all the DUQ: variables, as well as SEQN.

 

Your later use of the WHERE statement is correct.  To get only observations with no missing values in that key set of variables:

 

where NM=0;

Super User
Super User
Posts: 7,035

Re: Exclude observations w/ missing values

You cannot use variable lists in WHERE statements. But you can use them in IF statement or a subsetting IF statement.

 

data want ;
  set have ;
  if nmiss(of SEQN DUQ:) then delete;
 ...

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 135 views
  • 4 likes
  • 4 in conversation