BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
_maldini_
Barite | Level 11

What's the most efficient way to exclude observations w/ missing values for multiple variables in a dataset?

 

I want to do something like this in my DATA step, but I get errors: 

		WHERE NOT MISSING (SEQN);
		WHERE SAME AND NOT MISSING (DUQ:);

There are 11 variables w/ a prefix of DUQ. I only want to include the observations that do NOT have missing values.

 

Thanks. 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

Yes, NM is a new variable that counts the number of missing values for all those variables combined:  all the DUQ: variables, as well as SEQN.

 

Your later use of the WHERE statement is correct.  To get only observations with no missing values in that key set of variables:

 

where NM=0;

View solution in original post

6 REPLIES 6
_maldini_
Barite | Level 11

Thanks but I think the nmiss function "returns the number of missing values within a list of variables defined." Am I mistaken?

 

I don't want to count them, I want to exclude them. I want the final dataset to include only those observations with complete data...

Astounding
PROC Star

It must be documented somewhere, but I'm finding in limited testing that variable lists cannot be used in a WHERE statement.

 

You can easily run this in a DATA step:

 

nm = nmiss(of duq: , seqn);

 

But you can't use:

 

where nmiss(of duq: , seqn) = 0;

 

For what it's worth, you could add the variable NM to your data set (using the formula above), and later use:

 

where nm = 0;

 

Or you could spell out all the names:

 

where nmiss(duq_01, duquess, duq_abc, seqn) = 0;

_maldini_
Barite | Level 11

@Astounding 

 

I like this approach, b/c it will allow me to run PROCs on datasets w/ complete information as well as on datasets w/o complete information. But I don't totally understand it.

 

<nm = nmiss(of duq: , seqn);>

 

Is this supposed to create a new variable - nm - that has a value of 0 when the value of duq: or seqn is NOT missing? All the values of nm are 0 when I run this...

 

Data want;

Set have;

where nm = 0 (i.e. Not missing);

Run;

Astounding
PROC Star

Yes, NM is a new variable that counts the number of missing values for all those variables combined:  all the DUQ: variables, as well as SEQN.

 

Your later use of the WHERE statement is correct.  To get only observations with no missing values in that key set of variables:

 

where NM=0;

Tom
Super User Tom
Super User

You cannot use variable lists in WHERE statements. But you can use them in IF statement or a subsetting IF statement.

 

data want ;
  set have ;
  if nmiss(of SEQN DUQ:) then delete;
 ...

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 25325 views
  • 4 likes
  • 4 in conversation