Quickest way to determine if there are any observations in a dataset that satisfy a where clause?

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 12
Accepted Solution

Quickest way to determine if there are any observations in a dataset that satisfy a where clause?

Hi all,

I was wondering if anyone has any suggestions as to the quickest way to determine whether a dataset has any observations that satisfy a given WHERE clause.

I have a large number of large datasets. I would like to see which ones have at least one observation that has VAR1 = VAL1.  I don't need to know the exact number of observations that satisfy this condition so maybe it is possible to make the process faster by stopping the 'search' of any given data set as soon as an observation is found. Does anyone know if this is possible?

Thanks in advance for any advice.


Accepted Solutions
Solution
‎01-15-2013 08:48 AM
Respected Advisor
Posts: 3,777

Re: Quickest way to determine if there are any observations in a dataset that satisfy a where clause?

You can write a SET statement, apply the WHERE and check END before you actually ready any records.  You can also check more than one data set in one data step.


data _null_;                                
  
call symputX('HasOBS',not eof,'LOCAL');  
  
call symputX('HasOBS1',not eof1,'LOCAL');  

  
STOP; *Be sure to stop before you read anything;                                   
  
set sashelp.class end=eof;                          
  
where sex eq 'F';

  
set sashelp.shoes end=eof1;                          
  
where stores eq 0 ;
   run;                                     
%put _user_;

View solution in original post


All Replies
Solution
‎01-15-2013 08:48 AM
Respected Advisor
Posts: 3,777

Re: Quickest way to determine if there are any observations in a dataset that satisfy a where clause?

You can write a SET statement, apply the WHERE and check END before you actually ready any records.  You can also check more than one data set in one data step.


data _null_;                                
  
call symputX('HasOBS',not eof,'LOCAL');  
  
call symputX('HasOBS1',not eof1,'LOCAL');  

  
STOP; *Be sure to stop before you read anything;                                   
  
set sashelp.class end=eof;                          
  
where sex eq 'F';

  
set sashelp.shoes end=eof1;                          
  
where stores eq 0 ;
   run;                                     
%put _user_;

Regular Contributor
Posts: 151

Re: Quickest way to determine if there are any observations in a dataset that satisfy a where clause?

Interesting answer, I've not used STOP in that context before.  I assume it's similar to using 'if 0 then set...'

My question is, why does it only read 1 observation from the source data set.  I know it's to do with using STOP in conjunction with EOF, but I'm sure how it's being processed.

Respected Advisor
Posts: 3,777

Re: Quickest way to determine if there are any observations in a dataset that satisfy a where clause?

STOP means "STOP execution" and after it is executed nothing else happens.  I think you could restructure the step to use IF 0 for the sets but the STOP would still be need to prevent looping.  This works because while the SET statements are never executed SAS has already determined the initial states of the END variables.

While the NOTEs indicate that one obs was read from the data set where the WHERE statement has obs., no observations are actually read into the PDV.  You can confirm that with PUT _ALL_; before the stop statement.

Also, if using this to check more than one data step there could be a type conflict if two or more variables have the same name by different type.  You could use a drop or keep data set option to "fix" that as long as the variables are not going to be used in the WHERE statement.  This can be avoided with rename but that may be a bit fiddley.  I usually use this with just one data set.

NOTE: There were 1 observations read from the data set SASHELP.CLASS.

      WHERE sex='F';

NOTE: There were 0 observations read from the data set SASHELP.SHOES.

      WHERE stores=0;

Contributor
Posts: 63

Re: Quickest way to determine if there are any observations in a dataset that satisfy a where clause?

This is very clever and I have a use for it.  Thanks for posting it.  I'm still having some trouble though understanding why it works.

Using "not eof" as the macro variable value throws me.  EOF has values of 1 or 0  so NOT EOF has values of 0 and 1, respectively.

So if there is at least one record in the data set with the specified value then EOF will never be reached and so will have a value of zero; so NOT EOF will have a value of 1.  Is my reasoning correct?

Respected Advisor
Posts: 3,777

Re: Quickest way to determine if there are any observations in a dataset that satisfy a where clause?

Since my indicator is HASOBS, a positive assertion, I reverse the value of
END.  When END=1, in this program, there are NO observations, when HASOBS=0 there are no observations.

Occasional Contributor
Posts: 10

Re: Quickest way to determine if there are any observations in a dataset that satisfy a where clause?

Very useful - just used this code for a similar purpose as the OP. Thanks for sharing!

Occasional Contributor
Posts: 12

Re: Quickest way to determine if there are any observations in a dataset that satisfy a where clause?

That worked like a charm! Nice solution.

Thanks very much.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 361 views
  • 3 likes
  • 5 in conversation