BookmarkSubscribeRSS Feed
Veera_48
Calcite | Level 5

Why we can’t use where statement when we are reading data from external files

8 REPLIES 8
ballardw
Super User

In a data step WHERE refers to values from an incoming data set. From the documentation:

Selects observations from SAS data sets that meet a particular condition.

Since you are reading an external file then there are no 'observations from SAS data set' involved.

 

IF will subset data for what is written to the output data set.

 

If var in (<list of values goes here>);

 

for example would only write records where the IF is true to the data set.

 

 

PeterClemmensen
Tourmaline | Level 20

Because the doc says so 🙂

 

"You cannot use the WHERE statement to select records from an external file that contains raw data, nor can you use the WHERE statement within the same DATA step in which you read in-stream data with a DATALINES statement."

 

 

Tom
Super User Tom
Super User

If you have permission to run operating system commands then read the output of a program that filters the source file instead of reading the file itself.

 

So this code on Unix will only pass lines that contain the string YES to the data step to be read.

data want;
  infile 'grep YES myfile.txt' pipe ;
  input ... ;
run;
FreelanceReinh
Jade | Level 19

There's an exception to the general rule: When reading certain types of external files you can use a WHERE statement, provided that an appropriate engine enables "dataset-like" access. For example, you can read (suitable) Excel files or MS Access tables in a DATA step with SET together with a WHERE statement via SAS/ACCESS for PC Files.

 

Similarly, with only Base SAS you can read XML files, which are plain text files with a specific structure. 

 

Example:

filename myref 'C:\Temp\have.xml';

libname mylib xml xmlfileref=myref;

/* Create an XML file */

data mylib.have;
set sashelp.class(obs=3);
run;

/* Read selected records from the XML file */

data want;
set mylib.have;
where name like 'A%';
run;

 

Reeza
Super User
Order of operations logic. The file is currently not a SAS data set so SAS doesn't know what's in it. To find out what's in it, it needs to process the file and read each line. When it reads the line of data and only NOW does it know that you don't want that line. So you can use an IF to drop it or conditionally not write it to the output.

But WHERE filters data as it comes in to the system, because SAS already knows what exists in that data (or can easily find it out). The same is not true of a file that it doesn't 'know'.
PGStats
Opal | Level 21

If you want to use WHERE syntax while reading a text file or inline data, use the output dataset option:

 

data test(where=(name contains "y"));
input name $;
datalines;
John
Mary
Henry
Lucy
;

PGStats_0-1611864493486.png

 

PG
ChrisNZ
Tourmaline | Level 20

You can use WHERE statements when reading from a source referenced by a library name.

So XML, excel, JSON, files can use a where clause. Similarly, Oracle or Teradata libraries also can.

Though these are not SAS date sets, the engines make them appear as data sets to SAS programs.

 

If you do not use a libname, you are reading a file which doesn't have defined variables yet.  

Since WHERE clauses are processed before the data is loaded by the step, and WHERE clauses use variable names, they can't be used.

IF clauses can be used after the variable names have been defined in the step. 

.

 

 

mkeintz
PROC Star

WHERE statement logic is outsourced to the data engine being used to read the data (and the data engine is specified in the LIBNAME statement).  If you can't specify a LIBNAME for the data, then there is no engine to perform the outsourced filtering.

 

But you CAN use a subsetting IF statement, because it is not outsourced.

 

data want;
   input  x y code :$1.;
   if code='A';
datalines;
11 12 X
21 22 A
31 32 A
41 42 B
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2134 views
  • 7 likes
  • 9 in conversation