Why we can’t use where statement when we are reading data from external files
In a data step WHERE refers to values from an incoming data set. From the documentation:
Selects observations from SAS data sets that meet a particular condition.
Since you are reading an external file then there are no 'observations from SAS data set' involved.
IF will subset data for what is written to the output data set.
If var in (<list of values goes here>);
for example would only write records where the IF is true to the data set.
Because the doc says so 🙂
"You cannot use the WHERE statement to select records from an external file that contains raw data, nor can you use the WHERE statement within the same DATA step in which you read in-stream data with a DATALINES statement."
If you have permission to run operating system commands then read the output of a program that filters the source file instead of reading the file itself.
So this code on Unix will only pass lines that contain the string YES to the data step to be read.
data want;
infile 'grep YES myfile.txt' pipe ;
input ... ;
run;
There's an exception to the general rule: When reading certain types of external files you can use a WHERE statement, provided that an appropriate engine enables "dataset-like" access. For example, you can read (suitable) Excel files or MS Access tables in a DATA step with SET together with a WHERE statement via SAS/ACCESS for PC Files.
Similarly, with only Base SAS you can read XML files, which are plain text files with a specific structure.
Example:
filename myref 'C:\Temp\have.xml';
libname mylib xml xmlfileref=myref;
/* Create an XML file */
data mylib.have;
set sashelp.class(obs=3);
run;
/* Read selected records from the XML file */
data want;
set mylib.have;
where name like 'A%';
run;
If you want to use WHERE syntax while reading a text file or inline data, use the output dataset option:
data test(where=(name contains "y"));
input name $;
datalines;
John
Mary
Henry
Lucy
;
You can use WHERE statements when reading from a source referenced by a library name.
So XML, excel, JSON, files can use a where clause. Similarly, Oracle or Teradata libraries also can.
Though these are not SAS date sets, the engines make them appear as data sets to SAS programs.
If you do not use a libname, you are reading a file which doesn't have defined variables yet.
Since WHERE clauses are processed before the data is loaded by the step, and WHERE clauses use variable names, they can't be used.
IF clauses can be used after the variable names have been defined in the step.
.
WHERE statement logic is outsourced to the data engine being used to read the data (and the data engine is specified in the LIBNAME statement). If you can't specify a LIBNAME for the data, then there is no engine to perform the outsourced filtering.
But you CAN use a subsetting IF statement, because it is not outsourced.
data want;
input x y code :$1.;
if code='A';
datalines;
11 12 X
21 22 A
31 32 A
41 42 B
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.