I know that where can be used in both DATA and PROC step, whereas if can only be used in DATA step. However, what is the difference between where and if?
The where statement can be used in procedures to subset data. The if statement cannot be used in procedures.
The where statement is more efficient than the if statement. It tells SAS not to read all observations from the data set.
@VDD The where statement is more efficient than the if statement. It tells SAS not to read all observations from the data set.
Not always. When functions are used, IF is faster than WHERE.
This has to do with functions being optimised to access PDV data, but not so fast when reading from the read buffer.
If someone has more details I'd be curious to know more actually.
So if A=1; is slower than where A=1;
But if round(A)=1; is faster than where round(A)=1;
data TEST;
do I=1 to 1e8;
output;
end;
run;
sasfile TEST load;
data _null_; set TEST; if I =0; run; * 4 seconds;
data _null_; set TEST; where I =0; run; * 2 seconds;
data _null_; set TEST; if round(I)=0; run; * 5 seconds;
data _null_; set TEST; where round(I)=0; run; * 11 seconds;
sasfile TEST close;
http://www2.sas.com/proceedings/sugi31/238-31.pdf
And many, many more:
https://www.lexjansen.com/search/searchresults.php?q=where%20vs%20if
@clq wrote:
I know that where can be used in both DATA and PROC step, whereas if can only be used in DATA step. However, what is the difference between where and if?
Also, if you have multiple datasets coming in (merge), a where statement may not be possible:
data sexes;
set sashelp.class;
keep name sex;
run;
data others;
set sashelp.class (drop=sex);
run;
data want;
merge
others
sexes
;
by name;
where age = 16;
run;
but the if will work:
data want;
merge
others
sexes
;
by name;
if age = 16;
run;
If you need to build a combined condition using variables from multiple datasets that are not present in all, then the subsetting if statement is the only option.
WHERE statements or data set option can only reference variables that appear in the incoming data set(s), If can use newly created variables.
Example: The first data step generates an error, the second doesn't.
data junk; set sashelp.class; ns= catt(name,sex); where ns in ('AlfredM' 'CarolF'); run; data junk2; set sashelp.class; ns= catt(name,sex); if ns in ('AlfredM' 'CarolF'); run;
When SAS reads the row of a data into PDV, two variables will be created i.e. the FIRST.variable and LAST.variable.
These variables are not in the input table . Therefore, you can't use the WHERE statement to subset rows of variables which are not yet
in the input data. Instead, you can use the subsetting IF statement.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.