hello, how can I use LIKE command in a MERGE?
For example I have this code:
data C;
MERGE A B;
BY ID;
WHERE VAR1 LIKE '%X%';
RUN;
but var1 is only in A, so sas send me this error - ERROR: Variable var1 is not on file WORK.B.
thanks
If you are trying to subset the result AFTER merging then use
IF VAR1 LIKE '%X%'
If you only want to select records from data set A that meet the condtion then use the dataset option to filter before merging:
data C;
MERGE
A ( WHERE=( VAR1 LIKE '%X%'))
B
;
BY ID;
;
RUN;
The error is because the WHERE statement applies to all the datasets in the SET or MERGE statements. The dataset options only applies to the data set followed by the (). Note that the dataset option doesn't really allow use of functions though, only comparison operators.
If you are trying to subset the result AFTER merging then use
IF VAR1 LIKE '%X%'
If you only want to select records from data set A that meet the condtion then use the dataset option to filter before merging:
data C;
MERGE
A ( WHERE=( VAR1 LIKE '%X%'))
B
;
BY ID;
;
RUN;
The error is because the WHERE statement applies to all the datasets in the SET or MERGE statements. The dataset options only applies to the data set followed by the (). Note that the dataset option doesn't really allow use of functions though, only comparison operators.
THANKS! solution2 works, your solution 1 send me this errors:
15 data C;
16 MERGE A
17 B;
18 BY ID;
19 IF VAR LIKE '%X%';
___
388
202
ERROR 388-185: Expecting an arithmetic operator.
ERROR 202-322: The option or parameter is not recognized and will be ignored.
I think because LIKE dosn't work with IF
VAR LIKE '%X' is the same as INDEX(VAR,'X') or INDEXC(VAR,'X').
LIKE is only allowed in a WHERE statement.
You'll need to use other character functions to find the records you want if subsetting after the merge.
If you are looking with anything with an X then
IF index(var1,'X')> 0;
would work. Index searches for any match of the string in quotes in the variable.
Note that from a performance viewpoint, the different solutions are not equivalent.
option (where=( VAR1 like '%X%' )) is faster than
option (where=( index(VAR1,'X') )) which is in turn faster than
option (where=( find(VAR1,'X') ))
In theory, the IF test is even slower as it evaluates the data once it has been read and loaded rather than at the source when reading the table. This is a common misconception.
In real life, this is not always the case. I suspect that the WHERE clause slows down the read operation whereas the IF clause allows for bulk loading and then discards the superfluous records.
This should not be the case, and there is something not quite right in the way WHERE clauses are executed. That's a place SAS can look at if they want to optimise their data step compiler. Another place is that WHERE clauses don't utilise the same resources when used as an option or as a statement.
See below how IF can be twice the speed of WHERE.
More performance tips in https://www.amazon.com/High-Performance-SAS-Coding-Christian-Graffeuille/dp/1512397490
If you want to benchmark on your system, here are examples I just ran (SAS9.4 on WIN 64) :
%* CPU times and real times are equal;
options nofullstimer;
%* Create sample and load it in memory;
data A(compress=no); A='ssssssssssssssssss'; drop I; do I=1 to 1e8; output; end; run;
sasfile A load;
%* Select none;
data _null_; set A ( where=( A like '%X%' )) ; run; %* 5.8 seconds;
data _null_; set A ( where=( index(A,'X') )) ; run; %* 7.9 seconds;
data _null_; set A ( where=( find(A,'X') )) ; run; %* 8.9 seconds;
data _null_; set A ; if find(A,'X') ; run; %* 4.4 seconds;
%* Select all;
data _null_; set A ( where=( A like '%s%' )) ; run; %* 10.6 seconds;
data _null_; set A ( where=( index(A,'s') )) ; run; %* 13.8 seconds;
data _null_; set A ( where=( find(A,'s') )) ; run; %* 14.0 seconds;
data _null_; set A ; if find(A,'s') ; run; %* 3.4 seconds;
sasfile A close;
Also note how the IF statement becomes faster for the "select all" test since the string match happens right away, whereas the WHERE clauses become slower even though the string match also happens sooner. Not good!
It looks like SAS wastes time slow-reading, then assessing, then slow-loading the data.
Thanks ballardw I accepted your first answer as solution.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.