BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Angel_Saenz
Quartz | Level 8

hello, how can I use LIKE command in a MERGE?

For example I have this code:

data C;

MERGE A B;

BY ID;

WHERE VAR1 LIKE '%X%';

RUN;

 

but var1 is only in A, so sas send me this error - ERROR: Variable var1 is not on file WORK.B.

 

thanks

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

If you are trying to subset the result AFTER merging then use

IF VAR1 LIKE '%X%'

 

If you only want to select records from data set A that meet the condtion then use the dataset option to filter before merging:

data C;
   MERGE 
      A  ( WHERE=( VAR1 LIKE '%X%'))
      B
   ;
   BY ID;
  ;
RUN;

The error is because the WHERE statement applies to all the datasets in the SET or MERGE statements. The dataset options only applies to the data set followed by the (). Note that the dataset option doesn't really allow use of functions though, only comparison operators.

 

View solution in original post

7 REPLIES 7
ballardw
Super User

If you are trying to subset the result AFTER merging then use

IF VAR1 LIKE '%X%'

 

If you only want to select records from data set A that meet the condtion then use the dataset option to filter before merging:

data C;
   MERGE 
      A  ( WHERE=( VAR1 LIKE '%X%'))
      B
   ;
   BY ID;
  ;
RUN;

The error is because the WHERE statement applies to all the datasets in the SET or MERGE statements. The dataset options only applies to the data set followed by the (). Note that the dataset option doesn't really allow use of functions though, only comparison operators.

 

Angel_Saenz
Quartz | Level 8

THANKS! solution2 works, your solution 1 send me this errors:

 

15         data C;
16         MERGE A
17           B;
18         BY ID;
19         IF VAR LIKE '%X%';
                        ___
                        388
                        202
ERROR 388-185: Expecting an arithmetic operator.
ERROR 202-322: The option or parameter is not recognized and will be ignored.

 

I think because LIKE dosn't work with IF

Tom
Super User Tom
Super User

VAR LIKE '%X' is the same as INDEX(VAR,'X') or INDEXC(VAR,'X').

ballardw
Super User

LIKE is only allowed in a WHERE statement.

You'll need to use other character functions to find the records you want if subsetting after the merge.

If you are looking with anything with an X then

 

IF index(var1,'X')> 0;

would work. Index searches for any match of the string in quotes in the variable.

ChrisNZ
Tourmaline | Level 20

Note that from a performance viewpoint, the different solutions are not equivalent.

 

option (where=( VAR1 like '%X%' )) is  faster than

option (where=( index(VAR1,'X') ))   which is in turn faster than

option (where=( find(VAR1,'X') ))  

  

In theory, the IF test is even slower as it evaluates the data once it has been read and loaded rather than at the source when reading the table. This is a common misconception.

In real life, this is not always the case. I suspect that the WHERE clause slows down the read operation whereas the IF clause allows for bulk loading and then discards the superfluous records.

This should not be the case, and there is something not quite right in the way WHERE clauses are executed. That's a place SAS can look at if they want to optimise their data step compiler. Another place is that WHERE clauses don't utilise the same resources when used as an option or as a statement.

 

See below how IF can be twice the speed of WHERE.

 

More performance tips in https://www.amazon.com/High-Performance-SAS-Coding-Christian-Graffeuille/dp/1512397490

 

If you want to benchmark on your system, here are examples I just ran (SAS9.4 on WIN 64) :

 

%* CPU times and real times are equal;

options nofullstimer;

%* Create sample and load it in memory;

data A(compress=no); A='ssssssssssssssssss'; drop I; do I=1 to 1e8; output; end; run;

sasfile A load;

%* Select none;

data _null_; set A ( where=( A like '%X%' )) ; run; %*  5.8 seconds;

data _null_; set A ( where=( index(A,'X') )) ; run; %*  7.9 seconds;

data _null_; set A ( where=( find(A,'X') ))  ; run; %*  8.9 seconds;

data _null_; set A ; if find(A,'X')                 ; run; %*  4.4 seconds;

%* Select all;

data _null_; set A ( where=( A like '%s%' )) ; run; %* 10.6 seconds;

data _null_; set A ( where=( index(A,'s') )) ; run; %* 13.8 seconds;

data _null_; set A ( where=( find(A,'s') ))  ; run; %* 14.0 seconds;

data _null_; set A ; if find(A,'s')                 ; run; %*  3.4 seconds;

sasfile A close;

 

 Also note how the IF statement becomes faster for the "select all" test since the string match happens right away, whereas the WHERE clauses become slower even though the string match also happens sooner. Not good!

It looks like SAS wastes time slow-reading, then assessing, then slow-loading the data.

Angel_Saenz
Quartz | Level 8
Thanks for your answer
Angel_Saenz
Quartz | Level 8

Thanks   I accepted your first answer as solution.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 2781 views
  • 5 likes
  • 4 in conversation