DATA Step, Macro, Functions and more

Like operator not behaving consistently by not always matching at beginning of value

Accepted Solution Solved
Reply
Contributor
Posts: 46
Accepted Solution

Like operator not behaving consistently by not always matching at beginning of value

Hi there,

   I am losing it over what I think is basic straight forward code. I am joining have_1 which is a list of store names with have_2 which is a concordance between store name variations and base store names. When I run the following code I get inconsistent results:

data have_1;
infile datalines dlm = ':' truncover;
input store $CHAR50.;
datalines;
SUPER C SHAWNIGAN
 IGA
IGA
FRESON BROS. IGA
LOBLAW SUPERSTORE
SUPERSTORE LOBLAW
foodlang iga
IGA MARCHE
ROOTS BAYSHORE CENTRE
;
run;

data have_2;
infile datalines dlm = ':' truncover;
input base_store $ store_variation $;
datalines;
SUPER C:SUPER C
ROOTS:ROOTS
IGA:IGA 
LOBLAWS:LOBLAW 
;
run;


proc sql;
create table want as
select a.*, b.base_store
from have_1 a left join have_2 b
on upcase(a.store) like ('%'||b.store_variation||'%');
quit;

Results:

store         base_store

SUPER C SHAWNIGAN     SUPER C

 IGA                                     IGA

IGA                                      IGA

FRESON BROS. IGA          IGA

foodlang IGA                       IGA

SUPERSTORE LOBLAW   LOBLAWS

LOBLAW SUPERSTORE   

IGA MARCHE

ROOTS BAYSHORE CENTRE

 

My question is, why does the code make a match consistently when the store_variation is on its own or at the end of the store value, but not when the store_variation is at the beginning. Also, why do you think it matched at the beginning for SUPER C SHAWNIGAN, but not LOBLAW SUPERSTORE, IGA MARCHE or ROOTS BAYSHORE CENTRE?

 

I'm puzzled and need to figure this out. Please help!


Accepted Solutions
Solution
3 weeks ago
Super User
Super User
Posts: 8,115

Re: Like operator not behaving consistently by not always matching at beginning of value

Posted in reply to sas-inquirer

@sas-inquirer wrote:
Hi Reeza,
I've tried STRIP with CONTAINS, FIND with TRIM, INDEX and FINDW but I found with each I did not get the desired result. With these, it takes care of not coding SUPER C SHAWNIGAN to IGA, but then neglects to code FRESON BROS. IGA to IGA. Thanks for your suggestion. I could use FIND/FINDW for the single word cases, but I would still need to hardcode if index(STORE," IGA " to map IGA to FRESON BROS. IGA.

If you want to add spaces around the search term and use the INDEX() function then make sure to also add spaces around the string you are searching so it will match at the ends of the string.

index(cat(' ',string,' '),cat(' ',strip(word),' '))

You don't need to add the spaces if you use INDEXW() instead. Another advantage of using INDEXW() is that it will also work for finding IGA in strings like 'BROS.IGA'.  You can use the optional third argument to identify what characters are used to indicate word boundaries.  The default will include punctuation like periods. 

 

indexw(string,strip(word))

 

It might even be easier to use FINDW() instead.  Then you can use its modifiers to have it do case insensitive search and strip the blanks for you.

findw(string,word,,'spit')

 

 

 

View solution in original post


All Replies
Super User
Posts: 23,754

Re: Like operator not behaving consistently by not always matching at beginning of value

Posted in reply to sas-inquirer

You have the wildcards at the beginning and end to allow more flexibility but nothing in the middle. If its more of a search, you can use the FIND/INDEX function instead.  Or you can try a COMPGED or some other function to do a fuzzy match. 

 

 

Super User
Super User
Posts: 8,115

Re: Like operator not behaving consistently by not always matching at beginning of value

Posted in reply to sas-inquirer

SAS variables are fixed length and padded with spaces.  You are adding a lot of spaces before the trailing % wildcard.

upcase(a.store) like ('%'||b.store_variation||'%');

Since you variable has the default length of $8 when store variation is 'IGA' then contain 'IGA     '.

Don't include the spaces in the value.

upcase(a.store) like cats('%',b.store_variation,'%');
Contributor
Posts: 46

Re: Like operator not behaving consistently by not always matching at beginning of value

Hi Tom,
I like your solution. It certainly took care of the matching at beginning issue. Now assuming I had put a space before and after IGA in have_2, how could I make sure that SUPER C SHAWNIGAN only has one result SUPER C and not both SUPER C and IGA?
Contributor
Posts: 46

Re: Like operator not behaving consistently by not always matching at beginning of value

Posted in reply to sas-inquirer
Unless anyone has a more automated way of doing it, I think I'm going to have to hardcode some values. I plan on removing IGA from the have_2 table and hardcoding the store_variation(s) into an if statement using the index function.
Super User
Posts: 23,754

Re: Like operator not behaving consistently by not always matching at beginning of value

Posted in reply to sas-inquirer

For single word matches can you try FIND() or FINDW() instead?

FINDW/INDEXW looks for a full word match, whereas FIND/INDEX will find embedded matches, such as in a word.

Contributor
Posts: 46

Re: Like operator not behaving consistently by not always matching at beginning of value

Hi Reeza,
I've tried STRIP with CONTAINS, FIND with TRIM, INDEX and FINDW but I found with each I did not get the desired result. With these, it takes care of not coding SUPER C SHAWNIGAN to IGA, but then neglects to code FRESON BROS. IGA to IGA. Thanks for your suggestion. I could use FIND/FINDW for the single word cases, but I would still need to hardcode if index(STORE," IGA " to map IGA to FRESON BROS. IGA.
Super User
Posts: 23,754

Re: Like operator not behaving consistently by not always matching at beginning of value

Posted in reply to sas-inquirer

To me, treating the single word conditions differently would still categorize the IGA one. 

Unfortunately with this type of search each approach has it's pro/cons so you have to pick the best that works, or use multiple approaches.

 

Open Refine is not a bad tool to do this work either.

Solution
3 weeks ago
Super User
Super User
Posts: 8,115

Re: Like operator not behaving consistently by not always matching at beginning of value

Posted in reply to sas-inquirer

@sas-inquirer wrote:
Hi Reeza,
I've tried STRIP with CONTAINS, FIND with TRIM, INDEX and FINDW but I found with each I did not get the desired result. With these, it takes care of not coding SUPER C SHAWNIGAN to IGA, but then neglects to code FRESON BROS. IGA to IGA. Thanks for your suggestion. I could use FIND/FINDW for the single word cases, but I would still need to hardcode if index(STORE," IGA " to map IGA to FRESON BROS. IGA.

If you want to add spaces around the search term and use the INDEX() function then make sure to also add spaces around the string you are searching so it will match at the ends of the string.

index(cat(' ',string,' '),cat(' ',strip(word),' '))

You don't need to add the spaces if you use INDEXW() instead. Another advantage of using INDEXW() is that it will also work for finding IGA in strings like 'BROS.IGA'.  You can use the optional third argument to identify what characters are used to indicate word boundaries.  The default will include punctuation like periods. 

 

indexw(string,strip(word))

 

It might even be easier to use FINDW() instead.  Then you can use its modifiers to have it do case insensitive search and strip the blanks for you.

findw(string,word,,'spit')

 

 

 

Contributor
Posts: 46

Re: Like operator not behaving consistently by not always matching at beginning of value

Hi Tom,
Fantastic solutions! I am going with indexw. It worked like a charm. Thank you so much!!
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 107 views
  • 5 likes
  • 3 in conversation