DATA Step, Macro, Functions and more

Efficient way to pattern match strings in SAS

Accepted Solution Solved
Reply
Contributor
Posts: 60
Accepted Solution

Efficient way to pattern match strings in SAS

Hi,

 

I was wondering if there is any way to do efficient data filtering with pattern matching functions in SAS. What I mean by this is that I would give the function a variable (VAR) as an input parameter and the function would parse the data to see if any of the VAR values are contained in the destination variable string.

To clarify this with an example:

 

I hava a dataset (MATCH) which I maintain and it includes the desired patterns.

For example:

VAR

hi

hello

bye

 

And then in the other dataset (HAVE) I have a variable which is used as a filter criteria lets say FVAR

FVAR

hi all!

hello!

good night

good morning

bye bye

 

and my WANT dataset would be

FVAR

hi all!

hello!

bye bye

 

And just to point out my MATCH dataset has hundreds of rows so doing this my macro variables or hard coding is not an option.

 


Accepted Solutions
Solution
‎05-06-2016 05:33 AM
Super User
Super User
Posts: 7,970

Re: Efficient way to pattern match strings in SAS

There are many string comparison functions in SAS, you will find them listed here;

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000245860.htm

 

You could also use Perl Regular expressions.

 

The simplest way to do what you are asking is like this:

data match;
  length var $200;
  input var $;
datalines;
hi
hello
bye
;
run;
data have;
  length fvar $200;
  input FVAR $;
datalines;
hi all!
hello!
good night
good morning
bye bye
;
run;

proc sql;
  create table WANT as
  select  A.*
  from    HAVE A
  left join MATCH B
  on      index(A.FVAR,B.VAR) > 0
  where   B.VAR is not null;
quit;

Do note, posting test data in the form of a datastep (as shown above) makes it a lot easier to get tested code back to you.

View solution in original post


All Replies
Solution
‎05-06-2016 05:33 AM
Super User
Super User
Posts: 7,970

Re: Efficient way to pattern match strings in SAS

There are many string comparison functions in SAS, you will find them listed here;

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000245860.htm

 

You could also use Perl Regular expressions.

 

The simplest way to do what you are asking is like this:

data match;
  length var $200;
  input var $;
datalines;
hi
hello
bye
;
run;
data have;
  length fvar $200;
  input FVAR $;
datalines;
hi all!
hello!
good night
good morning
bye bye
;
run;

proc sql;
  create table WANT as
  select  A.*
  from    HAVE A
  left join MATCH B
  on      index(A.FVAR,B.VAR) > 0
  where   B.VAR is not null;
quit;

Do note, posting test data in the form of a datastep (as shown above) makes it a lot easier to get tested code back to you.

Contributor
Posts: 60

Re: Efficient way to pattern match strings in SAS

Yes thank you very much. I will post the whole code next time Smiley Happy

Didn't even know you can use sql join like that!

Super User
Super User
Posts: 7,970

Re: Efficient way to pattern match strings in SAS

One other option - and its more useful when its more complicated matching/processing, is to use the macth dataset to generate a dataset for the have e.g:

data _null_;
  set match end=last;
  if _n_=1 then cal' execute('data want;  set have;');
  call execute(cat(' if index(fvar,',strip(var),')>0 then output;'));
  if last then call execute(' run;');
run;

This will generate code of one if statement for each row of data in match, and that datastep will then run.

Contributor
Posts: 60

Re: Efficient way to pattern match strings in SAS

[ Edited ]

I tried your approach in practice (the first one, couldn't get the second one working even after correcting the call typo).

I run some problems. Here is the sample code:

data MATCH;
   length VAR $30;
   input VAR $;
infile datalines dlm=',';
datalines;
testing.test,
Does work,
Special % char
;
run;

data HAVE;
length FVAR $300;
   input FVAR $;
   infile datalines dlm=',';
   datalines;
testing.test the test code,
Does work or not,
Does not work or what,
Special % character in a string,
This is not included,
The test of testing.test
TESTING.TEST
;
run;

/*
proc sql;
create table WANT as
select A.*
from HAVE A
left join MATCH B
on index(A.FVAR,B.VAR)>0
where B.VAR is not null;
quit;
*/

proc sql;
create table WANT as
select A.*
from HAVE A
left join MATCH B
on index(strip(lowcase(A.FVAR)),strip(lowcase(B.VAR)))>0
where B.VAR is not null;
quit;

The result is not as intended. Do you have any idea why so?


EDIT: Ok there was a mistake in my example the old one is commented and edited one is live. However this does not solve my problem with the actual data... Don't know how I could find an example.

Contributor
Posts: 60

Re: Efficient way to pattern match strings in SAS

Ok, so found the answer why it didint work... For some unknown reason SAS adds trailing/leading blanks to the variables while executing the index fuction. So even if I  have strip lowcased the variables in their corresponding datasets, I will have to do it again while using the index (index(strip(lowcase(FVAR)),strip(lowcase(VAR)))) Don't know if this is somehow connected to the length set for the variables???

Super User
Super User
Posts: 7,970

Re: Efficient way to pattern match strings in SAS

Well, if the length isn't stipulated then it will default to 8 and padd out.  I always strip() variables just to be safe.

 

For the other method, sorry there were a few typos in there, this should work:

data MATCH;
   length VAR $30;
   input VAR $;
infile datalines dlm=',';
datalines;
testing.test,
Does work,
Special % char
;
run;

data HAVE;
length FVAR $300;
   input FVAR $;
   infile datalines dlm=',';
   datalines;
testing.test the test code,
Does work or not,
Does not work or what,
Special % character in a string,
This is not included,
The test of testing.test
TESTING.TEST
;
run;

data _null_;
  set match end=last;
  if _n_=1 then call execute('data want;  set have;');
  call execute(cat(' if index(fvar,"',strip(var),'")>0 then output;'));
  if last then call execute(' run;');
run;
Super User
Posts: 10,035

Re: Efficient way to pattern match strings in SAS

You are not doing some exact match, so check other distance function LIKE: spedis(), gendis(),complev() .......


data MATCH;
   length VAR $30;
   input VAR $;
infile datalines dlm=',';
datalines;
testing.test,
Does work,
Special % char
;
run;

data HAVE;
length FVAR $300;
   input FVAR $;
   infile datalines dlm=',';
   datalines;
testing.test the test code,
Does work or not,
Does not work or what,
Special % character in a string,
This is not included,
The test of testing.test
TESTING.TEST
;
run;

proc sql;
create table WANT as
select A.*,B.*
from HAVE A, MATCH B
group by B.VAR
having spedis(strip(lowcase(A.FVAR)),strip(lowcase(B.VAR)))=
  min(spedis(strip(lowcase(A.FVAR)),strip(lowcase(B.VAR))));
quit;




☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 423 views
  • 1 like
  • 3 in conversation