BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
BobHope
Quartz | Level 8

Hi,

 

I was wondering if there is any way to do efficient data filtering with pattern matching functions in SAS. What I mean by this is that I would give the function a variable (VAR) as an input parameter and the function would parse the data to see if any of the VAR values are contained in the destination variable string.

To clarify this with an example:

 

I hava a dataset (MATCH) which I maintain and it includes the desired patterns.

For example:

VAR

hi

hello

bye

 

And then in the other dataset (HAVE) I have a variable which is used as a filter criteria lets say FVAR

FVAR

hi all!

hello!

good night

good morning

bye bye

 

and my WANT dataset would be

FVAR

hi all!

hello!

bye bye

 

And just to point out my MATCH dataset has hundreds of rows so doing this my macro variables or hard coding is not an option.

 

1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

There are many string comparison functions in SAS, you will find them listed here;

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000245860.htm

 

You could also use Perl Regular expressions.

 

The simplest way to do what you are asking is like this:

data match;
  length var $200;
  input var $;
datalines;
hi
hello
bye
;
run;
data have;
  length fvar $200;
  input FVAR $;
datalines;
hi all!
hello!
good night
good morning
bye bye
;
run;

proc sql;
  create table WANT as
  select  A.*
  from    HAVE A
  left join MATCH B
  on      index(A.FVAR,B.VAR) > 0
  where   B.VAR is not null;
quit;

Do note, posting test data in the form of a datastep (as shown above) makes it a lot easier to get tested code back to you.

View solution in original post

7 REPLIES 7
RW9
Diamond | Level 26 RW9
Diamond | Level 26

There are many string comparison functions in SAS, you will find them listed here;

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000245860.htm

 

You could also use Perl Regular expressions.

 

The simplest way to do what you are asking is like this:

data match;
  length var $200;
  input var $;
datalines;
hi
hello
bye
;
run;
data have;
  length fvar $200;
  input FVAR $;
datalines;
hi all!
hello!
good night
good morning
bye bye
;
run;

proc sql;
  create table WANT as
  select  A.*
  from    HAVE A
  left join MATCH B
  on      index(A.FVAR,B.VAR) > 0
  where   B.VAR is not null;
quit;

Do note, posting test data in the form of a datastep (as shown above) makes it a lot easier to get tested code back to you.

BobHope
Quartz | Level 8

Yes thank you very much. I will post the whole code next time 🙂

Didn't even know you can use sql join like that!

RW9
Diamond | Level 26 RW9
Diamond | Level 26

One other option - and its more useful when its more complicated matching/processing, is to use the macth dataset to generate a dataset for the have e.g:

data _null_;
  set match end=last;
  if _n_=1 then cal' execute('data want;  set have;');
  call execute(cat(' if index(fvar,',strip(var),')>0 then output;'));
  if last then call execute(' run;');
run;

This will generate code of one if statement for each row of data in match, and that datastep will then run.

BobHope
Quartz | Level 8

I tried your approach in practice (the first one, couldn't get the second one working even after correcting the call typo).

I run some problems. Here is the sample code:

data MATCH;
   length VAR $30;
   input VAR $;
infile datalines dlm=',';
datalines;
testing.test,
Does work,
Special % char
;
run;

data HAVE;
length FVAR $300;
   input FVAR $;
   infile datalines dlm=',';
   datalines;
testing.test the test code,
Does work or not,
Does not work or what,
Special % character in a string,
This is not included,
The test of testing.test
TESTING.TEST
;
run;

/*
proc sql;
create table WANT as
select A.*
from HAVE A
left join MATCH B
on index(A.FVAR,B.VAR)>0
where B.VAR is not null;
quit;
*/

proc sql;
create table WANT as
select A.*
from HAVE A
left join MATCH B
on index(strip(lowcase(A.FVAR)),strip(lowcase(B.VAR)))>0
where B.VAR is not null;
quit;

The result is not as intended. Do you have any idea why so?


EDIT: Ok there was a mistake in my example the old one is commented and edited one is live. However this does not solve my problem with the actual data... Don't know how I could find an example.

BobHope
Quartz | Level 8

Ok, so found the answer why it didint work... For some unknown reason SAS adds trailing/leading blanks to the variables while executing the index fuction. So even if I  have strip lowcased the variables in their corresponding datasets, I will have to do it again while using the index (index(strip(lowcase(FVAR)),strip(lowcase(VAR)))) Don't know if this is somehow connected to the length set for the variables???

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, if the length isn't stipulated then it will default to 8 and padd out.  I always strip() variables just to be safe.

 

For the other method, sorry there were a few typos in there, this should work:

data MATCH;
   length VAR $30;
   input VAR $;
infile datalines dlm=',';
datalines;
testing.test,
Does work,
Special % char
;
run;

data HAVE;
length FVAR $300;
   input FVAR $;
   infile datalines dlm=',';
   datalines;
testing.test the test code,
Does work or not,
Does not work or what,
Special % character in a string,
This is not included,
The test of testing.test
TESTING.TEST
;
run;

data _null_;
  set match end=last;
  if _n_=1 then call execute('data want;  set have;');
  call execute(cat(' if index(fvar,"',strip(var),'")>0 then output;'));
  if last then call execute(' run;');
run;
Ksharp
Super User
You are not doing some exact match, so check other distance function LIKE: spedis(), gendis(),complev() .......


data MATCH;
   length VAR $30;
   input VAR $;
infile datalines dlm=',';
datalines;
testing.test,
Does work,
Special % char
;
run;

data HAVE;
length FVAR $300;
   input FVAR $;
   infile datalines dlm=',';
   datalines;
testing.test the test code,
Does work or not,
Does not work or what,
Special % character in a string,
This is not included,
The test of testing.test
TESTING.TEST
;
run;

proc sql;
create table WANT as
select A.*,B.*
from HAVE A, MATCH B
group by B.VAR
having spedis(strip(lowcase(A.FVAR)),strip(lowcase(B.VAR)))=
  min(spedis(strip(lowcase(A.FVAR)),strip(lowcase(B.VAR))));
quit;




SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 6038 views
  • 1 like
  • 3 in conversation