Hi,
I was wondering if there is any way to do efficient data filtering with pattern matching functions in SAS. What I mean by this is that I would give the function a variable (VAR) as an input parameter and the function would parse the data to see if any of the VAR values are contained in the destination variable string.
To clarify this with an example:
I hava a dataset (MATCH) which I maintain and it includes the desired patterns.
For example:
VAR
hi
hello
bye
And then in the other dataset (HAVE) I have a variable which is used as a filter criteria lets say FVAR
FVAR
hi all!
hello!
good night
good morning
bye bye
and my WANT dataset would be
FVAR
hi all!
hello!
bye bye
And just to point out my MATCH dataset has hundreds of rows so doing this my macro variables or hard coding is not an option.
There are many string comparison functions in SAS, you will find them listed here;
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000245860.htm
You could also use Perl Regular expressions.
The simplest way to do what you are asking is like this:
data match; length var $200; input var $; datalines; hi hello bye ; run; data have; length fvar $200; input FVAR $; datalines; hi all! hello! good night good morning bye bye ; run; proc sql; create table WANT as select A.* from HAVE A left join MATCH B on index(A.FVAR,B.VAR) > 0 where B.VAR is not null; quit;
Do note, posting test data in the form of a datastep (as shown above) makes it a lot easier to get tested code back to you.
There are many string comparison functions in SAS, you will find them listed here;
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000245860.htm
You could also use Perl Regular expressions.
The simplest way to do what you are asking is like this:
data match; length var $200; input var $; datalines; hi hello bye ; run; data have; length fvar $200; input FVAR $; datalines; hi all! hello! good night good morning bye bye ; run; proc sql; create table WANT as select A.* from HAVE A left join MATCH B on index(A.FVAR,B.VAR) > 0 where B.VAR is not null; quit;
Do note, posting test data in the form of a datastep (as shown above) makes it a lot easier to get tested code back to you.
Yes thank you very much. I will post the whole code next time 🙂
Didn't even know you can use sql join like that!
One other option - and its more useful when its more complicated matching/processing, is to use the macth dataset to generate a dataset for the have e.g:
data _null_; set match end=last; if _n_=1 then cal' execute('data want; set have;'); call execute(cat(' if index(fvar,',strip(var),')>0 then output;')); if last then call execute(' run;'); run;
This will generate code of one if statement for each row of data in match, and that datastep will then run.
I tried your approach in practice (the first one, couldn't get the second one working even after correcting the call typo).
I run some problems. Here is the sample code:
data MATCH;
length VAR $30;
input VAR $;
infile datalines dlm=',';
datalines;
testing.test,
Does work,
Special % char
;
run;
data HAVE;
length FVAR $300;
input FVAR $;
infile datalines dlm=',';
datalines;
testing.test the test code,
Does work or not,
Does not work or what,
Special % character in a string,
This is not included,
The test of testing.test
TESTING.TEST
;
run;
/*
proc sql;
create table WANT as
select A.*
from HAVE A
left join MATCH B
on index(A.FVAR,B.VAR)>0
where B.VAR is not null;
quit;
*/
proc sql;
create table WANT as
select A.*
from HAVE A
left join MATCH B
on index(strip(lowcase(A.FVAR)),strip(lowcase(B.VAR)))>0
where B.VAR is not null;
quit;
The result is not as intended. Do you have any idea why so?
EDIT: Ok there was a mistake in my example the old one is commented and edited one is live. However this does not solve my problem with the actual data... Don't know how I could find an example.
Ok, so found the answer why it didint work... For some unknown reason SAS adds trailing/leading blanks to the variables while executing the index fuction. So even if I have strip lowcased the variables in their corresponding datasets, I will have to do it again while using the index (index(strip(lowcase(FVAR)),strip(lowcase(VAR)))) Don't know if this is somehow connected to the length set for the variables???
Well, if the length isn't stipulated then it will default to 8 and padd out. I always strip() variables just to be safe.
For the other method, sorry there were a few typos in there, this should work:
data MATCH; length VAR $30; input VAR $; infile datalines dlm=','; datalines; testing.test, Does work, Special % char ; run; data HAVE; length FVAR $300; input FVAR $; infile datalines dlm=','; datalines; testing.test the test code, Does work or not, Does not work or what, Special % character in a string, This is not included, The test of testing.test TESTING.TEST ; run; data _null_; set match end=last; if _n_=1 then call execute('data want; set have;'); call execute(cat(' if index(fvar,"',strip(var),'")>0 then output;')); if last then call execute(' run;'); run;
You are not doing some exact match, so check other distance function LIKE: spedis(), gendis(),complev() ....... data MATCH; length VAR $30; input VAR $; infile datalines dlm=','; datalines; testing.test, Does work, Special % char ; run; data HAVE; length FVAR $300; input FVAR $; infile datalines dlm=','; datalines; testing.test the test code, Does work or not, Does not work or what, Special % character in a string, This is not included, The test of testing.test TESTING.TEST ; run; proc sql; create table WANT as select A.*,B.* from HAVE A, MATCH B group by B.VAR having spedis(strip(lowcase(A.FVAR)),strip(lowcase(B.VAR)))= min(spedis(strip(lowcase(A.FVAR)),strip(lowcase(B.VAR)))); quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.