I have two datasets. One, called "search", has 10 observations with a variable called search_term containing different strings.
The other one, called "names", has 100 observations with a string variable called "name", each of which has a different drug name.
I would like to use the index function to search variable "name" for each of the 10 search terms in "search." To be specific, I don't want to merge name with search_term (which would require an exact match) - I merely want to create an indicator 1 or 0 if any of the 10 values of search_term can be found inside the string "name." For example, if "name" equals "BOTOX A" and if one of the 10 values of search_term is "BOTOX" , the indicator should be set to 1 (and similar if "name" equals "BOTOX B."
I'd appreciate any help with this. Thanks.
Hi chuakp
There is another posibility. You can join the two tables in Proc SQL, as shown in the following example. It uses the data sets provided by MichaelLarsen, and gives the same output.
I prefer the SQL way in this case, becaause it it is simple and easy to read compared to the data steps, especially the hash-step. But if the datasets are very large, i would expect data steps to be faster.
/* proc sql join on search_term found in name */
proc sql;
create table match as
select
t2.search_term,
t1.name,
t2.search_term ne '' as found
from names as t1 left join search as t2
on find(t1.name,t2.search_term,'it') > 0;
quit;
Show test data, in the form of a datastep. As such just example code here:
data _null_;
set smalldata end=last;
if _n_=1 then call execute('data want; set largedata;');
call execute(cats('if index(yourstring,',smallstring,') then result=1;'));
if last then call execute('run;');
run;
This will generate a datastep with an if for each row in small dataset.
2 examples:
1. using a hash map and iterator
2. using the point option in a set statement, that one is probably easier to understand.
data search;
length search_term $10;
input search_term;
cards;
botox
ibuprofen
paracetamol
;
data names;
length name $30;
infile cards truncover dlm=',';
informat name $char30.;
input name ;
cards;
botox b
ipren (ibuprofen)
pamol (paracetamol)
glp-1
insulin
;
run;
/* data step using a hash map and an iterator */
data match;
if _N_ = 1 then do;
length search_term $10;
declare hash hmap (dataset: 'work.search');
declare hiter search ('hmap');
rc = hmap.definekey('search_term');
rc = hmap.definedata('search_term');
call missing(search_term);
rc = hmap.definedone();
end;
set names;
found = 0; /* Search indicator, 0=no hit, 1=one or more hits */
rc = search.first();
do while (rc = 0);
if find(name,search_term,'it') > 0 then do;
/* Found the search_term in the name column */
found = 1;
rc = -1; * Stop searching;
end;
else rc = search.next();
end;
if found = 0 then search_term = '';
drop rc;
run;
/* data step using the point= option */
data match;
set names;
found = 0;
do i=1 to searchobs;
set search point=i nobs=searchobs;
if find(name,search_term,'it') > 0 then do;
found = 1;
leave; * Stop searching;
end;
end;
if found = 0 then search_term = '';
drop i ;
run;
Hi chuakp
There is another posibility. You can join the two tables in Proc SQL, as shown in the following example. It uses the data sets provided by MichaelLarsen, and gives the same output.
I prefer the SQL way in this case, becaause it it is simple and easy to read compared to the data steps, especially the hash-step. But if the datasets are very large, i would expect data steps to be faster.
/* proc sql join on search_term found in name */
proc sql;
create table match as
select
t2.search_term,
t1.name,
t2.search_term ne '' as found
from names as t1 left join search as t2
on find(t1.name,t2.search_term,'it') > 0;
quit;
Thanks so much - this worked. I also appreciated the suggestion regarding the hash tag approach.
@chuakp wrote:
I have two datasets. One, called "search", has 10 observations with a variable called search_term containing different strings.
If you mean by "different strings" that you need to match anyone of multiple values contained in a single variable then I suspect this search_term is built incorrectly for a good search. Which is why we ask for actual examples of your data.
Example: if your search_term="Apple Pear" then searching for search_term in a name string like "Apple Company" that you expect to match would generally fail because the entire search_term is not part of the name string.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.