BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
chuakp
Obsidian | Level 7

I have two datasets. One, called "search", has 10 observations with a variable called search_term containing different strings.

 

The other one, called "names", has 100 observations with a string variable called "name", each of which has a different drug name.

 

I would like to use the index function to search variable "name" for each of the 10 search terms in "search." To be specific, I don't want to merge name with search_term (which would require an exact match) - I merely want to create an indicator 1 or 0 if any of the 10 values of search_term can be found inside the string "name." For example, if "name" equals "BOTOX A" and if one of the 10 values of search_term is "BOTOX" , the indicator should be set to 1 (and similar if "name" equals "BOTOX B." 

 

I'd appreciate any help with this. Thanks.  

1 ACCEPTED SOLUTION

Accepted Solutions
ErikLund_Jensen
Rhodochrosite | Level 12

Hi chuakp

 

There is another posibility. You can join the two tables in Proc SQL, as shown in the following example. It uses the data sets provided by MichaelLarsen, and gives the same output.

 

I prefer the SQL way in this case, becaause it it is simple and easy to read compared to the data steps, especially the hash-step. But if the datasets are very large, i would expect data steps to be faster.

/* proc sql join on search_term found in name */
proc sql;
	create table match as
		select 
			t2.search_term,
			t1.name,
			t2.search_term ne '' as found
		from names as t1 left join search as t2
		on find(t1.name,t2.search_term,'it') > 0;
quit;
			

View solution in original post

5 REPLIES 5
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Show test data, in the form of a datastep.  As such just example code here:

data _null_;
  set smalldata end=last;
  if _n_=1 then call execute('data want; set largedata;');
  call execute(cats('if index(yourstring,',smallstring,') then result=1;'));
  if last then call execute('run;');
run;  

This will generate a datastep with an if for each row in small dataset.

MichaelLarsen
SAS Employee

2 examples:

1. using a hash map and iterator

2. using the point option in a set statement, that one is probably easier to understand.

 

data search;
  length search_term $10;
  input search_term;
cards;
botox
ibuprofen
paracetamol
;
data names;
  length name $30;
  infile cards truncover dlm=',';
  informat name $char30.;
  input name ;
cards;
botox b
ipren (ibuprofen)
pamol (paracetamol)
glp-1
insulin
;
run;

/* data step using a hash map and an iterator */
data match;
  if _N_ = 1 then do;
    length search_term $10;
    declare hash hmap (dataset: 'work.search');
    declare hiter search ('hmap');
    rc = hmap.definekey('search_term');
    rc = hmap.definedata('search_term');
    call missing(search_term);
    rc = hmap.definedone();
  end;
  set names;
  found = 0; /* Search indicator, 0=no hit, 1=one or more hits */
  rc = search.first();
  do while (rc = 0);
    if find(name,search_term,'it') > 0 then do;
      /* Found the search_term in the name column */
      found = 1;
      rc = -1; * Stop searching;
    end;
    else rc = search.next();
  end;
  if found = 0 then search_term = '';
  drop rc;
run;

/* data step using the point= option */
data match;
  set names;
  found = 0;
  do i=1 to searchobs;
    set search point=i nobs=searchobs;
    if find(name,search_term,'it') > 0 then do;
      found = 1;
      leave; * Stop searching;
    end;
  end;
  if found = 0 then search_term = '';
  drop i ;
run;

 

ErikLund_Jensen
Rhodochrosite | Level 12

Hi chuakp

 

There is another posibility. You can join the two tables in Proc SQL, as shown in the following example. It uses the data sets provided by MichaelLarsen, and gives the same output.

 

I prefer the SQL way in this case, becaause it it is simple and easy to read compared to the data steps, especially the hash-step. But if the datasets are very large, i would expect data steps to be faster.

/* proc sql join on search_term found in name */
proc sql;
	create table match as
		select 
			t2.search_term,
			t1.name,
			t2.search_term ne '' as found
		from names as t1 left join search as t2
		on find(t1.name,t2.search_term,'it') > 0;
quit;
			
chuakp
Obsidian | Level 7

Thanks so much - this worked. I also appreciated the suggestion regarding the hash tag approach.

ballardw
Super User

@chuakp wrote:

I have two datasets. One, called "search", has 10 observations with a variable called search_term containing different strings.

 

If you mean by "different strings" that you need to match anyone of multiple values contained in a single variable then I suspect this search_term is built incorrectly for a good search. Which is why we ask for actual examples of your data.

Example: if your search_term="Apple Pear" then searching for search_term in a name string like "Apple Company" that you expect to match would generally fail because the entire search_term is not part of the name string.

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 4163 views
  • 3 likes
  • 5 in conversation