BookmarkSubscribeRSS Feed
yanshuai
Quartz | Level 8

Hello,

Thank you all for contributing the community. I have learnt a lot since I joined.

Here's another issue I encountered. I have two datasets:

data HAVE;
   input name $;
   datalines; 
John
Diana
Diane
Sally
Doug
David
Dianna
;
run;

data KEYWORDS;
   input keywords $;
   datalines; 
ou
vi
oh
ck
;
run;

I want to search KEYWORDS.keywords in HAVE.name and leave the HAVE observations only containing the keywords.

AND also show the keywords used in the searching process.

 

Here is what I want

data want;
   input name $ keywords $;
   datalines; 
John oh
Doug ou
David vi
;
run;

Here is just an example, but my actual keywords are more than 600, so I cannot put them all in the code.

Would you please share some codes and thoughts, please?

 

Thank you so much!

13 REPLIES 13
PaigeMiller
Diamond | Level 26

This is probably going to be slow for very large data sets ... but it works

 

proc sql;
	create table want as select name,keywords 
		from have as a,keywords as b
		where find(a.name,b.keywords,'t')>0;
quit;
--
Paige Miller
yanshuai
Quartz | Level 8

Thanks for the code. It works well.

What if the keywords must be an individual word, instead of a part of the word, in the NAME

For example,

data HAVE;
   input name $char14.;;
   datalines; 
apple LLC
appleLLC
amazon Inc
amazonInc
apple Corp
appleCorp
amazonCorp
;
run;

data KEYWORDS;
   input keywords $;
   datalines; 
apple
amazon
;
run;

I want the KEYWORDS showing up in the NAME as an individual word (with space before or after), rather than within the NAME string as part of it.

data want;
   input name $char11. keywords $char10.;
   datalines; 
apple LLC  apple
apple Corp apple
amazon Inc amazon
;
run;
PaigeMiller
Diamond | Level 26

@yanshuai wrote:

Thanks for the code. It works well.

What if the keywords must be an individual word, instead of a part of the word, in the NAME

For example,

data HAVE;
   input name $char14.;;
   datalines; 
apple LLC
appleLLC
amazon Inc
amazonInc
apple Corp
appleCorp
amazonCorp
;
run;

data KEYWORDS;
   input keywords $;
   datalines; 
apple
amazon
;
run;

I want the KEYWORDS showing up in the NAME as an individual word (with space before or after), rather than within the NAME string as part of it.

data want;
   input name $char11. keywords $char10.;
   datalines; 
apple LLC  apple
apple Corp apple
amazon Inc amazon
;
run;

Seems like you didn't even try the code I provided with these different keywords. Please try it and you'll know the answer.

--
Paige Miller
yanshuai
Quartz | Level 8
Actually, I tried it for sure....I am more used to SQL so I tried your code first actually. But I still cannot search out the NAME containing individual KEYWORDS, as my updated reply shows.
Sajid01
Meteorite | Level 14

If you do not want. to have the names with keywords that are integral to the same, then those names have to filtered out. Better to do it before the Proc SQL step.

The updated code will be like this

data HAVE;
   input name $char14.;
 if (find(trim(name),' '))>0;
  datalines; 
apple LLC
appleLLC
amazon Inc
amazonInc
apple Corp
appleCorp
amazonCorp
;
run;

data KEYWORDS;
   input keywords $;
   datalines; 
apple
amazon
;
run;
proc sql;
	create table want as select name,keywords 
		from have as a,keywords as b
		where  find(a.name,b.keywords,'t')>0;
		;
quit;

The output will be

 

Sajid01_0-1647871357167.png

 

mkeintz
PROC Star

What do you want if a name contains more than 1 keyword?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
yanshuai
Quartz | Level 8
Then show them in two rows of observations.
Sajid01
Meteorite | Level 14

Hello @yanshuai 
@PaigeMiller solutions does take care of the scenario with multiple occurrences of the keywords, is simple and comprehensible.
I have added more keywords like as follows

data HAVE;
   input name $;
   datalines; 
John
Diana
Diane
Sally
Doug
David
Dianna
;
data KEYWORDS;
   input keywords $;
   datalines; 
ou
vi
oh
ck
id
ia
an
;
/* Following code by @PaigeMiller */ proc sql; create table want as select name,keywords from have as a,keywords as b where find(a.name,b.keywords,'t')>0; quit;

The output is like this

Sajid01_0-1647816828059.png

 

yanshuai
Quartz | Level 8

Thanks for the code.

What if the keywords must be an individual word, instead of a part of the word, in the NAME

For example,

data HAVE;
   input name $char14.;;
   datalines; 
Apple LLC
AppleLLC
Amazon Inc
AmazonInc
Apple Corp
AppleCorp
AmazonCorp
;
run;


data KEYWORDS;
   input keywords $;
   datalines; 
apple
amazon
;
run;

I want the KEYWORDS showing up in the NAME as an individual word (with space before or after), rather than within the NAME string as part of it.

 

data want;
   input name $char11. keywords $char10.;
   datalines; 
apple LLC  apple
apple Corp apple
amazon Inc amazon
;
run;
mkeintz
PROC Star

If you never want more than one match per name, then you can put the list of keyword values into an array of character values.  Loop through the array looking for each sequential character expression in name.  Stop when one is found, or you've exhausted the list.

 

 

data HAVE;
   input name $;
   datalines; 
John
Diana
Diane
Sally
Doug
David
Dianna
run;

data KEYWORDS;
   input keywords $;
   datalines; 
ou
vi
oh
ck
run;

proc sql noprint;
  select  quote(trim(keywords)) into :keywordlist separated by ',' from keywords;
quit;
%put &=keywordlist;

data want (drop=i);
  set have;
  array kw {&sqlobs} $2 _temporary_ (&keywordlist);
  do i=1 to dim(kw) until (find(name,trim(kw{i}))); end;
  if i<=&sqlobs then keyword=kw{i};
run;

 

 

The macrovar keywordlist created by proc sql in the above case is "ou","vi","oh","ck", which initializes values in the array kw.   The macrovar SQLOBS is the number of entities generated by the select statement in proc sql.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
yanshuai
Quartz | Level 8

Thanks. I love this FIND function.

What if the keywords must be an individual word, instead of a part of the word, in the NAME

For example,

data HAVE;
   input name $char14.;;
   datalines; 
Apple LLC
AppleLLC
Amazon Inc
AmazonInc
Apple Corp
AppleCorp
AmazonCorp
;
run;


data KEYWORDS;
   input keywords $;
   datalines; 
apple
amazon
;
run;

I want the KEYWORDS showing up in the NAME as an individual word (with space before or after), rather than within the NAME string as part of it.

data want;
   input name $char11. keywords $char10.;
   datalines; 
apple LLC  apple
apple Corp apple
amazon Inc amazon
;
run;
Patrick
Opal | Level 21

Below two options how this could work.

data HAVE;
   input name $;
   datalines; 
John
Diana
Diane
Sally
Doug
David
Dianna
;

data KEYWORDS;
   input keywords $;
   datalines; 
ou
vi
oh
ck
id
;

/* example 1: return first match */
data want_1(drop=_:);
  if _n_=1 then
    do;
      if 0 then set keywords(keep=keywords);
      dcl hash h1(dataset:'keywords');
      dcl hiter hh1('h1');
      h1.defineKey('keywords');
      h1.defineData('keywords');
      h1.defineDone();
    end;
  call missing(of _all_);

  set have;

  _rc = hh1.first();
  do while (_rc = 0);
    if find(name,keywords,'it') then 
      do;
        output;
        leave;
      end;
    _rc = hh1.next();
  end;
run;

/* example 2: return all matches */
data 
    want_2  (keep=row_id name match_flg)
    match_2 (keep=row_id keywords);
    ;
  if _n_=1 then
    do;
      if 0 then set keywords(keep=keywords);
      dcl hash h1(dataset:'keywords');
      dcl hiter hh1('h1');
      h1.defineKey('keywords');
      h1.defineData('keywords');
      h1.defineDone();
    end;
  call missing(of _all_);

  row_id=_n_;
  match_flg=0;
  set have;

  _rc = hh1.first();
  do while (_rc = 0);
    if find(name,keywords,'it') then 
      do;
        output match_2;
        match_flg=1;
      end;
    _rc = hh1.next();
  end;
  output want_2;
/*  if match_flg=1 then output want_2;*/
run;

proc sql;
  select 
    l.*,
    r.keywords
  from want_2 l left join match_2 r
    on l.row_id=r.row_id
  order by row_id
    ;
quit;

SQL report using above code:

Patrick_0-1647732817470.png

 

mkeintz
PROC Star

To accomodate the possibility of multiple keyword matches for a single name, you have to search for every keyword for every incoming name:

 

data HAVE;
   input name $;
   datalines; 
John
Diana
Diane
Sally
Doug
David
Dianna
run;

data KEYWORDS;
   input keywords $;
   datalines; 
ou
vi
oh
ck
run;

proc sql noprint;
  select  quote(trim(keywords)) into :keywordlist separated by ',' from keywords;
quit;
%put &=keywordlist;

data want (drop=i);
  set have;
  array kw {&sqlobs} $2 _temporary_ (&keywordlist);
  do i=1 to dim(kw) ;
    keyword=kw{i};
    if find(name,trim(keyword)) then output;
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 3297 views
  • 5 likes
  • 5 in conversation