Hello,
Thank you all for contributing the community. I have learnt a lot since I joined.
Here's another issue I encountered. I have two datasets:
data HAVE; input name $; datalines; John Diana Diane Sally Doug David Dianna ; run; data KEYWORDS; input keywords $; datalines; ou vi oh ck ; run;
I want to search KEYWORDS.keywords in HAVE.name and leave the HAVE observations only containing the keywords.
AND also show the keywords used in the searching process.
Here is what I want
data want; input name $ keywords $; datalines; John oh Doug ou David vi ; run;
Here is just an example, but my actual keywords are more than 600, so I cannot put them all in the code.
Would you please share some codes and thoughts, please?
Thank you so much!
This is probably going to be slow for very large data sets ... but it works
proc sql;
create table want as select name,keywords
from have as a,keywords as b
where find(a.name,b.keywords,'t')>0;
quit;
Thanks for the code. It works well.
What if the keywords must be an individual word, instead of a part of the word, in the NAME
For example,
data HAVE; input name $char14.;; datalines; apple LLC appleLLC amazon Inc amazonInc apple Corp appleCorp amazonCorp ; run; data KEYWORDS; input keywords $; datalines; apple amazon ; run;
I want the KEYWORDS showing up in the NAME as an individual word (with space before or after), rather than within the NAME string as part of it.
data want; input name $char11. keywords $char10.; datalines; apple LLC apple apple Corp apple amazon Inc amazon ; run;
@yanshuai wrote:
Thanks for the code. It works well.
What if the keywords must be an individual word, instead of a part of the word, in the NAME
For example,
data HAVE; input name $char14.;; datalines; apple LLC appleLLC amazon Inc amazonInc apple Corp appleCorp amazonCorp ; run; data KEYWORDS; input keywords $; datalines; apple amazon ; run;I want the KEYWORDS showing up in the NAME as an individual word (with space before or after), rather than within the NAME string as part of it.
data want; input name $char11. keywords $char10.; datalines; apple LLC apple apple Corp apple amazon Inc amazon ; run;
Seems like you didn't even try the code I provided with these different keywords. Please try it and you'll know the answer.
If you do not want. to have the names with keywords that are integral to the same, then those names have to filtered out. Better to do it before the Proc SQL step.
The updated code will be like this
data HAVE;
input name $char14.;
if (find(trim(name),' '))>0;
datalines;
apple LLC
appleLLC
amazon Inc
amazonInc
apple Corp
appleCorp
amazonCorp
;
run;
data KEYWORDS;
input keywords $;
datalines;
apple
amazon
;
run;
proc sql;
create table want as select name,keywords
from have as a,keywords as b
where find(a.name,b.keywords,'t')>0;
;
quit;
The output will be
What do you want if a name contains more than 1 keyword?
Hello @yanshuai
@PaigeMiller solutions does take care of the scenario with multiple occurrences of the keywords, is simple and comprehensible.
I have added more keywords like as follows
data HAVE;
input name $;
datalines;
John
Diana
Diane
Sally
Doug
David
Dianna
;
data KEYWORDS;
input keywords $;
datalines;
ou
vi
oh
ck
id
ia
an
;
/* Following code by @PaigeMiller */
proc sql;
create table want as select name,keywords
from have as a,keywords as b
where find(a.name,b.keywords,'t')>0;
quit;
The output is like this
Thanks for the code.
What if the keywords must be an individual word, instead of a part of the word, in the NAME
For example,
data HAVE; input name $char14.;; datalines; Apple LLC AppleLLC Amazon Inc AmazonInc Apple Corp AppleCorp AmazonCorp ; run; data KEYWORDS; input keywords $; datalines; apple amazon ; run;
I want the KEYWORDS showing up in the NAME as an individual word (with space before or after), rather than within the NAME string as part of it.
data want; input name $char11. keywords $char10.; datalines; apple LLC apple apple Corp apple amazon Inc amazon ; run;
If you never want more than one match per name, then you can put the list of keyword values into an array of character values. Loop through the array looking for each sequential character expression in name. Stop when one is found, or you've exhausted the list.
data HAVE;
input name $;
datalines;
John
Diana
Diane
Sally
Doug
David
Dianna
run;
data KEYWORDS;
input keywords $;
datalines;
ou
vi
oh
ck
run;
proc sql noprint;
select quote(trim(keywords)) into :keywordlist separated by ',' from keywords;
quit;
%put &=keywordlist;
data want (drop=i);
set have;
array kw {&sqlobs} $2 _temporary_ (&keywordlist);
do i=1 to dim(kw) until (find(name,trim(kw{i}))); end;
if i<=&sqlobs then keyword=kw{i};
run;
The macrovar keywordlist created by proc sql in the above case is "ou","vi","oh","ck", which initializes values in the array kw. The macrovar SQLOBS is the number of entities generated by the select statement in proc sql.
Thanks. I love this FIND function.
What if the keywords must be an individual word, instead of a part of the word, in the NAME
For example,
data HAVE;
input name $char14.;;
datalines;
Apple LLC
AppleLLC
Amazon Inc
AmazonInc
Apple Corp
AppleCorp
AmazonCorp
;
run;
data KEYWORDS;
input keywords $;
datalines;
apple
amazon
;
run;
I want the KEYWORDS showing up in the NAME as an individual word (with space before or after), rather than within the NAME string as part of it.
data want; input name $char11. keywords $char10.; datalines; apple LLC apple apple Corp apple amazon Inc amazon ; run;
Below two options how this could work.
data HAVE;
input name $;
datalines;
John
Diana
Diane
Sally
Doug
David
Dianna
;
data KEYWORDS;
input keywords $;
datalines;
ou
vi
oh
ck
id
;
/* example 1: return first match */
data want_1(drop=_:);
if _n_=1 then
do;
if 0 then set keywords(keep=keywords);
dcl hash h1(dataset:'keywords');
dcl hiter hh1('h1');
h1.defineKey('keywords');
h1.defineData('keywords');
h1.defineDone();
end;
call missing(of _all_);
set have;
_rc = hh1.first();
do while (_rc = 0);
if find(name,keywords,'it') then
do;
output;
leave;
end;
_rc = hh1.next();
end;
run;
/* example 2: return all matches */
data
want_2 (keep=row_id name match_flg)
match_2 (keep=row_id keywords);
;
if _n_=1 then
do;
if 0 then set keywords(keep=keywords);
dcl hash h1(dataset:'keywords');
dcl hiter hh1('h1');
h1.defineKey('keywords');
h1.defineData('keywords');
h1.defineDone();
end;
call missing(of _all_);
row_id=_n_;
match_flg=0;
set have;
_rc = hh1.first();
do while (_rc = 0);
if find(name,keywords,'it') then
do;
output match_2;
match_flg=1;
end;
_rc = hh1.next();
end;
output want_2;
/* if match_flg=1 then output want_2;*/
run;
proc sql;
select
l.*,
r.keywords
from want_2 l left join match_2 r
on l.row_id=r.row_id
order by row_id
;
quit;
SQL report using above code:
To accomodate the possibility of multiple keyword matches for a single name, you have to search for every keyword for every incoming name:
data HAVE;
input name $;
datalines;
John
Diana
Diane
Sally
Doug
David
Dianna
run;
data KEYWORDS;
input keywords $;
datalines;
ou
vi
oh
ck
run;
proc sql noprint;
select quote(trim(keywords)) into :keywordlist separated by ',' from keywords;
quit;
%put &=keywordlist;
data want (drop=i);
set have;
array kw {&sqlobs} $2 _temporary_ (&keywordlist);
do i=1 to dim(kw) ;
keyword=kw{i};
if find(name,trim(keyword)) then output;
end;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.