SAS Programming

yanshuai · Posted 03-19-2022 05:16 PM

Hello,

Thank you all for contributing the community. I have learnt a lot since I joined.

Here's another issue I encountered. I have two datasets:

data HAVE;
   input name $;
   datalines; 
John
Diana
Diane
Sally
Doug
David
Dianna
;
run;

data KEYWORDS;
   input keywords $;
   datalines; 
ou
vi
oh
ck
;
run;

I want to search KEYWORDS.keywords in HAVE.name and leave the HAVE observations only containing the keywords.

AND also show the keywords used in the searching process.

Here is what I want

data want;
   input name $ keywords $;
   datalines; 
John oh
Doug ou
David vi
;
run;

Here is just an example, but my actual keywords are more than 600, so I cannot put them all in the code.

Would you please share some codes and thoughts, please?

Thank you so much!

PaigeMiller · Posted 03-19-2022 05:34 PM

This is probably going to be slow for very large data sets ... but it works

proc sql;
	create table want as select name,keywords 
		from have as a,keywords as b
		where find(a.name,b.keywords,'t')>0;
quit;

--
Paige Miller

yanshuai · Posted 03-20-2022 11:20 PM

Thanks for the code. It works well.

What if the keywords must be an individual word, instead of a part of the word, in the NAME

For example,

data HAVE;
   input name $char14.;;
   datalines; 
apple LLC
appleLLC
amazon Inc
amazonInc
apple Corp
appleCorp
amazonCorp
;
run;

data KEYWORDS;
   input keywords $;
   datalines; 
apple
amazon
;
run;

I want the KEYWORDS showing up in the NAME as an individual word (with space before or after), rather than within the NAME string as part of it.

data want;
   input name $char11. keywords $char10.;
   datalines; 
apple LLC  apple
apple Corp apple
amazon Inc amazon
;
run;

PaigeMiller · Posted 03-21-2022 05:59 AM

@yanshuai wrote:

Thanks for the code. It works well.

What if the keywords must be an individual word, instead of a part of the word, in the NAME

For example,
data HAVE;
   input name $char14.;;
   datalines; 
apple LLC
appleLLC
amazon Inc
amazonInc
apple Corp
appleCorp
amazonCorp
;
run;

data KEYWORDS;
   input keywords $;
   datalines; 
apple
amazon
;
run;
I want the KEYWORDS showing up in the NAME as an individual word (with space before or after), rather than within the NAME string as part of it.
data want;
   input name $char11. keywords $char10.;
   datalines; 
apple LLC  apple
apple Corp apple
amazon Inc amazon
;
run;

Seems like you didn't even try the code I provided with these different keywords. Please try it and you'll know the answer.

--
Paige Miller

yanshuai · Posted 03-21-2022 11:33 AM

Actually, I tried it for sure....I am more used to SQL so I tried your code first actually. But I still cannot search out the NAME containing individual KEYWORDS, as my updated reply shows.

Sajid01 · Posted 03-21-2022 10:02 AM

If you do not want. to have the names with keywords that are integral to the same, then those names have to filtered out. Better to do it before the Proc SQL step.

The updated code will be like this

data HAVE;
   input name $char14.;
 if (find(trim(name),' '))>0;
  datalines; 
apple LLC
appleLLC
amazon Inc
amazonInc
apple Corp
appleCorp
amazonCorp
;
run;

data KEYWORDS;
   input keywords $;
   datalines; 
apple
amazon
;
run;
proc sql;
	create table want as select name,keywords 
		from have as a,keywords as b
		where  find(a.name,b.keywords,'t')>0;
		;
quit;

The output will be

mkeintz · Posted 03-19-2022 07:06 PM

What do you want if a name contains more than 1 keyword?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

yanshuai · Posted 03-19-2022 10:51 PM

Then show them in two rows of observations.

Sajid01 · Posted 03-20-2022 06:55 PM

Hello @yanshuai
@PaigeMiller solutions does take care of the scenario with multiple occurrences of the keywords, is simple and comprehensible.
I have added more keywords like as follows

data HAVE;
   input name $;
   datalines; 
John
Diana
Diane
Sally
Doug
David
Dianna
;
data KEYWORDS;
   input keywords $;
   datalines; 
ou
vi
oh
ck
id
ia
an
;/* Following code by @PaigeMiller */
proc sql;
	create table want as select name,keywords 
		from have as a,keywords as b
		where find(a.name,b.keywords,'t')>0;
quit;

The output is like this

yanshuai · Posted 03-20-2022 11:22 PM

Thanks for the code.

What if the keywords must be an individual word, instead of a part of the word, in the NAME

For example,

data HAVE;
   input name $char14.;;
   datalines; 
Apple LLC
AppleLLC
Amazon Inc
AmazonInc
Apple Corp
AppleCorp
AmazonCorp
;
run;


data KEYWORDS;
   input keywords $;
   datalines; 
apple
amazon
;
run;

I want the KEYWORDS showing up in the NAME as an individual word (with space before or after), rather than within the NAME string as part of it.

data want;
   input name $char11. keywords $char10.;
   datalines; 
apple LLC  apple
apple Corp apple
amazon Inc amazon
;
run;

mkeintz · Posted 03-19-2022 07:26 PM

If you never want more than one match per name, then you can put the list of keyword values into an array of character values. Loop through the array looking for each sequential character expression in name. Stop when one is found, or you've exhausted the list.

data HAVE;
   input name $;
   datalines; 
John
Diana
Diane
Sally
Doug
David
Dianna
run;

data KEYWORDS;
   input keywords $;
   datalines; 
ou
vi
oh
ck
run;

proc sql noprint;
  select  quote(trim(keywords)) into :keywordlist separated by ',' from keywords;
quit;
%put &=keywordlist;

data want (drop=i);
  set have;
  array kw {&sqlobs} $2 _temporary_ (&keywordlist);
  do i=1 to dim(kw) until (find(name,trim(kw{i}))); end;
  if i<=&sqlobs then keyword=kw{i};
run;

The macrovar keywordlist created by proc sql in the above case is "ou","vi","oh","ck", which initializes values in the array kw. The macrovar SQLOBS is the number of entities generated by the select statement in proc sql.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

yanshuai · Posted 03-20-2022 11:24 PM

Thanks. I love this FIND function.

What if the keywords must be an individual word, instead of a part of the word, in the NAME

For example,

data HAVE;
   input name $char14.;;
   datalines; 
Apple LLC
AppleLLC
Amazon Inc
AmazonInc
Apple Corp
AppleCorp
AmazonCorp
;
run;


data KEYWORDS;
   input keywords $;
   datalines; 
apple
amazon
;
run;

I want the KEYWORDS showing up in the NAME as an individual word (with space before or after), rather than within the NAME string as part of it.

data want;
   input name $char11. keywords $char10.;
   datalines; 
apple LLC  apple
apple Corp apple
amazon Inc amazon
;
run;

Patrick · Posted 03-19-2022 07:30 PM

Below two options how this could work.

data HAVE;
   input name $;
   datalines; 
John
Diana
Diane
Sally
Doug
David
Dianna
;

data KEYWORDS;
   input keywords $;
   datalines; 
ou
vi
oh
ck
id
;

/* example 1: return first match */
data want_1(drop=_:);
  if _n_=1 then
    do;
      if 0 then set keywords(keep=keywords);
      dcl hash h1(dataset:'keywords');
      dcl hiter hh1('h1');
      h1.defineKey('keywords');
      h1.defineData('keywords');
      h1.defineDone();
    end;
  call missing(of _all_);

  set have;

  _rc = hh1.first();
  do while (_rc = 0);
    if find(name,keywords,'it') then 
      do;
        output;
        leave;
      end;
    _rc = hh1.next();
  end;
run;

/* example 2: return all matches */
data 
    want_2  (keep=row_id name match_flg)
    match_2 (keep=row_id keywords);
    ;
  if _n_=1 then
    do;
      if 0 then set keywords(keep=keywords);
      dcl hash h1(dataset:'keywords');
      dcl hiter hh1('h1');
      h1.defineKey('keywords');
      h1.defineData('keywords');
      h1.defineDone();
    end;
  call missing(of _all_);

  row_id=_n_;
  match_flg=0;
  set have;

  _rc = hh1.first();
  do while (_rc = 0);
    if find(name,keywords,'it') then 
      do;
        output match_2;
        match_flg=1;
      end;
    _rc = hh1.next();
  end;
  output want_2;
/*  if match_flg=1 then output want_2;*/
run;

proc sql;
  select 
    l.*,
    r.keywords
  from want_2 l left join match_2 r
    on l.row_id=r.row_id
  order by row_id
    ;
quit;

SQL report using above code:

mkeintz · Posted 03-20-2022 12:16 AM

To accomodate the possibility of multiple keyword matches for a single name, you have to search for every keyword for every incoming name:

data HAVE;
   input name $;
   datalines; 
John
Diana
Diane
Sally
Doug
David
Dianna
run;

data KEYWORDS;
   input keywords $;
   datalines; 
ou
vi
oh
ck
run;

proc sql noprint;
  select  quote(trim(keywords)) into :keywordlist separated by ',' from keywords;
quit;
%put &=keywordlist;

data want (drop=i);
  set have;
  array kw {&sqlobs} $2 _temporary_ (&keywordlist);
  do i=1 to dim(kw) ;
    keyword=kw{i};
    if find(name,trim(keyword)) then output;
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Programming

How to extract the observations contains the another keyword variable and show the keywords matched

Re: How to extract the observations contains the another keyword variable and show the keywords matc

Re: How to extract the observations contains the another keyword variable and show the keywords matc

Re: How to extract the observations contains the another keyword variable and show the keywords matc

Re: How to extract the observations contains the another keyword variable and show the keywords matc

Re: How to extract the observations contains the another keyword variable and show the keywords matc

Re: How to extract the observations contains the another keyword variable and show the keywords matc

Re: How to extract the observations contains the another keyword variable and show the keywords matc

Re: How to extract the observations contains the another keyword variable and show the keywords matc

Re: How to extract the observations contains the another keyword variable and show the keywords matc

Re: How to extract the observations contains the another keyword variable and show the keywords matc

Re: How to extract the observations contains the another keyword variable and show the keywords matc

Re: How to extract the observations contains the another keyword variable and show the keywords matc

Re: How to extract the observations contains the another keyword variable and show the keywords matc

Follow Us

What is...

SAS Programming

Our biggest data and AI event of the year.

SAS Training: Just a Click Away

Follow Us

What is...