Solved: Re: Compare two coloums

Neptun83 · Posted 01-26-2024 09:13 AM

Hello!
I have two tables with two columns.
If the character string of table1 column contains the word in table2 coloum,then I want to delete it from the character string of table1 column.

For example:
Table 1 column: Table two column:
Next day. Day
Best day. Great
Great day test.

The result I would like:
Next
Best
Test

What is the best method to do this?

Tom · Posted 01-26-2024 09:46 AM

Not sure what you mean. Your presentation of the data is not clear.

Do you mean you have two datasets that each have two variables? Or does each dataset have only one variable?

If you have two separate datasets then how do you want to match the observations? Do you want to paste them side by side so the first observations from dataset ONE is paired with the first observation from dataset TWO?

Does the case of the letters in the strings matter?

Do the substrings have to be complete words in the full strings? If so what characters can be used to indicate word boundaries?

So let's assume case is to be ignored and only spaces demark word boundaries.

First let's create some sample datasets.

data strings;
  input string $50.;
cards;
Next day
Best day
Great day test
;

data words;
  input word $20. ;
cards;
Day
Great
;

If the list of words is actually so short then just hard code them.

data want;
  set strings;
  length new_string $50;
  do i=1 to countw(string,' ');
      word=scan(string,i,' ');
      if upcase(word) not in ('GREAT','DAY') then new_string=catx(' ',new_string,word);
  end;
  drop i word;
run;

Result

                          new_
Obs    string            string

 1     Next day           Next
 2     Best day           Best
 3     Great day test     test

If the list of words is short enough to fit into a single character variable (32k bytes) then you might want to use a regular expression instead. Perhaps something like:

data want;
  set strings;
  length new_string $50 ;
  new_string=compbl(prxchange("s/\b(great|day)\b/ /i",-1,string));
run;

Then you could use a simple macro variable to build the part of the regular expression that changes.

proc sql noprint;
  select word into :wordlist separated by '|' 
    from words
  ;
quit;
data want;
  set strings;
  length new_string $50 ;
  new_string=compbl(prxchange("s/\b(&wordlist)\b/ /i",-1,string));
run;

But if you have a lot of substrings to remove you will need to go back to the DO loop method.

So perhaps by putting the list of words into an array?

data want;
  set strings;
  if _n_=1 then do i=1 to nobs;
     set words nobs=nobs;
     array words[5000] $20 _temporary_ ;
     words[i]=upcase(word);
  end;
  length new_string $50 ;
  do i=1 to countw(string,' ');
    word=scan(string,i,' ');
    if not (upcase(word) in words) then new_string=catx(' ',new_string,word);
  end;
  drop i word;
run;

View solution in original post

Neptun83 · Posted 01-26-2024 09:14 AM

I use SAS EG 8.2

Tom · Posted 01-26-2024 10:09 AM

@Neptun83 wrote:
I use SAS EG 8.2

Note: The version of the tool you use to create and submit SAS code is not that important for a programming question. What is more important is the version of SAS that you are using Enterprise Guide to connect to.

But your question is basic enough that the answer should work on any currently used version of SAS.

Tom · Posted 01-26-2024 09:46 AM

Not sure what you mean. Your presentation of the data is not clear.

Do you mean you have two datasets that each have two variables? Or does each dataset have only one variable?

If you have two separate datasets then how do you want to match the observations? Do you want to paste them side by side so the first observations from dataset ONE is paired with the first observation from dataset TWO?

Does the case of the letters in the strings matter?

Do the substrings have to be complete words in the full strings? If so what characters can be used to indicate word boundaries?

So let's assume case is to be ignored and only spaces demark word boundaries.

First let's create some sample datasets.

data strings;
  input string $50.;
cards;
Next day
Best day
Great day test
;

data words;
  input word $20. ;
cards;
Day
Great
;

If the list of words is actually so short then just hard code them.

data want;
  set strings;
  length new_string $50;
  do i=1 to countw(string,' ');
      word=scan(string,i,' ');
      if upcase(word) not in ('GREAT','DAY') then new_string=catx(' ',new_string,word);
  end;
  drop i word;
run;

Result

                          new_
Obs    string            string

 1     Next day           Next
 2     Best day           Best
 3     Great day test     test

If the list of words is short enough to fit into a single character variable (32k bytes) then you might want to use a regular expression instead. Perhaps something like:

data want;
  set strings;
  length new_string $50 ;
  new_string=compbl(prxchange("s/\b(great|day)\b/ /i",-1,string));
run;

Then you could use a simple macro variable to build the part of the regular expression that changes.

proc sql noprint;
  select word into :wordlist separated by '|' 
    from words
  ;
quit;
data want;
  set strings;
  length new_string $50 ;
  new_string=compbl(prxchange("s/\b(&wordlist)\b/ /i",-1,string));
run;

But if you have a lot of substrings to remove you will need to go back to the DO loop method.

So perhaps by putting the list of words into an array?

data want;
  set strings;
  if _n_=1 then do i=1 to nobs;
     set words nobs=nobs;
     array words[5000] $20 _temporary_ ;
     words[i]=upcase(word);
  end;
  length new_string $50 ;
  do i=1 to countw(string,' ');
    word=scan(string,i,' ');
    if not (upcase(word) in words) then new_string=catx(' ',new_string,word);
  end;
  drop i word;
run;

Neptun83 · Posted 01-26-2024 01:22 PM

You understood perfectly! Do loop is the best for me. I learned a lot from this code.

Thank you!

One more question 🙂 How can I check the cases where there are no spaces in the string?

For example : Greatdaytest

Compare two coloums

Re: Compare two coloums

Re: Compare two coloums

Re: Compare two coloums

Re: Compare two coloums

Re: Compare two coloums

Ready to join fellow brilliant minds for the SAS Hackathon?

Click image to register for webinar

Classroom Training Available!