BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Neptun83
Calcite | Level 5
Hello!
I have two tables with two columns.
If the character string of table1 column contains the word in table2 coloum,then I want to delete it from the character string of table1 column.

For example:
Table 1 column: Table two column:
Next day. Day
Best day. Great
Great day test.

The result I would like:
Next
Best
Test


What is the best method to do this?

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Not sure what you mean. Your presentation of the data is not clear.

 

Do you mean you have two datasets that each have two variables? Or does each dataset have only one variable?

 

If you have two separate datasets then how do you want to match the observations?  Do you want to paste them side by side so the first observations from dataset ONE is paired with the first observation from dataset TWO?

 

Does the case of the letters in the strings matter?

 

Do the substrings have to be complete words in the full strings?  If so what characters can be used to indicate word boundaries?

 

So let's assume case is to be ignored and only spaces demark word boundaries.

 

First let's create some sample datasets.

 

data strings;
  input string $50.;
cards;
Next day
Best day
Great day test
;

data words;
  input word $20. ;
cards;
Day
Great
;

If the list of words is actually so short then just hard code them.

 

 

data want;
  set strings;
  length new_string $50;
  do i=1 to countw(string,' ');
      word=scan(string,i,' ');
      if upcase(word) not in ('GREAT','DAY') then new_string=catx(' ',new_string,word);
  end;
  drop i word;
run;

Result

 

                          new_
Obs    string            string

 1     Next day           Next
 2     Best day           Best
 3     Great day test     test

If the list of words is short enough to fit into a single character variable (32k bytes) then you might want to use a regular expression instead.  Perhaps something like:

data want;
  set strings;
  length new_string $50 ;
  new_string=compbl(prxchange("s/\b(great|day)\b/ /i",-1,string));
run;

Then you could use a simple macro variable to build the part of the regular expression that changes.

proc sql noprint;
  select word into :wordlist separated by '|' 
    from words
  ;
quit;
data want;
  set strings;
  length new_string $50 ;
  new_string=compbl(prxchange("s/\b(&wordlist)\b/ /i",-1,string));
run;

But if you have a lot of substrings to remove you will need to go back to the DO loop method. 

So perhaps by putting the list of words into an array?

data want;
  set strings;
  if _n_=1 then do i=1 to nobs;
     set words nobs=nobs;
     array words[5000] $20 _temporary_ ;
     words[i]=upcase(word);
  end;
  length new_string $50 ;
  do i=1 to countw(string,' ');
    word=scan(string,i,' ');
    if not (upcase(word) in words) then new_string=catx(' ',new_string,word);
  end;
  drop i word;
run;

 

 

View solution in original post

4 REPLIES 4
Neptun83
Calcite | Level 5
I use SAS EG 8.2
Tom
Super User Tom
Super User

@Neptun83 wrote:
I use SAS EG 8.2

Note:  The version of the tool you use to create and submit SAS code is not that important for a programming question.  What is more important is the version of SAS that you are using Enterprise Guide to connect to. 

 

But your question is basic enough that the answer should work on any currently used version of SAS.

Tom
Super User Tom
Super User

Not sure what you mean. Your presentation of the data is not clear.

 

Do you mean you have two datasets that each have two variables? Or does each dataset have only one variable?

 

If you have two separate datasets then how do you want to match the observations?  Do you want to paste them side by side so the first observations from dataset ONE is paired with the first observation from dataset TWO?

 

Does the case of the letters in the strings matter?

 

Do the substrings have to be complete words in the full strings?  If so what characters can be used to indicate word boundaries?

 

So let's assume case is to be ignored and only spaces demark word boundaries.

 

First let's create some sample datasets.

 

data strings;
  input string $50.;
cards;
Next day
Best day
Great day test
;

data words;
  input word $20. ;
cards;
Day
Great
;

If the list of words is actually so short then just hard code them.

 

 

data want;
  set strings;
  length new_string $50;
  do i=1 to countw(string,' ');
      word=scan(string,i,' ');
      if upcase(word) not in ('GREAT','DAY') then new_string=catx(' ',new_string,word);
  end;
  drop i word;
run;

Result

 

                          new_
Obs    string            string

 1     Next day           Next
 2     Best day           Best
 3     Great day test     test

If the list of words is short enough to fit into a single character variable (32k bytes) then you might want to use a regular expression instead.  Perhaps something like:

data want;
  set strings;
  length new_string $50 ;
  new_string=compbl(prxchange("s/\b(great|day)\b/ /i",-1,string));
run;

Then you could use a simple macro variable to build the part of the regular expression that changes.

proc sql noprint;
  select word into :wordlist separated by '|' 
    from words
  ;
quit;
data want;
  set strings;
  length new_string $50 ;
  new_string=compbl(prxchange("s/\b(&wordlist)\b/ /i",-1,string));
run;

But if you have a lot of substrings to remove you will need to go back to the DO loop method. 

So perhaps by putting the list of words into an array?

data want;
  set strings;
  if _n_=1 then do i=1 to nobs;
     set words nobs=nobs;
     array words[5000] $20 _temporary_ ;
     words[i]=upcase(word);
  end;
  length new_string $50 ;
  do i=1 to countw(string,' ');
    word=scan(string,i,' ');
    if not (upcase(word) in words) then new_string=catx(' ',new_string,word);
  end;
  drop i word;
run;

 

 

Neptun83
Calcite | Level 5

You understood perfectly! Do loop is the best for me. I learned a lot from this code. 

Thank you! 

 

One more question 🙂 How can I check the cases where there are no spaces in the string?

For example : Greatdaytest

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 637 views
  • 0 likes
  • 2 in conversation