BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
keen_sas
Quartz | Level 8

Hi ,


I have repeated sub strings/duplicate words in a string . I would like to remove the repeated words from the string and
keep only the unique values in the string.Words are mix of cases. Any shortcut approach to do this. i am trying to do this using arrays and do loop .

Have Want
apple orange kiwi apple grapes strawberry peach kiwi peach apple orange kiwi grapes strawberry peach
USA UK Australia Japan USA UK USA UK Australia Japan
FOOTBALL BasketBall basketball Hockey football  Football basketball hockey
1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20
data have;
input string :$200.;
infile datalines dlm=',';
datalines;
apple orange kiwi apple grapes strawberry peach kiwi peach
USA UK Australia Japan USA UK
FOOTBALL BasketBall basketball Hockey football
;

data want(keep=string newstring);
   set have;
   newstring=scan(string, 1, ' ');
   do i=2 to countw(string,' ');
      word=scan(string, i, ' ');
      found=find(newstring, word, 'it');
      if found=0 then newstring=catx(' ', newstring, word);
   end;
run;

proc print data=want;
run;

This gives you

 

Capture.PNG

View solution in original post

9 REPLIES 9
Ksharp
Super User
data have;
input x $80.;
cards;
apple orange kiwi apple grapes strawberry peach kiwi peach
USA UK Australia Japan USA UK
FOOTBALL BasketBall basketball Hockey football 
;
run;
data want;
 set have;
 array a{999} $ 100;
 n=0;
 do i=1 to countw(x,' ');
  temp=lowcase(scan(x,i,' '));
  if temp not in a then do;	n+1;a{n}=temp;end;
 end;
 want=catx(' ',of a{*});
 drop n i a: temp;
run;
PeterClemmensen
Tourmaline | Level 20
data have;
input string :$200.;
infile datalines dlm=',';
datalines;
apple orange kiwi apple grapes strawberry peach kiwi peach
USA UK Australia Japan USA UK
FOOTBALL BasketBall basketball Hockey football
;

data want(keep=string newstring);
   set have;
   newstring=scan(string, 1, ' ');
   do i=2 to countw(string,' ');
      word=scan(string, i, ' ');
      found=find(newstring, word, 'it');
      if found=0 then newstring=catx(' ', newstring, word);
   end;
run;

proc print data=want;
run;

This gives you

 

Capture.PNG

Ksharp
Super User

I think you should use FINDW(), would be better.

keen_sas
Quartz | Level 8

The below code is sometimes excluding the  substring with a minor difference in the spelling . Tried with both FIND and FINDW but it is not excluding the exact matching word. In the below example Japan and Japanese are two different words but when using find function JAPAN is getting excluded because of JAPANESE , similarly in the last row when both FATESTCD and FATEST are present only FATESTCD is present , FATEST is getting excluded. Only the exact word should get removed ,but its not happening here.Any suggestions/updates on this.

How can we try this using PRXMATCH,?

 

data have;
input string :$200.;
infile datalines dlm=',';
datalines;
apple orange kiwi apple grapes strawberry peach kiwi peach
China USA UK Australia Japanese USA UK Australian Japan Chinase
FOOTBALL BasketBall basketball Hockey football
FACAT FATESTCD FATEST FAOBJ STDT STDTC VISIT VISITNUM
;

data want(keep=string newstring);
   set have;
   newstring=scan(string, 1, ' ');
   do i=2 to countw(string,' ');
      word=scan(string, i, ' ');
      found=find(newstring, word, 'it');
*found=findw(newstring, word, 'it');; if found=0 then newstring=catx(' ', newstring, word); end; run;

 

 

PGStats
Opal | Level 21

With perl regular expression matching:

 

data have;
input str $80.;
cards;
apple orange kiwi apple grapes strawberry peach kiwi peach
USA UK Australia Japan USA UK
FOOTBALL BasketBall basketball Hockey football 
;

data want;
if not prxId then prxId + prxParse("s/\b(\w{2,})\b(.*)\b(\1\s*)\b/\1\2/io");
set have;
wantStr = str;
do i = 1 to 100 until (times=0);
    call prxChange(prxId,1,wantStr,wantStr,len,trunc,times);
    end;
drop i prxId len trunc times;
run;

proc print; run;
         Obs    str

          1     apple orange kiwi apple grapes strawberry peach kiwi peach
          2     USA UK Australia Japan USA UK
          3     FOOTBALL BasketBall basketball Hockey football

         Obs    wantStr

          1     apple orange kiwi grapes strawberry peach
          2     USA UK Australia Japan
          3     FOOTBALL BasketBall Hockey
PG
keen_sas
Quartz | Level 8

Thank you all for your responses, gained some new approach of programming.

Amir
PROC Star

You asked an interesting question and got some interesting answers.

 

You can help others further by marking one of the provided answers as the solution (not this one).

 

Regards,

Amir.

dee7
Calcite | Level 5

I am hoping someone can help with this. I have a similar problem and the code posted her worked but I have over 20 variables with this sort of repeated strings. Do I have to do this over and over for all 20+ variables? Does anyone know any shorter way to do this for multiple variables at once? Can we use this in an array to achieve the same result for many variables? 

Amir
PROC Star

@dee7  Welcome to SAS Communities. I recommend you ask a new question, supplying the data you have and the data you want in the form of SAS data steps using the datalines statement to supply the data in each case. More people will see the question and so it should get answered more quickly and with a wider range of solutions.

 

I expect an answer could involve using one of the dictionary tables and macro variables.

 

Feel free to provide a link to this question also, if desired, but all necessary information should ideally be in your question.

 

 

Regards,

Amir.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 10264 views
  • 9 likes
  • 6 in conversation