BookmarkSubscribeRSS Feed
Guptashwe
Calcite | Level 5

I have a data set with two variables Enrolid and Index_Therapy. Under index_therapy i have data with duplicate values in the string for each record, I would need a code to remove duplicates from each string.

 

Sample data set is attached.

sample data.PNG

 

 

Output data set

Enrolid        Index_Therapy

114101       Sorafenib,Erlotinib

2086103     Sorafenib

102950901 Sorafenib

117102901 Sorafenib,Radio    etc

3 REPLIES 3
Kurt_Bremser
Super User

Create a new string, and check if a word is already present in it:

data want;
set problem_1 (rename=(index_therapy=_index_therapy));
length index_therapy $400;
do i = 1 to countw(_index_therapy,',');
  word = scan(_index_therapy,i,',');
  if not findw(index_therapy,strip(word),', ')
  then index_therapy = catx(',',index_therapy,word);
end;
keep enrolid index_therapy;
run;

 

 

Note that "Need Help" does not meet the requirements for a descriptive subject line. Be a little more creative next time.

singhsahab
Lapis Lazuli | Level 10

Hello,

 

Here is an alternate solution..

 

data have;
input Enrolid Index_Therapy $40.;
datalines;
114101 Sorafenib,Sorafenib,Sorafenib,Erlotinib
2086103 Sorafenib,Sorafenib,Sorafenib
102950901 Sorafenib,Sorafenib
117102901 Sorafenib,Radio,Radio
;
run;

DATA HAVE1;
SET HAVE;
DO I=1 TO COUNT(Index_Therapy,',')+1;
 Index_Therapy_=SCAN(Index_Therapy,I,',');
 OUTPUT;
END;
DROP I Index_Therapy;
RUN; 
 
PROC SORT DATA=HAVE1 OUT=HAVE2 NODUP;
BY  Enrolid Index_Therapy_;
RUN;
 
DATA WANT;
SET HAVE2;
BY  Enrolid Index_Therapy_;
RETAIN Index_Therapy;
IF FIRST.Enrolid THEN Index_Therapy=Index_Therapy_;
 ELSE  Index_Therapy=CATX(',',Index_Therapy,Index_Therapy_);
 IF LAST.Enrolid ;
 DROP Index_Therapy_;
RUN;
 

 

 

Thanks..

Ksharp
Super User
data have;
input Enrolid Index_Therapy $40.;
datalines;
114101 Sorafenib,Sorafenib,Sorafenib,Erlotinib
2086103 Sorafenib,Sorafenib,Sorafenib
102950901 Sorafenib,Sorafenib
117102901 Sorafenib,Radio,Radio
;
run;

data want;
 set have;
 array x{999} $ 80 _temporary_;
 length want $ 200;
 call missing(of x{*});
 n=0;
 do i=1 to countw(Index_Therapy,',');
  temp=scan(Index_Therapy,i,',');
  if temp not in x then do;n+1;x{n}=temp;end;
 end;
 want=catx(',',of x{*});
 drop n i temp;
run;
 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1299 views
  • 0 likes
  • 4 in conversation