BookmarkSubscribeRSS Feed
Guptashwe
Calcite | Level 5

I have a data set with two variables Enrolid and Index_Therapy. Under index_therapy i have data with duplicate values in the string for each record, I would need a code to remove duplicates from each string.

 

Sample data set is attached.

sample data.PNG

 

 

Output data set

Enrolid        Index_Therapy

114101       Sorafenib,Erlotinib

2086103     Sorafenib

102950901 Sorafenib

117102901 Sorafenib,Radio    etc

3 REPLIES 3
Kurt_Bremser
Super User

Create a new string, and check if a word is already present in it:

data want;
set problem_1 (rename=(index_therapy=_index_therapy));
length index_therapy $400;
do i = 1 to countw(_index_therapy,',');
  word = scan(_index_therapy,i,',');
  if not findw(index_therapy,strip(word),', ')
  then index_therapy = catx(',',index_therapy,word);
end;
keep enrolid index_therapy;
run;

 

 

Note that "Need Help" does not meet the requirements for a descriptive subject line. Be a little more creative next time.

singhsahab
Lapis Lazuli | Level 10

Hello,

 

Here is an alternate solution..

 

data have;
input Enrolid Index_Therapy $40.;
datalines;
114101 Sorafenib,Sorafenib,Sorafenib,Erlotinib
2086103 Sorafenib,Sorafenib,Sorafenib
102950901 Sorafenib,Sorafenib
117102901 Sorafenib,Radio,Radio
;
run;

DATA HAVE1;
SET HAVE;
DO I=1 TO COUNT(Index_Therapy,',')+1;
 Index_Therapy_=SCAN(Index_Therapy,I,',');
 OUTPUT;
END;
DROP I Index_Therapy;
RUN; 
 
PROC SORT DATA=HAVE1 OUT=HAVE2 NODUP;
BY  Enrolid Index_Therapy_;
RUN;
 
DATA WANT;
SET HAVE2;
BY  Enrolid Index_Therapy_;
RETAIN Index_Therapy;
IF FIRST.Enrolid THEN Index_Therapy=Index_Therapy_;
 ELSE  Index_Therapy=CATX(',',Index_Therapy,Index_Therapy_);
 IF LAST.Enrolid ;
 DROP Index_Therapy_;
RUN;
 

 

 

Thanks..

Ksharp
Super User
data have;
input Enrolid Index_Therapy $40.;
datalines;
114101 Sorafenib,Sorafenib,Sorafenib,Erlotinib
2086103 Sorafenib,Sorafenib,Sorafenib
102950901 Sorafenib,Sorafenib
117102901 Sorafenib,Radio,Radio
;
run;

data want;
 set have;
 array x{999} $ 80 _temporary_;
 length want $ 200;
 call missing(of x{*});
 n=0;
 do i=1 to countw(Index_Therapy,',');
  temp=scan(Index_Therapy,i,',');
  if temp not in x then do;n+1;x{n}=temp;end;
 end;
 want=catx(',',of x{*});
 drop n i temp;
run;
 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 799 views
  • 0 likes
  • 4 in conversation