BookmarkSubscribeRSS Feed
Guptashwe
Calcite | Level 5

I have a data set with two variables Enrolid and Index_Therapy. Under index_therapy i have data with duplicate values in the string for each record, I would need a code to remove duplicates from each string.

 

Sample data set is attached.

sample data.PNG

 

 

Output data set

Enrolid        Index_Therapy

114101       Sorafenib,Erlotinib

2086103     Sorafenib

102950901 Sorafenib

117102901 Sorafenib,Radio    etc

3 REPLIES 3
Kurt_Bremser
Super User

Create a new string, and check if a word is already present in it:

data want;
set problem_1 (rename=(index_therapy=_index_therapy));
length index_therapy $400;
do i = 1 to countw(_index_therapy,',');
  word = scan(_index_therapy,i,',');
  if not findw(index_therapy,strip(word),', ')
  then index_therapy = catx(',',index_therapy,word);
end;
keep enrolid index_therapy;
run;

 

 

Note that "Need Help" does not meet the requirements for a descriptive subject line. Be a little more creative next time.

singhsahab
Lapis Lazuli | Level 10

Hello,

 

Here is an alternate solution..

 

data have;
input Enrolid Index_Therapy $40.;
datalines;
114101 Sorafenib,Sorafenib,Sorafenib,Erlotinib
2086103 Sorafenib,Sorafenib,Sorafenib
102950901 Sorafenib,Sorafenib
117102901 Sorafenib,Radio,Radio
;
run;

DATA HAVE1;
SET HAVE;
DO I=1 TO COUNT(Index_Therapy,',')+1;
 Index_Therapy_=SCAN(Index_Therapy,I,',');
 OUTPUT;
END;
DROP I Index_Therapy;
RUN; 
 
PROC SORT DATA=HAVE1 OUT=HAVE2 NODUP;
BY  Enrolid Index_Therapy_;
RUN;
 
DATA WANT;
SET HAVE2;
BY  Enrolid Index_Therapy_;
RETAIN Index_Therapy;
IF FIRST.Enrolid THEN Index_Therapy=Index_Therapy_;
 ELSE  Index_Therapy=CATX(',',Index_Therapy,Index_Therapy_);
 IF LAST.Enrolid ;
 DROP Index_Therapy_;
RUN;
 

 

 

Thanks..

Ksharp
Super User
data have;
input Enrolid Index_Therapy $40.;
datalines;
114101 Sorafenib,Sorafenib,Sorafenib,Erlotinib
2086103 Sorafenib,Sorafenib,Sorafenib
102950901 Sorafenib,Sorafenib
117102901 Sorafenib,Radio,Radio
;
run;

data want;
 set have;
 array x{999} $ 80 _temporary_;
 length want $ 200;
 call missing(of x{*});
 n=0;
 do i=1 to countw(Index_Therapy,',');
  temp=scan(Index_Therapy,i,',');
  if temp not in x then do;n+1;x{n}=temp;end;
 end;
 want=catx(',',of x{*});
 drop n i temp;
run;
 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 953 views
  • 0 likes
  • 4 in conversation