I have a data set with two variables Enrolid and Index_Therapy. Under index_therapy i have data with duplicate values in the string for each record, I would need a code to remove duplicates from each string.
Sample data set is attached.
Output data set
Enrolid Index_Therapy
114101 Sorafenib,Erlotinib
2086103 Sorafenib
102950901 Sorafenib
117102901 Sorafenib,Radio etc
Create a new string, and check if a word is already present in it:
data want;
set problem_1 (rename=(index_therapy=_index_therapy));
length index_therapy $400;
do i = 1 to countw(_index_therapy,',');
word = scan(_index_therapy,i,',');
if not findw(index_therapy,strip(word),', ')
then index_therapy = catx(',',index_therapy,word);
end;
keep enrolid index_therapy;
run;
Note that "Need Help" does not meet the requirements for a descriptive subject line. Be a little more creative next time.
Hello,
Here is an alternate solution..
data have;
input Enrolid Index_Therapy $40.;
datalines;
114101 Sorafenib,Sorafenib,Sorafenib,Erlotinib
2086103 Sorafenib,Sorafenib,Sorafenib
102950901 Sorafenib,Sorafenib
117102901 Sorafenib,Radio,Radio
;
run;
DATA HAVE1;
SET HAVE;
DO I=1 TO COUNT(Index_Therapy,',')+1;
Index_Therapy_=SCAN(Index_Therapy,I,',');
OUTPUT;
END;
DROP I Index_Therapy;
RUN;
PROC SORT DATA=HAVE1 OUT=HAVE2 NODUP;
BY Enrolid Index_Therapy_;
RUN;
DATA WANT;
SET HAVE2;
BY Enrolid Index_Therapy_;
RETAIN Index_Therapy;
IF FIRST.Enrolid THEN Index_Therapy=Index_Therapy_;
ELSE Index_Therapy=CATX(',',Index_Therapy,Index_Therapy_);
IF LAST.Enrolid ;
DROP Index_Therapy_;
RUN;
Thanks..
data have;
input Enrolid Index_Therapy $40.;
datalines;
114101 Sorafenib,Sorafenib,Sorafenib,Erlotinib
2086103 Sorafenib,Sorafenib,Sorafenib
102950901 Sorafenib,Sorafenib
117102901 Sorafenib,Radio,Radio
;
run;
data want;
set have;
array x{999} $ 80 _temporary_;
length want $ 200;
call missing(of x{*});
n=0;
do i=1 to countw(Index_Therapy,',');
temp=scan(Index_Therapy,i,',');
if temp not in x then do;n+1;x{n}=temp;end;
end;
want=catx(',',of x{*});
drop n i temp;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.