I have a data set with two variables Enrolid and Index_Therapy. Under index_therapy i have data with duplicate values in the string for each record, I would need a code to remove duplicates from each string.
Sample data set is attached.
Output data set
Enrolid Index_Therapy
114101 Sorafenib,Erlotinib
2086103 Sorafenib
102950901 Sorafenib
117102901 Sorafenib,Radio etc
Create a new string, and check if a word is already present in it:
data want;
set problem_1 (rename=(index_therapy=_index_therapy));
length index_therapy $400;
do i = 1 to countw(_index_therapy,',');
word = scan(_index_therapy,i,',');
if not findw(index_therapy,strip(word),', ')
then index_therapy = catx(',',index_therapy,word);
end;
keep enrolid index_therapy;
run;
Note that "Need Help" does not meet the requirements for a descriptive subject line. Be a little more creative next time.
Hello,
Here is an alternate solution..
data have;
input Enrolid Index_Therapy $40.;
datalines;
114101 Sorafenib,Sorafenib,Sorafenib,Erlotinib
2086103 Sorafenib,Sorafenib,Sorafenib
102950901 Sorafenib,Sorafenib
117102901 Sorafenib,Radio,Radio
;
run;
DATA HAVE1;
SET HAVE;
DO I=1 TO COUNT(Index_Therapy,',')+1;
Index_Therapy_=SCAN(Index_Therapy,I,',');
OUTPUT;
END;
DROP I Index_Therapy;
RUN;
PROC SORT DATA=HAVE1 OUT=HAVE2 NODUP;
BY Enrolid Index_Therapy_;
RUN;
DATA WANT;
SET HAVE2;
BY Enrolid Index_Therapy_;
RETAIN Index_Therapy;
IF FIRST.Enrolid THEN Index_Therapy=Index_Therapy_;
ELSE Index_Therapy=CATX(',',Index_Therapy,Index_Therapy_);
IF LAST.Enrolid ;
DROP Index_Therapy_;
RUN;
Thanks..
data have;
input Enrolid Index_Therapy $40.;
datalines;
114101 Sorafenib,Sorafenib,Sorafenib,Erlotinib
2086103 Sorafenib,Sorafenib,Sorafenib
102950901 Sorafenib,Sorafenib
117102901 Sorafenib,Radio,Radio
;
run;
data want;
set have;
array x{999} $ 80 _temporary_;
length want $ 200;
call missing(of x{*});
n=0;
do i=1 to countw(Index_Therapy,',');
temp=scan(Index_Therapy,i,',');
if temp not in x then do;n+1;x{n}=temp;end;
end;
want=catx(',',of x{*});
drop n i temp;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.