BookmarkSubscribeRSS Feed
Guptashwe
Calcite | Level 5

I have a data set with two variables Enrolid and Index_Therapy. Under index_therapy i have data with duplicate values in the string for each record, I would need a code to remove duplicates from each string.

 

Sample data set is attached.

sample data.PNG

 

 

Output data set

Enrolid        Index_Therapy

114101       Sorafenib,Erlotinib

2086103     Sorafenib

102950901 Sorafenib

117102901 Sorafenib,Radio    etc

3 REPLIES 3
Kurt_Bremser
Super User

Create a new string, and check if a word is already present in it:

data want;
set problem_1 (rename=(index_therapy=_index_therapy));
length index_therapy $400;
do i = 1 to countw(_index_therapy,',');
  word = scan(_index_therapy,i,',');
  if not findw(index_therapy,strip(word),', ')
  then index_therapy = catx(',',index_therapy,word);
end;
keep enrolid index_therapy;
run;

 

 

Note that "Need Help" does not meet the requirements for a descriptive subject line. Be a little more creative next time.

singhsahab
Lapis Lazuli | Level 10

Hello,

 

Here is an alternate solution..

 

data have;
input Enrolid Index_Therapy $40.;
datalines;
114101 Sorafenib,Sorafenib,Sorafenib,Erlotinib
2086103 Sorafenib,Sorafenib,Sorafenib
102950901 Sorafenib,Sorafenib
117102901 Sorafenib,Radio,Radio
;
run;

DATA HAVE1;
SET HAVE;
DO I=1 TO COUNT(Index_Therapy,',')+1;
 Index_Therapy_=SCAN(Index_Therapy,I,',');
 OUTPUT;
END;
DROP I Index_Therapy;
RUN; 
 
PROC SORT DATA=HAVE1 OUT=HAVE2 NODUP;
BY  Enrolid Index_Therapy_;
RUN;
 
DATA WANT;
SET HAVE2;
BY  Enrolid Index_Therapy_;
RETAIN Index_Therapy;
IF FIRST.Enrolid THEN Index_Therapy=Index_Therapy_;
 ELSE  Index_Therapy=CATX(',',Index_Therapy,Index_Therapy_);
 IF LAST.Enrolid ;
 DROP Index_Therapy_;
RUN;
 

 

 

Thanks..

Ksharp
Super User
data have;
input Enrolid Index_Therapy $40.;
datalines;
114101 Sorafenib,Sorafenib,Sorafenib,Erlotinib
2086103 Sorafenib,Sorafenib,Sorafenib
102950901 Sorafenib,Sorafenib
117102901 Sorafenib,Radio,Radio
;
run;

data want;
 set have;
 array x{999} $ 80 _temporary_;
 length want $ 200;
 call missing(of x{*});
 n=0;
 do i=1 to countw(Index_Therapy,',');
  temp=scan(Index_Therapy,i,',');
  if temp not in x then do;n+1;x{n}=temp;end;
 end;
 want=catx(',',of x{*});
 drop n i temp;
run;
 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 930 views
  • 0 likes
  • 4 in conversation