10-28-2014 11:43 AM
Hello!
I would like to remove the duplicate parts of a character variable:
Data Have;
Input N $63.;
Datalines;
208_01_460_03_461_02_469_01_46x_02_461_02
208_01_460_03_461_02_469_01_46x_03_460_03_461_02_469_01_461_02
208_01_460_03_461_02_469_02_46x_01_461_02
208_01_460_03_461_02_469_02_46x_02_461_02
208_01_460_03_461_02_469_02_46x_03_460_03_461_02_469_02_461_02
;
Run;
The desired output is (sorting is not really required, however it would be nice if the 1st entry stays and every following identical entry is removed):
208_01_460_03_461_02_469_01_46x_02 -> remove 1x 461_02
208_01_460_03_461_02_469_01_46x_03 -> remove 460_03, 461_02 (2x!), 469_01
208_01_460_03_461_02_469_02_46x_01 -> remove 461_02, etc.
Could somebody please help?
10-28-2014 12:08 PM
Perhaps something along the lines of:
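data want;
  set have;
  length new_val $63;
  new_val=scan(n,1,'_'); /* keep the first word: nothing precedes it */
  i=2; /* as 1 can only have itself */
  do until (scan(n,i,'_')="");
    del=0;
    do j=i-1 to 1 by -1; /* compare only against earlier words */
      if scan(n,i,'_')=scan(n,j,'_') then del=1;
    end;
    if del=0 then new_val=catx('_',new_val,scan(n,i,'_'));
    i=i+1;
  end;
run;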
**Note not tested as leaving now.
10-29-2014 08:11 AM
Is the token 460_03, rather than 460 or 03?
Data Have;
Input N $63.;
Datalines;
208_01_460_03_461_02_469_01_46x_02_461_02
208_01_460_03_461_02_469_01_46x_03_460_03_461_02_469_01_461_02
208_01_460_03_461_02_469_02_46x_01_461_02
208_01_460_03_461_02_469_02_46x_02_461_02
208_01_460_03_461_02_469_02_46x_03_460_03_461_02_469_02_461_02
;
Run;

data want;
  set have;
  length new token $ 100;
  do i=1 to countw(n,'_') by 2;
    token=catx('_',scan(n,i,'_'),scan(n,i+1,'_'));
    put token=;
    if not find(new,token,'t') then new=catx('_',new,token);
  end;
run;
Xia Keshan
10-29-2014 08:19 AM
Many Thanks (yes the token is 460_03)! Works perfectly!
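A possible refinement, offered only as an untested sketch: the FIND check in the data step above looks for the token as a substring, so with other data a token such as 2_3 could be treated as a duplicate merely because 12_34 is already in NEW. The variant below matches whole tokens instead. It assumes the character '|' never occurs in N; the names WANT_STRICT and SEEN are made up for illustration and are not from the original posts.

```sas
data want_strict;
  set have;
  length seen new token $ 100;
  /* Collect the unique tokens in SEEN, separated by '|' (assumed absent from N) */
  do i = 1 to countw(n, '_') by 2;
    token = catx('_', scan(n, i, '_'), scan(n, i + 1, '_'));
    /* FINDW only matches complete '|'-delimited words; 't' trims trailing blanks */
    if not findw(seen, token, '|', 't') then seen = catx('|', seen, token);
  end;
  /* Put the original '_' separator back */
  new = translate(seen, '_', '|');
  drop i token seen;
run;
```

For the sample data in this thread, both versions should give the same NEW values; the difference only matters when one token can occur inside the text of other tokens.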