remove duplicate numbers in a string of numbers

New Contributor
Posts: 4


I need to remove duplicate numbers in a string of numbers. For example, the string may be "16 18 33 08 16 08". For this particular observation, I need the string to be "16 18 33 08"; thus, the duplicate numbers are removed.  Any ideas greatly appreciated.

Frequent Contributor
Posts: 87

Re: remove duplicate numbers in a string of numbers

Pretty crude but try:

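/* Sample input: one observation holding the space-delimited string */
data have ;
  text = '16 18 33 08 16 08' ;
run ;

/* Split the string into one row per token (up to 10 tokens) */
data have (drop = i text) ;
  length breakup $3 ;
  set have ;
  do i = 1 to 10 ;
    breakup = scan(text,i,' ') ;
    output ;
  end ;
run ;

/* Drop blank rows and duplicate tokens; note that this also sorts the tokens */
proc sort data = have (where = (breakup ne '')) nodupkey ;
  by breakup ;
run ;

/* Turn the rows back into columns breaknum1, breaknum2, ... */
proc transpose data = have out = working (drop = _NAME_) prefix = breaknum ;
  var breakup ;
run ;

/* Copy into a fixed set of 10 variables so that CATX below can name them all */
data working (drop = breaknum: i ) ;
  set working ;
  array sepnum{10} $3 ;
  array breaknum{10} $3 ;
  do i = 1 to 10 ;
    if breaknum{i} ne '' then sepnum{i} = breaknum{i} ;
    else sepnum{i} = '' ;
  end ;
run ;

/* Rebuild the string; CATX skips the blank variables */
data want (keep = new_text) ;
  set working ;
  new_text = catx(' ',sepnum1,sepnum2,sepnum3,sepnum4,sepnum5,sepnum6,sepnum7,sepnum8,sepnum9,sepnum10) ;
run ;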

Respected Advisor
Posts: 4,641

Re: remove duplicate numbers in a string of numbers

It can be done in a single step:

data have;
str = "16 18 33 08 16 08";
run;

data want(drop=_:);
length _wp $8 _scr $200;
set have;
do _i = 1 to countw(str);
call scan(str,_i,_p,_l);
_wp = scan(str,_i);
if findw(str, trim(_wp), _p+_l) = 0 then _scr = catx(" ", _scr, _wp);
end;
str = _scr;
run;

proc print; run;

PG

Frequent Contributor
Posts: 87

Re: remove duplicate numbers in a string of numbers

Heh heh much better than my hack!

Respected Advisor
Posts: 3,777

Re: remove duplicate numbers in a string of numbers

So this just treats the numbers like words and will work for any list of words. If you want to dedup the numbers as numbers, i.e., 08=8=8.0=0008, then ask.

data _null_;
   str = "16 18 33 08 16 08";
   length dedup w $32;
   do i = 1 by 1;
      w = scan(str,i);
      if missing(w) then leave;
      if indexW(dedup,w) eq 0 then dedup = catx(' ',dedup,w);
      end;
   put (str dedup)(/=);
   run;

str=16 18 33 08 16 08

dedup=16 18 33 08
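
For the numeric case mentioned above (08=8=8.0=0008), here is a minimal sketch and not a tested solution. It assumes blank is the only delimiter, that every token is readable by the 32. informat, and that there are at most 50 distinct values per string. The names seen, n and v, and the sample string, are made up for illustration. The first spelling of each value is the one that is kept.

```sas
data _null_;
   str = "16 8 33 08 16.0 0008";
   length dedup $200 w $32;
   array seen{50} _temporary_;   /* values kept so far; 50 is an assumed cap */
   n = 0;
   do i = 1 by 1;
      w = scan(str, i, ' ');     /* blank is the only delimiter, so 16.0 stays whole */
      if missing(w) then leave;
      v = input(w, ?? 32.);      /* ?? suppresses log notes for non-numeric tokens */
      /* tokens that are not numeric, or that exceed the cap, are skipped */
      if not missing(v) and n < dim(seen) then
         if whichn(v, of seen{*}) = 0 then do;
            n = n + 1;
            seen{n} = v;
            dedup = catx(' ', dedup, w);
         end;
   end;
   put (str dedup)(/=);
run;
```

With this sample string the sketch should keep 16, 8 and 33, because 08, 16.0 and 0008 compare equal to values already seen.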

Respected Advisor
Posts: 4,641

Re: remove duplicate numbers in a string of numbers

data_null_'s code is, in my opinion, the best solution provided. Kudos. - PG

Super User
Posts: 9,671

Re: remove duplicate numbers in a string of numbers

data have;
str = "16 18 33 08 16 08"; output;
str = "33 08 16 08"; output;
run;
proc sql noprint;
 select max(countw(str)) into : n from have;
quit;
data want;
 set have;
 length want $ 200;
 array _a{&n} $ _temporary_ ;
 do i=1 to &n;
  if scan(str,i) not in _a then _a{i}=scan(str,i);
 end;
 want=catx(' ',of _a{*});
 call missing(of _a{*});
drop i;
run;


Ksharp

Frequent Contributor
Posts: 95

Re: remove duplicate numbers in a string of numbers

I think the following will do the trick using regular expressions.

Zafer

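data x;
  length z $2000;
  x = "16 18 33 08 16 08";
  w = SCAN(x,1);
  z = '';
  do while(w ne '');
    /* \b anchors keep a token such as 1 from matching inside 16 */
    prxstr = CATS('s/\b',w,'\b//');
    /* drop the first token, then delete every later copy of it */
    x = COMPBL(PRXCHANGE(prxstr,-1,SUBSTR(x,INDEX(x,' ')+1)));
    z = CATX(' ',z,w);
    w = SCAN(x,1);
  end;
  put z=;
  keep z;
run;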
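
A possible variation on the regular-expression idea, offered only as a sketch: a single PRXCHANGE call with a lookahead and a backreference deletes a number whenever the same number appears again later in the string. This assumes the tokens are digits only, and it keeps the last occurrence of each number rather than the first, so the order differs from the loop above. The variable names x and z are reused from that step for illustration.

```sas
data _null_;
   length x z $200;
   x = "16 18 33 08 16 08";
   /* delete a number (and the blanks after it) when the same number occurs again later */
   z = prxchange('s/\b(\d+)\b\s*(?=.*\b\1\b)//', -1, x);
   z = left(compbl(z));   /* tidy any leftover blanks */
   put z=;
run;
```

If first-occurrence order matters, the loop version above is the safer choice.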

New Contributor
Posts: 4

Re: remove duplicate numbers in a string of numbers

Thanks everyone for your excellent ideas. I am beyond appreciative! Alan

Respected Advisor
Posts: 3,124

Re: remove duplicate numbers in a string of numbers

FWIW, here is a hash() approach:

data have;
  x = "16 18 33 08 16 08";
  output;
  x="23 56 23 56";
  output;
run;

data want (drop=_:);
  length nodup $20. _new $2.;
  dcl hash h();
  h.definekey('_new');
  h.definedata('_new');
  h.definedone();
  dcl hiter hi('h');
  set have ;
  do _i=1 by 1 until (missing(_new));
    _new=scan(x,_i);
    _rc=h.ref();
  end;
  do _rc=hi.first() by 0 while (_rc=0);
    nodup=catx(' ',nodup,_new);
    _rc=hi.next();
  end;
run;

proc print;run;

Haikuo
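
A related sketch, in case first-occurrence order matters: here the hash is used only as a "seen" set and the output string is built while scanning, so the result does not depend on the order in which a hash iterator returns the keys. It assumes the same have data set with variable x as above and blank-delimited tokens of at most 32 characters. The names _w and nodup and the length 200 are illustrative choices, not part of the original code.

```sas
data want (drop=_:);
  length nodup $200 _w $32;
  if _n_ = 1 then do;
    dcl hash h();            /* one hash, created once and reused */
    h.definekey('_w');
    h.definedone();
  end;
  set have;
  h.clear();                 /* start each observation with an empty set */
  do _i = 1 to countw(x, ' ');
    _w = scan(x, _i, ' ');
    if h.check() ne 0 then do;   /* token not seen yet in this string */
      h.add();
      nodup = catx(' ', nodup, _w);
    end;
  end;
run;

proc print; run;
```

Creating the hash once under _n_ = 1 and clearing it per row avoids instantiating a new object for every observation.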

Super Contributor
Posts: 1,636

Re: remove duplicate numbers in a string of numbers

for the purpose of practicing hash:

data have;
  x = "16 18 33 08 16 08";
  output;
  x="23 56 23 56";
  output;
run;

data temp;
  length new $2;
  set have;
  do i=1 to countw(x);
    n=_n_;
    new=scan(x,i);
    output;
  end;
run;

data _null_;
  length nodup $20. new $2.;
  if _n_=1 then do;
    dcl hash h();
    h.definekey('n');
    h.definedata('nodup');
    h.definedone();
  end;
  set temp end=last;
  if h.find() ne 0 then do;
    nodup=new;
    h.add();
  end;
  else do;
    if index(nodup,new)=0 then nodup=trim(nodup)||' '||new;
    h.replace();
  end;
  if last then h.output(dataset:'want');
run;

proc print data=want;run;

Super User
Posts: 5,074

Re: remove duplicate numbers in a string of numbers

The more the merrier?  This approach is specialized to work with two-digit numbers only, from 00 to 99.  It also rearranges them in sequential order.

data want;
   set have;
   length big_string $ 302 next_word $ 2;
   do _n_=1 to countw(str);
        next_word = scan(str, _n_);
        substr(big_string, 3*input(next_word,2.)+1, 2) = next_word;
   end;
   str = left(compbl(big_string));
run;

Respected Advisor
Posts: 4,641

Re: remove duplicate numbers in a string of numbers

The possibilities are endless! Two more:

 

data want(drop=c:);
array c{0:99} $2;
set have;
do _n_ = 1 to countw(str);
  c{input(scan(str,_n_),2.)} = scan(str,_n_);
end;
str = catx(" ", of c{*});
run;

or

data want(drop=lstr);
length lstr $299;
set have;
do _n_ = 0 to 99;
  if indexw(str,put(_n_,z2.)) then lstr=catx(" ", lstr, put(_n_,z2.));
end;
str = lstr;
run;

PG

Valued Guide
Posts: 765

Re: remove duplicate numbers in a string of numbers

hi ... another idea (why not ... same approach as data _null_, different code) ...

data have;
input str $20.;
datalines;
16 18 33 08 16 08
33 08 16 08 .  .
11 11 11 11 11 11
99 98 97 97 98 99
1 2 3 4 5 . .
1 2 2 4 5 . .
1 2 2 4 5 5
;

data want;
length new $50;
set have;
do _n_=1 by 1 while (^missing(scan(str,_n_)));
   new = ifc(findw(new,scan(str,_n_)), new , catx(' ',new,scan(str,_n_)));
end;
run;

new            str
16 18 33 08    16 18 33 08 16 08
33 08 16       33 08 16 08
11             11 11 11 11 11 11
99 98 97       99 98 97 97 98 99
1 2 3 4 5      1 2 3 4 5
1 2 4 5        1 2 2 4 5
1 2 4 5        1 2 2 4 5 5

if numeric (separate variables) ...

data have;
input x1-x6;
datalines;
16 18 33 08 16 08
33 08 16 08 .  .
11 11 11 11 11 11
99 98 97 97 98 99
1 2 3 4 5 . .
1 2 2 4 5 . .
1 2 2 4 5 5
;

data want (drop=j k);
set have;
array x(6);
array y(6);
do j=1 to 6;
   k = sum(k,1);
   y(k) = ifn(whichn(x(j), of y(*)) , . , x(j));
   k = ifn(y(k), k , k-1);
end;
run;

x1    x2    x3    x4    x5    x6    y1    y2    y3    y4    y5    y6
16    18    33     8    16     8    16    18    33     8     .     .
33     8    16     8     .     .    33     8    16     .     .     .
11    11    11    11    11    11    11     .     .     .     .     .
99    98    97    97    98    99    99    98    97     .     .     .
 1     2     3     4     5     .     1     2     3     4     5     .
 1     2     2     4     5     .     1     2     4     5     .     .
 1     2     2     4     5     5     1     2     4     5     .     .

PROC Star
Posts: 7,356

Re: remove duplicate numbers in a string of numbers

Since this is still going on: same as Mike's but should run slightly faster:

data want2;
  length new $20;
  set have;
  do _n_=1 to countw(str);
   new = ifc(findw(new,scan(str,_n_)), new , catx(' ',new,scan(str,_n_)));
  end;
run;
