Solved: remove duplicates with out sort - Page 2

Ksharp · Posted 06-19-2021 08:03 AM

🙂
Sure. I have a boss, since I am a workman.

Tom · Posted 06-16-2021 10:45 AM

@BrahmanandaRao wrote:
data test;
input Empname $ ;
datalines;
ram
sita
ram
arjun
ram
sita
;
run;
An interviewer asked me how to remove duplicates without sorting.

Using the above dataset, he said not to change the order of the empnames, but to remove the duplicates using only a DATA step method.

In general a HASH (or some other method of remembering what values you have seen before) will do this.

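data want;
  if _n_=1 then do;
   declare hash h();          /* created once, kept in memory for the whole step */
   h.definekey('empname');
   h.definedone();
  end;
  set test ;
  if h.find() then do;        /* nonzero return code: empname not seen before */
    output;
    h.add();                  /* remember it so later repeats are skipped */
  end;
run;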
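
As an illustration of "some other method of remembering what values you have seen before", here is a minimal sketch that keeps the seen names in a temporary array instead of a hash. The array size of 1000 and the $8 length are assumptions (8 is the default length from the `input Empname $` statement above), and the helper variables `i`, `n` and `found` are made up for this example. The linear search gets slow with many distinct values, so treat it as a sketch rather than a replacement for the hash.

```sas
data want;
  /* assumed upper bound of 1000 distinct names; _temporary_ arrays are retained */
  array seen[1000] $8 _temporary_;
  set test;
  /* look for the current name among the names stored so far */
  found = 0;
  do i = 1 to n while (not found);
    if seen[i] = empname then found = 1;
  end;
  /* first occurrence: store it and keep the row */
  if not found then do;
    n + 1;               /* sum statement: n is retained and starts at 0 */
    seen[n] = empname;
    output;
  end;
  drop i n found;
run;
```

Like the hash, this reads TEST once in its original order, so the output order is unchanged.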

But if the data is too large, a HASH will not work (the hash object has to fit in memory), and neither will any other DATA-step-only method that has to remember the values already seen. In that case sorting is probably your best option, either directly with PROC SORT or implicitly through PROC SQL. Just add a new variable that records the original order so it can be restored afterwards.

data temp;
  row+1;
  set test;
run;
proc sql ;
create table want as
  select empname
  from temp
  group by empname
  having row=min(row)
  order by row
;
quit;
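
For the "directly using PROC SORT" route, a sketch along the same lines might look like this. The intermediate dataset name `first_seen` is an assumption made for this example; `row` is the same order variable as above.

```sas
/* record the original order */
data temp;
  row+1;
  set test;
run;

/* group the names, with the earliest occurrence first within each name */
proc sort data=temp out=first_seen;
  by empname row;
run;

/* keep only the first occurrence of each name */
data first_seen;
  set first_seen;
  by empname;
  if first.empname;
run;

/* restore the original order and drop the helper variable */
proc sort data=first_seen out=want(drop=row);
  by row;
run;
```

With the sample data this leaves ram, sita, arjun in that order. It does sort, so it does not meet the interviewer's constraint; it is only the fallback for data too large to hold in memory.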
