End of file & hash / finding duplicate records (SAS Programming)

user24feb — Mon, 24 Nov 2014 16:21:11 GMT

Hello!

I would like to filter out the duplicate records of data set A, that is, I would like to write out only the observations with Nr >= 2. However, If (EoF) & (Nr ge 2) Then H.Output(Dataset:"Duplicates"); doesn't work.

The following program works, but it also keeps the IDs that occur only once:

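Data A;
  Do i=1 To 30;
    ID=Byte(Int(RanUni(1)*26)+65);  /* random uppercase letter (ASCII 65-90) */
    Output;
  End;
Run;

Data _NULL_;
  Length Nr 3.;
  If _N_ eq 1 Then Do;
    Declare Hash H();
    H.DefineKey("ID");
    H.DefineData("ID", "i", "Nr");
    H.DefineDone();
  End;
  Set A End=EoF;
  If H.Find() ne 0 Then Do;  /* first occurrence of this ID */
    Nr=1;
    H.Add();
  End;
  Else Do;                   /* ID already in the table: increment its count */
    Nr+1;
    H.Replace();
  End;
  /* Writes every item in the hash table, including IDs with Nr=1 */
  If EoF Then H.Output(Dataset:"Duplicates");
Run;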

My second question: how can I find the duplicate records of data set A (not count them) using a hash object?

Thanks and kind regards

Re: End of file & hash / finding duplicate records

Ksharp — Tue, 25 Nov 2014 13:34:37 GMT

If I understood you correctly:

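Data A;
  Do i=1 To 30;
    ID=Byte(Int(RanUni(1)*26)+65);
    Output;
  End;
Run;

data _null_;
  if _n_ eq 1 then do;
    if 0 then set a;  /* put the variables of A into the PDV at compile time */
    declare hash h();
    h.definekey('id');
    h.definedata('id','n');
    h.definedone();
  end;
  set a end=last;
  /* n is the running count per ID: incremented if the ID is found, else set to 1 */
  if h.find()=0 then do; n+1; h.replace(); end;
  else do; n=1; h.replace(); end;
  if last then do;
    /* the WHERE= data set option filters the items written by OUTPUT */
    h.output(dataset:'singual(where=(n=1))');
    h.output(dataset:'duplicate(where=(n gt 1))');
  end;
run;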

Xia Keshan
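
A note on why the original condition fails: H.Output always writes the whole hash table, and at end of file Nr only holds the count for the ID of the last observation read, so If (EoF) & (Nr ge 2) either writes everything or nothing. Besides the WHERE= data set option, one possible alternative is to walk the table with a hash iterator and write only the items with Nr ge 2. The following is a minimal sketch under that assumption, not a tested replacement; the names HI and rc are introduced here for illustration only.

```sas
data Duplicates(keep=ID i Nr);
  length Nr 3;
  if _n_ eq 1 then do;
    declare hash H();
    H.DefineKey("ID");
    H.DefineData("ID", "i", "Nr");
    H.DefineDone();
    declare hiter HI("H");      /* iterator over H (hypothetical name) */
  end;
  set A end=EoF;
  if H.Find() ne 0 then Nr = 1; /* first occurrence of this ID */
  else Nr + 1;                  /* ID seen before: increment stored count */
  H.Replace();                  /* add or update the item for this ID */
  if EoF then do;
    /* walk all items; each First/Next call loads ID, i and Nr into the PDV */
    rc = HI.First();
    do while (rc = 0);
      if Nr ge 2 then output;   /* keep only IDs that occurred more than once */
      rc = HI.Next();
    end;
  end;
run;
```

Because the step contains an explicit OUTPUT statement, nothing is written until the iterator loop runs at end of file. As in the original program, i holds the value from the first occurrence of each ID.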

Re: End of file & hash / finding duplicate records

user24feb — Tue, 25 Nov 2014 13:52:26 GMT

Yes, that's exactly what I meant. I didn't think of putting a WHERE= option after the data set name. Many thanks!
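
On the second question (finding the duplicate records themselves rather than counting them), one common pattern is to keep only the keys in the hash table and test each incoming ID against it. The sketch below assumes that a "duplicate record" means every occurrence of an ID after the first one; the data set names Unique and Dups and the hash name Seen are made up for this example.

```sas
data Unique Dups;
  if _n_ eq 1 then do;
    declare hash Seen();        /* holds every ID read so far */
    Seen.DefineKey("ID");
    Seen.DefineDone();
  end;
  set A;
  if Seen.Check() = 0 then output Dups;  /* ID already in the table: duplicate */
  else do;
    rc = Seen.Add();            /* remember the ID for later observations */
    output Unique;              /* first occurrence of this ID */
  end;
  drop rc;
run;
```

Check() only tests whether the key exists and does not overwrite any variables, so each observation is written exactly as it was read from A. If all records of a repeated ID are wanted, including the first occurrence, a two-pass approach (count first, then filter) would be needed instead.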