End of file & hash / finding duplicate records

Super Contributor
Posts: 336

Hello!

I would like to filter the duplicate records of data set A, that is, I would like to output only the observations with Nr >= 2. However, If (EoF) & (Nr ge 2) Then H.Output(Dataset:"Duplicates"); doesn't work.

The following program works, but it also keeps the IDs that occur only once:

Data A;
  Do i=1 To 30;
    ID=Byte(Int(RanUni(1)*26)+65);
    Output;
  End;
Run;

Data _NULL_;
  Length Nr 3.;
  If _N_ eq 1 Then Do;
    Declare Hash H();
    H.DefineKey("ID");
    H.DefineData("ID", "i", "Nr");
    H.DefineDone();
  End;
  Set A End=EoF;
  If H.Find() ne 0 Then Do;
    Nr=1;
    H.Add();
  End;
  Else Do;
    Nr+1;
    H.Replace();
  End;
  If EoF Then H.Output(Dataset:"Duplicates");
Run;

My 2nd question is, how can I find duplicate records (not count them) of dataset "A" using a hash object?

Thanks & kind regards


Accepted Solutions
Solution
‎11-25-2014 08:34 AM
Super User
Posts: 9,682

Re: End of file & hash / finding duplicate records

If I understood correctly what you mean:

Data A;
  Do i=1 To 30;
    ID=Byte(Int(RanUni(1)*26)+65);
    Output;
  End;
Run;

data _null_;
  if _n_ eq 1 then do;
    if 0 then set a;  /* defines ID at compile time; never executes */
    declare hash h();
    h.definekey('id');
    h.definedata('id','n');
    h.definedone();
  end;
  set a end=last;
  /* find() loads the stored count into n when the ID is already in the table */
  if h.find()=0 then n+1;
  else n=1;
  h.replace();
  if last then do;
    /* the where= data set option filters what each output data set receives */
    h.output(dataset:'single(where=(n=1))');
    h.output(dataset:'duplicate(where=(n gt 1))');
  end;
run;
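
On the second question (finding the duplicate records themselves rather than counting them), here is a minimal sketch. It assumes "duplicate records" means every observation of A whose ID occurs more than once. The output data set name duplicate_records and the variable rc are placeholders chosen for illustration. The first pass fills the hash table with a count per ID, the second pass re-reads A and keeps only the observations whose count is above 1.

```sas
data duplicate_records;          /* hypothetical output name */
  if _n_ eq 1 then do;
    if 0 then set a;             /* defines ID at compile time; never executes */
    declare hash h();
    h.definekey('id');
    h.definedata('n');
    h.definedone();
    /* first pass: count the occurrences of each ID */
    do until (last);
      set a end=last;
      if h.find() ne 0 then n=0; /* ID not stored yet, start from zero */
      n+1;
      h.replace();
    end;
  end;
  /* second pass: re-read A and keep observations of repeated IDs */
  set a;
  rc=h.find();                   /* loads the count for the current ID into n */
  if rc=0 and n gt 1;
  drop rc n;
run;
```

Unlike the output method, which writes one row per key, this keeps the original observations (including i) in their original order.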

Xia Keshan


All Replies

Super Contributor
Posts: 336

Re: End of file & hash / finding duplicate records

Yes, that's exactly what I meant. I didn't think to put a where= data set option after the data set name. Many thanks!

☑ This topic is SOLVED.
