## End of file & hash / finding duplicate records

Hello!

I would like to filter out the duplicate records of data set A; that is, I would like to separate out the observations with Nr >= 2. However, `If (EoF) & (Nr ge 2) Then H.Output(Dataset:"Duplicates");` doesn't work.

The following program works, but it also keeps the IDs that occur only once:

```
Data A;
  Do i=1 To 30;
    ID=Byte(Int(RanUni(1)*26)+65);
    Output;
  End;
Run;

Data _NULL_;
  Length Nr 3.;
  If _N_ eq 1 Then Do;
    Declare Hash H();
    H.DefineKey("ID");
    H.DefineData("ID", "i", "Nr");
    H.DefineDone();
  End;
  Set A End=EoF;
  If H.Find() ne 0 Then Do;
    Nr=1;
    H.Add();   /* first occurrence: store the key, otherwise the hash stays empty */
  End;
  Else Do;
    Nr+1;
    H.Replace();
  End;
  If EoF Then H.Output(Dataset:"Duplicates");
Run;
```

My second question: how can I find the duplicate records themselves (not just count them) in data set A using a hash object?

Thanks and kind regards

Accepted Solutions
‎11-25-2014 08:34 AM

## Re: End of file & hash / finding duplicate records

If I understand what you mean:

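```
/* Sample data: 30 observations with a random single-letter ID */
Data A;
  Do i=1 To 30;
    ID=Byte(Int(RanUni(1)*26)+65);
    Output;
  End;
Run;

data _null_;
  if _n_ eq 1 then do;
    if 0 then set a;   /* put the variables of A into the PDV */
    declare hash h();
    h.definekey('id');
    h.definedata('id','n');
    h.definedone();
  end;
  set a end=last;
  /* count occurrences per ID: increment if already stored, else start at 1 */
  if h.find()=0 then do; n+1; h.replace(); end;
  else do; n=1; h.replace(); end;
  if last then do;
    /* the WHERE= data set option filters the hash items as they are written */
    h.output(dataset:'single(where=(n=1))');
    h.output(dataset:'duplicate(where=(n gt 1))');
  end;
run;
```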
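
For the second question (finding the duplicate records themselves rather than counting them), one possible approach is sketched below. It is not part of the original answer: it assumes data set A from the question, and the names `seen` and `duplicates` are hypothetical. The hash holds only the keys seen so far, and any observation whose ID is already stored is written out.

```sas
/* Sketch only: write each repeated occurrence of ID to WORK.DUPLICATES. */
/* Assumes data set A exists; "seen" and "duplicates" are made-up names.  */
data duplicates;
  if _n_ = 1 then do;
    declare hash seen();       /* key-only hash, no separate data portion */
    seen.defineKey('id');
    seen.defineDone();
  end;
  set a;
  if seen.check() = 0 then output;   /* ID already stored: this row is a duplicate */
  else seen.add();                   /* first occurrence: remember the ID */
run;
```

Under these assumptions the first occurrence of each ID is not written, only the second and later ones. If every record of a duplicated ID is wanted, including the first, a second pass over A against the counted hash would be needed.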

Xia Keshan

All Replies

## Re: End of file & hash / finding duplicate records

Yes, that's exactly what I meant. I didn't think to put a WHERE= data set option after the data set name. Many thanks!
