Hello!
I would like to filter the duplicate records of data set A; that is, I want to pull out only the observations with Nr >= 2. The statement If (EoF) & (Nr ge 2) Then H.Output(Dataset:"Duplicates"); doesn't work.
The following program runs, but the output data set also contains the IDs that occur only once:
Data A;
  Do i=1 To 30;
    ID=Byte(Int(RanUni(1)*26)+65);
    Output;
  End;
Run;

Data _NULL_;
  Length Nr 3.;
  If _N_ eq 1 Then Do;
    Declare Hash H();
    H.DefineKey("ID");
    H.DefineData("ID", "i", "Nr");
    H.DefineDone();
  End;
  Set A End=EoF;
  If H.Find() ne 0 Then Do;
    Nr=1;
    H.Add();
  End;
  Else Do;
    Nr+1;
    H.Replace();
  End;
  If EoF Then H.Output(Dataset:"Duplicates");
Run;
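Why the IF condition cannot work: H.Output() writes every item in the hash table in one call, and at EoF the variable Nr in the PDV holds only the count for the ID of the last observation read. So (EoF) & (Nr ge 2) decides whether the whole table is written, not which items go into it. A minimal sketch of a fix, assuming data set A is created as above and the rest of the step is left unchanged, is to put a WHERE= data set option inside the Dataset: argument so that only IDs counted two or more times are written:

```sas
/* Test data, same as in the question */
Data A;
  Do i=1 To 30;
    ID=Byte(Int(RanUni(1)*26)+65);
    Output;
  End;
Run;

Data _NULL_;
  Length Nr 3;
  If _N_ eq 1 Then Do;
    Declare Hash H();
    H.DefineKey("ID");
    H.DefineData("ID", "i", "Nr");
    H.DefineDone();
  End;
  Set A End=EoF;
  If H.Find() ne 0 Then Do;   /* first time this ID is seen */
    Nr=1;
    H.Add();
  End;
  Else Do;                    /* ID already in the table: bump its count */
    Nr+1;
    H.Replace();
  End;
  /* The WHERE= option filters the items as the hash table is written, */
  /* so only IDs that occurred at least twice reach Duplicates.       */
  If EoF Then H.Output(Dataset:"Duplicates(where=(Nr ge 2))");
Run;
```

Duplicates then holds one row per repeated ID, with Nr giving how many times that ID occurs in A.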
My second question: how can I find the duplicate records of data set "A" themselves (not just count them) using a hash object?
Thanks and kind regards
If I understand what you mean:
Data A;
  Do i=1 To 30;
    ID=Byte(Int(RanUni(1)*26)+65);
    Output;
  End;
Run;

data _null_;
  if _n_ eq 1 then do;
    if 0 then set a;
    declare hash h();
    h.definekey('id');
    h.definedata('id','n');
    h.definedone();
  end;
  set a end=last;
  if h.find()=0 then do; n+1; h.replace(); end;
  else do; n=1; h.replace(); end;
  if last then do;
    h.output(dataset:'single(where=(n=1))');
    h.output(dataset:'duplicate(where=(n gt 1))');
  end;
run;
Xia Keshan
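For the second question (returning the duplicate records themselves rather than a count per ID), one option is a key-only hash used as a "seen" list: the CHECK() method tells whether an ID has already been read, so every observation after the first one for that ID can be written out in full. This is a sketch of a different technique from the reply above, and the output data set names dups and firsts are hypothetical:

```sas
/* Test data, same as in the question */
Data A;
  Do i=1 To 30;
    ID=Byte(Int(RanUni(1)*26)+65);
    Output;
  End;
Run;

/* dups   : 2nd, 3rd, ... occurrence of each ID (hypothetical name) */
/* firsts : first occurrence of each ID (hypothetical name)         */
data dups firsts;
  if _n_ = 1 then do;
    declare hash seen();
    seen.defineKey('ID');   /* key only; no data variables needed */
    seen.defineDone();
  end;
  set A;
  if seen.check() = 0 then output dups;  /* ID read before: duplicate record */
  else do;
    seen.add();                          /* remember this ID */
    output firsts;
  end;
run;
```

Because each observation is written as it is read, all variables of A (here i and ID) are kept. Note that dups contains only the second and later occurrences; if the first occurrence of each repeated ID is needed as well, a two-pass approach (count per ID first, then select the IDs whose count is above 1) would be required.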
Yes, that's exactly what I meant. I didn't think of adding a WHERE= data set option to the output data set name. Many thanks!