Re: Storing Duplicate observations into one dataset

deleted_user · Posted 03-31-2009 02:57 AM

Hi,
My input dataset is X, and I need two output datasets, Y and Z, as shown below:
Y contains every observation that has an exact duplicate (all copies), and Z contains only the observations that occur once.

data X;
input id age sex$;
cards;
1 22 M
1 22 M
1 23 M
1 24 M
1 24 M
2 12 F
2 13 F
2 14 F
2 14 F
3 24 M
3 24 M
3 24 M
4 25 F
5 26 M
6 24 M
7 26 M
run;

Data Y:
1 22 M
1 22 M
1 24 M
1 24 M
2 14 F
2 14 F
3 24 M
3 24 M
3 24 M

Data Z:
1 23 M
2 12 F
2 13 F
4 25 F
5 26 M
6 24 M
7 26 M

Thanks & Regards
Sam

sbb · Posted 03-31-2009 03:48 AM

Consider the DUPOUT= keyword with PROC SORT.

Scott Barry
SBBWorks, Inc.

sbb · Posted 03-31-2009 04:08 AM

Sorry - DUPOUT= is not the technique, given your desired output conditions. In fact, there is a near-identical post over in the SAS PROCEDURES forum, with the SUBJECT "Reg :Duplicates" for your reference.
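
To see why, here is a minimal sketch of what DUPOUT= would produce, assuming the X dataset from the original post (X_nodup and X_extra are hypothetical names used only for illustration). With NODUPRECS, PROC SORT keeps the first copy of each repeated record in OUT= and writes only the later copies to DUPOUT=.

```sas
/* Sketch only: X_nodup and X_extra are illustrative names, not from the thread. */
/* Sorting by every variable puts identical records next to each other,          */
/* so NODUPRECS can compare each record with the one written just before it.     */
proc sort data=X out=X_nodup noduprecs dupout=X_extra;
  by id age sex;
run;

/* X_nodup: one copy of every distinct record                  */
/* X_extra: only the second and later copies of repeated ones  */
proc print data=X_nodup; run;
proc print data=X_extra; run;
```

For the sample data, X_extra would hold 5 observations (one 1 22 M, one 1 24 M, one 2 14 F and two 3 24 M) rather than the 9 wanted in Y, and X_nodup would hold 11 rather than the 7 wanted in Z.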

Scott Barry
SBBWorks, Inc.

GertNissen · Posted 03-31-2009 04:33 AM

[pre]data X;
input id age sex$;
cards;
1 22 M
1 22 M
1 23 M
1 24 M
1 24 M
2 12 F
2 13 F
2 14 F
2 14 F
3 24 M
3 24 M
3 24 M
4 25 F
5 26 M
6 24 M
7 26 M
;
run;

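/* Sort so that identical observations are adjacent and BY processing works */
proc sort data=X; by id age sex; run;

/* Count observations per id/age/sex combination; _FREQ_ holds the count. */
/* Z keeps only combinations that occur once (it also carries _TYPE_ and _FREQ_). */
proc means data=X n;
by id age sex;
output out=Z(where=(_freq_=1));
run;

/* Y is every observation of X whose id/age/sex combination is not in Z */
data Y;
merge X
Z(in=in_z keep=id age sex);
by id age sex;
if not in_z then output;
run;[/pre]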

deleted_user · Posted 04-01-2009 03:16 AM

Thank you very much for your reply

data_null__ · Posted 03-31-2009 12:31 PM

In a data step, with an obvious limitation on variable names.

[pre]
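data dups unique;
set; /* no data set named: reads the most recently created one, which must be sorted by all its variables */
by _all_;
/* first: and last: pick up the FIRST.variable and LAST.variable flags created by BY, */
/* so no data set variable may have a name starting with FIRST or LAST */
array f[*] first:;
array l[*] last:;
/* the flags for the final BY variable are both set only when the whole record occurs once */
if f[dim(f)] and l[dim(l)] then do;
output unique;
return;
end;
output dups;
run;
proc print data=dups;
run;
proc print data=unique;
run;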
[/pre]

DanielSantos · Posted 04-01-2009 04:47 AM

Another simple way of doing this would be:

(And assuming that X is already sorted by ID AGE SEX)

data Y Z;
set X;
by ID AGE SEX; /* assume that X is sorted this way */

/* if the last BY variable (SEX) is both first and last in its group, the observation is unique */
if (first.SEX and last.SEX) then output Z;
else output Y; /* else, duplicate */
run;

Greetings from Portugal.

Daniel Santos at www.cgd.pt
