DATA Step, Macro, Functions and more

Storing Duplicate observations into one dataset

Reply
N/A
Posts: 0

Storing Duplicate observations into one dataset

Hi,
My input dataset is X and i need two output datasets Y and Z as below:
Y contains complete duplicate observations and Z contians only unique observations.

data X;
input id age sex$;
cards;
1 22 M
1 22 M
1 23 M
1 24 M
1 24 M
2 12 F
2 13 F
2 14 F
2 14 F
3 24 M
3 24 M
3 24 M
4 25 F
5 26 M
6 24 M
7 26 M
run;

Data Y:
1 22 M
1 22 M
1 24 M
1 24 M
2 14 F
2 14 F
3 24 M
3 24 M
3 24 M

Data Z:
1 23 M
2 12 F
2 13 F
4 25 F
5 26 M
6 24 M
7 26 M

Thanks & Regards
Sam
Super Contributor
Super Contributor
Posts: 3,174

Re: Storing Duplicate observations into one dataset

Consider the DUPOUT= keyword with PROC SORT.

Scott Barry
SBBWorks, Inc.
Super Contributor
Super Contributor
Posts: 3,174

Re: Storing Duplicate observations into one dataset

Sorry - DUPOUT= is not the technique, given your desired output conditions. In fact, there is a near-identical post over in the SAS PROCEDURES forum, with the SUBJECT "Reg Smiley Very Happyuplicates" for your reference.

Scott Barry
SBBWorks, Inc.
SAS Employee
Posts: 160

Re: Storing Duplicate observations into one dataset

[pre]data X;
input id age sex$;
cards;
1 22 M
1 22 M
1 23 M
1 24 M
1 24 M
2 12 F
2 13 F
2 14 F
2 14 F
3 24 M
3 24 M
3 24 M
4 25 F
5 26 M
6 24 M
7 26 M
;
run;

proc sort data=X; by id age sex; run;

proc means data=X n;
by id age sex;
output out=Z(where=(_freq_=1));
run;

data Y;
merge X
Z(in=in_z keep=id age sex);
by id age sex;
if not in_z then output;
run;[/pre]
N/A
Posts: 0

Re: Storing Duplicate observations into one dataset

Thank you very much for your reply
Respected Advisor
Posts: 3,777

Re: Storing Duplicate observations into one dataset

In a data step, with an obvious limitation on variable names.

[pre]
data dups unique;
set;
by _all_;
array f
  • first:;
    array l
  • last:;
    if f[dim(f)] and l[dim(l)] then do;
    output unique;
    return;
    end;
    output dups;
    run;
    proc print data=dups;
    proc print data=unique;
    run;
    [/pre]
  • Super Contributor
    Posts: 474

    Re: Storing Duplicate observations into one dataset

    Another simple way of doing this would be:

    (And assuming that X is already sorted by ID AGE SEX)

    data Y Z;
    set X;
    by ID AGE SEX; /* assume that X is sorted this way */

    /* if the value of the last varible in group (SEX) is first and last of the group, then unique */
    if (first.SEX and last.SEX) then output Z;
    else output Y; /* else, duplicate */
    run;

    Greetings from Portugal.

    Daniel Santos at www.cgd.pt
    Ask a Question
    Discussion stats
    • 6 replies
    • 185 views
    • 0 likes
    • 5 in conversation