## question about selectively removing duplicates

Solved
Occasional Contributor
Posts: 19

# question about selectively removing duplicates

[ Edited ]

I have a question about how to remove duplicate rows based on the value of a certain variable. I'm using the example below to explain what I am effectively trying to do.

In this dataset, each id has many duplicate values for the variable med. In some of those rows, there is also a value for the variable flag. In those cases, I want to preserve only the rows where there is a value for flag. However, there are other rows for the id and med value where there is NO value for the variable flag (eg, where id = 1 and Med = B); in those cases, I want to just keep all of those rows intact.

id  med  flag   seq

1    A       1       1

1    A       .        2

1    A       .        3

1    B       .        1

1    B       .        2

2    A       1       1

2    A       .        2

2    C       .        1

2    C       0       2

2    C       .        3

2    C       .        4

2    D       1       1

This is the dataset that I want to have as the end result:

id  med  flag   seq

1    A       1       1

1    B       .        1

1    B       .        2

2    A       1       1

2    C       0       2

2    D       1       1

Accepted Solutions
Solution
2 weeks ago
Occasional Contributor
Posts: 6

## Re: question about selectively removing duplicates

[ Edited ]
``````data have;
input id med\$ flag seq;
cards;
1 A 1 1
1 A . 2
1 A . 3
1 B . 1
1 B . 2
2 A 1 1
2 A . 2
2 C . 1
2 C 0 2
2 C . 3
2 C . 4
2 D 1 1
;

proc sort data=have out=flag (keep=id med) nodupkey;
by id med;
where flag^=.;
run;

data want1;
merge have flag(in=_in);
by id med;
if (_in=1 and flag^=.) or _in=0 then output;
run;

or

data want2;
set have;

if _n_=1 then do;
dcl hash hs( dataset:"have(where=(flag^=.))" );
hs.definekey("id","med");
hs.definedone();
end;

rc = hs.check();
if rc =0 and flag^=. then output;
if rc^=0 then output;
run;

``````

All Replies
Solution
2 weeks ago
Occasional Contributor
Posts: 6

## Re: question about selectively removing duplicates

[ Edited ]
``````data have;
input id med\$ flag seq;
cards;
1 A 1 1
1 A . 2
1 A . 3
1 B . 1
1 B . 2
2 A 1 1
2 A . 2
2 C . 1
2 C 0 2
2 C . 3
2 C . 4
2 D 1 1
;

proc sort data=have out=flag (keep=id med) nodupkey;
by id med;
where flag^=.;
run;

data want1;
merge have flag(in=_in);
by id med;
if (_in=1 and flag^=.) or _in=0 then output;
run;

or

data want2;
set have;

if _n_=1 then do;
dcl hash hs( dataset:"have(where=(flag^=.))" );
hs.definekey("id","med");
hs.definedone();
end;

rc = hs.check();
if rc =0 and flag^=. then output;
if rc^=0 then output;
run;

``````
Occasional Contributor
Posts: 19

## Re: question about selectively removing duplicates

Thank you so much, amatsu! This worked!

Super User
Posts: 10,681

## Re: question about selectively removing duplicates

``````data have;
input id med\$ flag seq;
cards;
1 A 1 1
1 A . 2
1 A . 3
1 B . 1
1 B . 2
2 A 1 1
2 A . 2
2 C . 1
2 C 0 2
2 C . 3
2 C . 4
2 D 1 1
;
data want;
do until(last.med);
set have;
by id med notsorted;
if not missing(flag) then yes=1;
end;
do until(last.med);
set have;
by id med notsorted;
if yes then do;if not missing(flag) then output;end;
else output;
end;
drop yes;
run;``````
PROC Star
Posts: 1,558

## Re: question about selectively removing duplicates

``````data have;
input id med\$ flag seq;
cards;
1 A 1 1
1 A . 2
1 A . 3
1 B . 1
1 B . 2
2 A 1 1
2 A . 2
2 C . 1
2 C 0 2
2 C . 3
2 C . 4
2 D 1 1
;

proc sql;
create table want as
select *
from have
group by id,med
having flag=max(flag)
order by id,med,seq;
quit;``````
☑ This topic is solved.