Keeping one of two rows with duplicate ID

Walternate · Posted 06-09-2015 10:41 AM

Hi,

I have a dataset of duplicates--each ID occurs twice (but the values of other variables can differ within the same ID), like this:

ID Var1 Var2

1 1 0

1 0 1

2 0 0

2 0 1

3 0 0

3 1 1

I want to keep one row for each ID, but I need to do it conditionally. First, if they have a value of 1 for Var1, that is the row that should be kept (each pair of rows will either have one 0 and one 1 for Var1 or two 0s). If a pair of rows has two 0s for Var1, the one with a value of 1 for Var2 should be kept.

Any help is much appreciated!

Steelers_In_DC · Posted 06-09-2015 11:00 AM

data have;

infile cards missover;

input ID Var1 Var2;

cards;

1 1 0

1 0 1

2 0 0

2 0 1

3 0 0

3 1 1

;

run;

data prep;

set have;

by id;

if var1 = 1 then flag = 1;

if var1 = 0 and var2 = 1 then flag = 1;

run;

proc sort data=prep;by id flag;

data want(drop=flag);

set prep;

by id flag;

if first.flag and not missing(flag) then output want;

run;

Astounding · Posted 06-09-2015 11:06 AM

One way:

proc sort data=have;

by id var1 var2;

run;

data want;

set have;

by id;

if last.id;

if var1=1 or var2=1; /* optional, depending on intended result */

run;

Good luck.

Ksharp · Posted 06-10-2015 09:06 AM

You didn't post the output yet .

Code: Program

data have;
input ID   Var1   Var2;
cards;
1 1 0
1 0 1
2 0 0
2 0 1
3 0 0
3 1 1
;
run;
proc sql;
create table want as 
 select *
  from have
   group by id
   having sum(var1=0) = 1 and var1=1 
  union
 select *
  from have
   group by id
   having sum(var1=0) = 2 and var2=1 ;
quit;

Xia Keshan