Hi,
I have a dataset of duplicates--each ID occurs twice (but the values of other variables can differ within the same ID), like this:
ID Var1 Var2
1 1 0
1 0 1
2 0 0
2 0 1
3 0 0
3 1 1
I want to keep one row for each ID, but I need to do it conditionally. First, if they have a value of 1 for Var1, that is the row that should be kept (each pair of rows will either have one 0 and one 1 for Var1 or two 0s). If a pair of rows has two 0s for Var1, the one with a value of 1 for Var2 should be kept.
Any help is much appreciated!
data have;
infile cards missover;
input ID Var1 Var2;
cards;
1 1 0
1 0 1
2 0 0
2 0 1
3 0 0
3 1 1
;
run;
data prep;
set have;
by id;
if var1 = 1 then flag = 1;
if var1 = 0 and var2 = 1 then flag = 1;
run;
proc sort data=prep;by id flag;
data want(drop=flag);
set prep;
by id flag;
if first.flag and not missing(flag) then output want;
run;
One way:
proc sort data=have;
by id var1 var2;
run;
data want;
set have;
by id;
if last.id;
if var1=1 or var2=1; /* optional, depending on intended result */
run;
Good luck.
You didn't post the output yet .
data have;
input ID Var1 Var2;
cards;
1 1 0
1 0 1
2 0 0
2 0 1
3 0 0
3 1 1
;
run;
proc sql;
create table want as
select *
from have
group by id
having sum(var1=0) = 1 and var1=1
union
select *
from have
group by id
having sum(var1=0) = 2 and var2=1 ;
quit;
Xia Keshan
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.