Hi,
I have a dataset of duplicates--each ID occurs twice (but the values of other variables can differ within the same ID), like this:
ID Var1 Var2
1 1 0
1 0 1
2 0 0
2 0 1
3 0 0
3 1 1
I want to keep one row for each ID, but I need to do it conditionally. First, if they have a value of 1 for Var1, that is the row that should be kept (each pair of rows will either have one 0 and one 1 for Var1 or two 0s). If a pair of rows has two 0s for Var1, the one with a value of 1 for Var2 should be kept.
Any help is much appreciated!
data have;
infile cards missover;
input ID Var1 Var2;
cards;
1 1 0
1 0 1
2 0 0
2 0 1
3 0 0
3 1 1
;
run;
data prep;
set have;
by id;
if var1 = 1 then flag = 1;
if var1 = 0 and var2 = 1 then flag = 1;
run;
proc sort data=prep;by id flag;
data want(drop=flag);
set prep;
by id flag;
if first.flag and not missing(flag) then output want;
run;
One way:
proc sort data=have;
by id var1 var2;
run;
data want;
set have;
by id;
if last.id;
if var1=1 or var2=1; /* optional, depending on intended result */
run;
Good luck.
You didn't post the output yet .
data have;
input ID Var1 Var2;
cards;
1 1 0
1 0 1
2 0 0
2 0 1
3 0 0
3 1 1
;
run;
proc sql;
create table want as
select *
from have
group by id
having sum(var1=0) = 1 and var1=1
union
select *
from have
group by id
having sum(var1=0) = 2 and var2=1 ;
quit;
Xia Keshan
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.