Hi,
I have a dataset of duplicates--each ID occurs twice (but the values of other variables can differ within the same ID), like this:
ID Var1 Var2
1 1 0
1 0 1
2 0 0
2 0 1
3 0 0
3 1 1
I want to keep one row for each ID, but I need to do it conditionally. First, if they have a value of 1 for Var1, that is the row that should be kept (each pair of rows will either have one 0 and one 1 for Var1 or two 0s). If a pair of rows has two 0s for Var1, the one with a value of 1 for Var2 should be kept.
Any help is much appreciated!
data have;
infile cards missover;
input ID Var1 Var2;
cards;
1 1 0
1 0 1
2 0 0
2 0 1
3 0 0
3 1 1
;
run;
data prep;
set have;
by id;
if var1 = 1 then flag = 1;
if var1 = 0 and var2 = 1 then flag = 1;
run;
proc sort data=prep;by id flag;
data want(drop=flag);
set prep;
by id flag;
if first.flag and not missing(flag) then output want;
run;
One way:
proc sort data=have;
by id var1 var2;
run;
data want;
set have;
by id;
if last.id;
if var1=1 or var2=1; /* optional, depending on intended result */
run;
Good luck.
You didn't post the output yet .
data have;
input ID Var1 Var2;
cards;
1 1 0
1 0 1
2 0 0
2 0 1
3 0 0
3 1 1
;
run;
proc sql;
create table want as
select *
from have
group by id
having sum(var1=0) = 1 and var1=1
union
select *
from have
group by id
having sum(var1=0) = 2 and var2=1 ;
quit;
Xia Keshan
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.