Hi,
I have a dataset of duplicates--each ID occurs twice (but the values of other variables can differ within the same ID), like this:
ID Var1 Var2
1 1 0
1 0 1
2 0 0
2 0 1
3 0 0
3 1 1
I want to keep one row for each ID, but I need to do it conditionally. First, if they have a value of 1 for Var1, that is the row that should be kept (each pair of rows will either have one 0 and one 1 for Var1 or two 0s). If a pair of rows has two 0s for Var1, the one with a value of 1 for Var2 should be kept.
Any help is much appreciated!
data have;
infile cards missover;
input ID Var1 Var2;
cards;
1 1 0
1 0 1
2 0 0
2 0 1
3 0 0
3 1 1
;
run;
data prep;
set have;
by id;
if var1 = 1 then flag = 1;
if var1 = 0 and var2 = 1 then flag = 1;
run;
proc sort data=prep;by id flag;
data want(drop=flag);
set prep;
by id flag;
if first.flag and not missing(flag) then output want;
run;
One way:
proc sort data=have;
by id var1 var2;
run;
data want;
set have;
by id;
if last.id;
if var1=1 or var2=1; /* optional, depending on intended result */
run;
Good luck.
You didn't post the output yet .
data have;
input ID Var1 Var2;
cards;
1 1 0
1 0 1
2 0 0
2 0 1
3 0 0
3 1 1
;
run;
proc sql;
create table want as
select *
from have
group by id
having sum(var1=0) = 1 and var1=1
union
select *
from have
group by id
having sum(var1=0) = 2 and var2=1 ;
quit;
Xia Keshan
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.