I want to group by name and ID, and want to delete the entire group of observations whenever both num_1 and num_2 = 1 or both = 0.
What I have:
name ID num_1 num_2
A 10 1 0
A 10 0 1
A 34 1 1
B 9 1 0
B 9 0 0
B 9 0 1
C 28 1 1
C 28 1 0
C 28 0 0
C 30 1 0
C 30 0 1
What I want:
name ID num_1 num_2
A 10 1 0
A 10 0 1
C 30 1 0
C 30 0 1
This is a good case for merging the dataset against a subset of itself. If the subset is defined by NUM_1=NUM_2, then the merge should be excluded. For any NAME/ID combination that has no such subset, keep all the observations:
data have;
input name : $1. id num_1 num_2 : ;
datalines;
A 10 1 0
A 10 0 1
A 34 1 1
B 9 1 0
B 9 0 0
B 9 0 1
C 28 1 1
C 28 1 0
C 28 0 0
C 30 1 0
C 30 0 1
;
data want;
merge have
have (in=exclude where=(num_1=num_2));
by name id;
if not exclude;
run;
Even though the subset ("where=(num_1=num_2)") is commonly smaller than the set of observations for a given ID/NAME, the MERGE statement will keep the EXCLUDE dummy=1 for the entire group. This provides a trivial indicator for excluding every observation in the group.
Hi,
I have assumed num_1 and num_2 only hold values of 1 or 0:
data have;
input
name : $1.
id : 8.
num_1 : 8.
num_2 : 8.
;
datalines;
A 10 1 0
A 10 0 1
A 34 1 1
B 9 1 0
B 9 0 0
B 9 0 1
C 28 1 1
C 28 1 0
C 28 0 0
C 30 1 0
C 30 0 1
;
data want;
do until(last.id);
set have;
by name id;
if num_1 eq num_2 then
remove = 1;
end;
do until(last.id);
set have;
by name id;
if not remove then
output;
end;
run;
Kind regards,
Amir.
This is a good case for merging the dataset against a subset of itself. If the subset is defined by NUM_1=NUM_2, then the merge should be excluded. For any NAME/ID combination that has no such subset, keep all the observations:
data have;
input name : $1. id num_1 num_2 : ;
datalines;
A 10 1 0
A 10 0 1
A 34 1 1
B 9 1 0
B 9 0 0
B 9 0 1
C 28 1 1
C 28 1 0
C 28 0 0
C 30 1 0
C 30 0 1
;
data want;
merge have
have (in=exclude where=(num_1=num_2));
by name id;
if not exclude;
run;
Even though the subset ("where=(num_1=num_2)") is commonly smaller than the set of observations for a given ID/NAME, the MERGE statement will keep the EXCLUDE dummy=1 for the entire group. This provides a trivial indicator for excluding every observation in the group.
data have;
input name : $1. id num_1 num_2 : ;
datalines;
A 10 1 0
A 10 0 1
A 34 1 1
B 9 1 0
B 9 0 0
B 9 0 1
C 28 1 1
C 28 1 0
C 28 0 0
C 30 1 0
C 30 0 1
;
proc sql;
create table want as
select * from have group by name,id
having not( sum(num_1=0 and num_2=0) or sum(num_1=1 and num_2=1) ) ;
quit;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.