BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
noelle12
Fluorite | Level 6

I want to group by name and ID, and want to delete the entire group of observations whenever both num_1 and num_2 = 1 or both = 0. 

 

What I have:

name ID num_1 num_2

A        10     1          0

A        10      0          1

A        34     1          1

B        9      1          0

B        9      0          0

B        9      0          1

C        28      1          1

C        28      1          0

C        28      0          0

C        30      1          0

C        30      0          1

 

What I want:

name ID num_1 num_2

A        10     1          0

A        10      0          1

C        30      1          0

C        30      0          1

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

This is a good case for merging the dataset against a subset of itself.  If the subset is defined by NUM_1=NUM_2, then the merge should be excluded.  For any NAME/ID combination that has no such subset, keep all the observations:

 

data have;
  input name  : $1.  id  num_1 	num_2 : ;
datalines;
A 10 1 0
A 10 0 1
A 34 1 1
B 9 1 0
B 9 0 0
B 9 0 1
C 28 1 1
C 28 1 0
C 28 0 0
C 30 1 0
C 30 0 1
;

data want;
  merge have
        have (in=exclude where=(num_1=num_2));
  by name id;
  if not exclude;
run;

Even though the subset ("where=(num_1=num_2)") is commonly smaller than the set of observations for a given ID/NAME, the MERGE statement will keep the EXCLUDE dummy=1 for the entire group.  This provides a trivial indicator for excluding every observation in the group.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

3 REPLIES 3
Amir
PROC Star

Hi,

 

I have assumed num_1 and num_2 only hold values of 1 or 0:

 

data have;
	input
		name  : $1.
		id    :  8.
		num_1 :  8.
		num_2 :  8.
	;

datalines;
A 10 1 0
A 10 0 1
A 34 1 1
B 9 1 0
B 9 0 0
B 9 0 1
C 28 1 1
C 28 1 0
C 28 0 0
C 30 1 0
C 30 0 1
;


data want;
	do until(last.id);
		set have;
		by name id;
		
		if num_1 eq num_2 then
			remove = 1;
	end;
	
	do until(last.id);
		set have;
		by name id;
		
		if not remove then
			output;
	end;
run;

 

 

Kind regards,

Amir.

mkeintz
PROC Star

This is a good case for merging the dataset against a subset of itself.  If the subset is defined by NUM_1=NUM_2, then the merge should be excluded.  For any NAME/ID combination that has no such subset, keep all the observations:

 

data have;
  input name  : $1.  id  num_1 	num_2 : ;
datalines;
A 10 1 0
A 10 0 1
A 34 1 1
B 9 1 0
B 9 0 0
B 9 0 1
C 28 1 1
C 28 1 0
C 28 0 0
C 30 1 0
C 30 0 1
;

data want;
  merge have
        have (in=exclude where=(num_1=num_2));
  by name id;
  if not exclude;
run;

Even though the subset ("where=(num_1=num_2)") is commonly smaller than the set of observations for a given ID/NAME, the MERGE statement will keep the EXCLUDE dummy=1 for the entire group.  This provides a trivial indicator for excluding every observation in the group.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User
data have;
  input name  : $1.  id  num_1  num_2 : ;
datalines;
A 10 1 0
A 10 0 1
A 34 1 1
B 9 1 0
B 9 0 0
B 9 0 1
C 28 1 1
C 28 1 0
C 28 0 0
C 30 1 0
C 30 0 1
;

proc sql;
create table want as
select * from have group by name,id
 having not( sum(num_1=0 and num_2=0) or sum(num_1=1 and num_2=1) ) ;
quit;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1027 views
  • 2 likes
  • 4 in conversation