BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
noelle12
Fluorite | Level 6

I want to group by name and ID, and want to delete the entire group of observations whenever both num_1 and num_2 = 1 or both = 0. 

 

What I have:

name ID num_1 num_2

A        10     1          0

A        10      0          1

A        34     1          1

B        9      1          0

B        9      0          0

B        9      0          1

C        28      1          1

C        28      1          0

C        28      0          0

C        30      1          0

C        30      0          1

 

What I want:

name ID num_1 num_2

A        10     1          0

A        10      0          1

C        30      1          0

C        30      0          1

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

This is a good case for merging the dataset against a subset of itself.  If the subset is defined by NUM_1=NUM_2, then the merge should be excluded.  For any NAME/ID combination that has no such subset, keep all the observations:

 

data have;
  input name  : $1.  id  num_1 	num_2 : ;
datalines;
A 10 1 0
A 10 0 1
A 34 1 1
B 9 1 0
B 9 0 0
B 9 0 1
C 28 1 1
C 28 1 0
C 28 0 0
C 30 1 0
C 30 0 1
;

data want;
  merge have
        have (in=exclude where=(num_1=num_2));
  by name id;
  if not exclude;
run;

Even though the subset ("where=(num_1=num_2)") is commonly smaller than the set of observations for a given ID/NAME, the MERGE statement will keep the EXCLUDE dummy=1 for the entire group.  This provides a trivial indicator for excluding every observation in the group.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

3 REPLIES 3
Amir
PROC Star

Hi,

 

I have assumed num_1 and num_2 only hold values of 1 or 0:

 

data have;
	input
		name  : $1.
		id    :  8.
		num_1 :  8.
		num_2 :  8.
	;

datalines;
A 10 1 0
A 10 0 1
A 34 1 1
B 9 1 0
B 9 0 0
B 9 0 1
C 28 1 1
C 28 1 0
C 28 0 0
C 30 1 0
C 30 0 1
;


data want;
	do until(last.id);
		set have;
		by name id;
		
		if num_1 eq num_2 then
			remove = 1;
	end;
	
	do until(last.id);
		set have;
		by name id;
		
		if not remove then
			output;
	end;
run;

 

 

Kind regards,

Amir.

mkeintz
PROC Star

This is a good case for merging the dataset against a subset of itself.  If the subset is defined by NUM_1=NUM_2, then the merge should be excluded.  For any NAME/ID combination that has no such subset, keep all the observations:

 

data have;
  input name  : $1.  id  num_1 	num_2 : ;
datalines;
A 10 1 0
A 10 0 1
A 34 1 1
B 9 1 0
B 9 0 0
B 9 0 1
C 28 1 1
C 28 1 0
C 28 0 0
C 30 1 0
C 30 0 1
;

data want;
  merge have
        have (in=exclude where=(num_1=num_2));
  by name id;
  if not exclude;
run;

Even though the subset ("where=(num_1=num_2)") is commonly smaller than the set of observations for a given ID/NAME, the MERGE statement will keep the EXCLUDE dummy=1 for the entire group.  This provides a trivial indicator for excluding every observation in the group.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User
data have;
  input name  : $1.  id  num_1  num_2 : ;
datalines;
A 10 1 0
A 10 0 1
A 34 1 1
B 9 1 0
B 9 0 0
B 9 0 1
C 28 1 1
C 28 1 0
C 28 0 0
C 30 1 0
C 30 0 1
;

proc sql;
create table want as
select * from have group by name,id
 having not( sum(num_1=0 and num_2=0) or sum(num_1=1 and num_2=1) ) ;
quit;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1111 views
  • 2 likes
  • 4 in conversation