Solved: Delete group of observation with certain condition

noelle12 · Posted 03-17-2022 03:41 PM

I want to group by name and ID, and want to delete the entire group of observations whenever both num_1 and num_2 = 1 or both = 0.

What I have:

name ID num_1 num_2

A 10 1 0

A 10 0 1

A 34 1 1

B 9 1 0

B 9 0 0

B 9 0 1

C 28 1 1

C 28 1 0

C 28 0 0

C 30 1 0

C 30 0 1

What I want:

name ID num_1 num_2

A 10 1 0

A 10 0 1

C 30 1 0

C 30 0 1

mkeintz · Posted 03-17-2022 08:52 PM

This is a good case for merging the dataset against a subset of itself. If the subset is defined by NUM_1=NUM_2, then the merge should be excluded. For any NAME/ID combination that has no such subset, keep all the observations:

data have;
  input name  : $1.  id  num_1 	num_2 : ;
datalines;
A 10 1 0
A 10 0 1
A 34 1 1
B 9 1 0
B 9 0 0
B 9 0 1
C 28 1 1
C 28 1 0
C 28 0 0
C 30 1 0
C 30 0 1
;

data want;
  merge have
        have (in=exclude where=(num_1=num_2));
  by name id;
  if not exclude;
run;

Even though the subset ("where=(num_1=num_2)") is commonly smaller than the set of observations for a given ID/NAME, the MERGE statement will keep the EXCLUDE dummy=1 for the entire group. This provides a trivial indicator for excluding every observation in the group.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

Amir · Posted 03-17-2022 04:37 PM

Hi,

I have assumed num_1 and num_2 only hold values of 1 or 0:

data have;
	input
		name  : $1.
		id    :  8.
		num_1 :  8.
		num_2 :  8.
	;

datalines;
A 10 1 0
A 10 0 1
A 34 1 1
B 9 1 0
B 9 0 0
B 9 0 1
C 28 1 1
C 28 1 0
C 28 0 0
C 30 1 0
C 30 0 1
;


data want;
	do until(last.id);
		set have;
		by name id;
		
		if num_1 eq num_2 then
			remove = 1;
	end;
	
	do until(last.id);
		set have;
		by name id;
		
		if not remove then
			output;
	end;
run;

Kind regards,

Amir.

mkeintz · Posted 03-17-2022 08:52 PM

This is a good case for merging the dataset against a subset of itself. If the subset is defined by NUM_1=NUM_2, then the merge should be excluded. For any NAME/ID combination that has no such subset, keep all the observations:

data have;
  input name  : $1.  id  num_1 	num_2 : ;
datalines;
A 10 1 0
A 10 0 1
A 34 1 1
B 9 1 0
B 9 0 0
B 9 0 1
C 28 1 1
C 28 1 0
C 28 0 0
C 30 1 0
C 30 0 1
;

data want;
  merge have
        have (in=exclude where=(num_1=num_2));
  by name id;
  if not exclude;
run;

Even though the subset ("where=(num_1=num_2)") is commonly smaller than the set of observations for a given ID/NAME, the MERGE statement will keep the EXCLUDE dummy=1 for the entire group. This provides a trivial indicator for excluding every observation in the group.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ksharp · Posted 03-18-2022 07:49 AM

data have;
  input name  : $1.  id  num_1  num_2 : ;
datalines;
A 10 1 0
A 10 0 1
A 34 1 1
B 9 1 0
B 9 0 0
B 9 0 1
C 28 1 1
C 28 1 0
C 28 0 0
C 30 1 0
C 30 0 1
;

proc sql;
create table want as
select * from have group by name,id
 having not( sum(num_1=0 and num_2=0) or sum(num_1=1 and num_2=1) ) ;
quit;

Delete group of observation with certain condition

Re: Delete group of observation with certain condition

Re: Delete group of observation with certain condition

Re: Delete group of observation with certain condition

Re: Delete group of observation with certain condition

Delete group of observation with certain condition

Re: Delete group of observation with certain condition

Re: Delete group of observation with certain condition

Re: Delete group of observation with certain condition

Re: Delete group of observation with certain condition

Click image to register for webinar

Classroom Training Available!