Re: Deleting records conditionally (before some value)

KalaBhairava · Posted 08-14-2022 07:16 AM

I have data like this and I want to flag or delete below highlighted records, i.e. value is 0 before 1 per each category and each subject.

Sub category date value
1 a 3/8/2022 0
1 a 3/9/2022 0
1 a 3/10/2022 1
1 c 3/11/2022 0
1 c 3/12/2022 1
2 a 3/8/2022 0
2 a 3/9/2022 1
2 a 3/10/2022 0
2 b 3/11/2022 1
2 b 3/12/2022 0
;

PaigeMiller · Posted 08-14-2022 07:18 AM

I'm not understanding the condition to create a flag. It seems like you want to flag all zeros. When would you not flag a zero? Please explain further.

--
Paige Miller

KalaBhairava · Posted 08-14-2022 07:22 AM

Can you check Now, updated my question.

PaigeMiller · Posted 08-14-2022 08:43 AM

data intermediate;
    set have;
    by sub category;
    if first.category then count=0;
    count+1;
    if value=1 then one_location=count;
run;
proc summary data=intermediate nway;
	class sub category;
	var one_location;
	output out=_max_ max=max_one_location;
run;
data want;
    merge intermediate _max_(drop=_:);
    by sub category;
    if value=0 and count<max_one_location then delete;
run;

--
Paige Miller

Astounding · Posted 08-14-2022 09:34 AM

When you come across a 0, you obviously don't know whether a 1 will appear later. So you have to go through the data twice. Here is one approach. The top loop goes through a sub/category combination and counts the 1's. Then the bottom loop examines the same observations and chooses which observations to output.

data want;
last1 = 0;
n_1 = 0;
Obsno = 0;
do until (last.category);
set have;
by sub category;
if value = 1 then last1 + 1;
end;
do until (last.category);
set have;
by sub category;
if value = 1 then n_1 + 1;
if value = 1 or n_1 < last_1 then output;
end;
run;

Working from a cell phone screen so I will need to look this over after I post it.

mkeintz · Posted 08-14-2022 10:42 AM

You apparently want to delete "leading zeroes" for each sub/category combination. If so then:

data want (drop=_:);
  set have;
  by sub category;
  if first.category=1 then _n_of_ones=value;
  else _n_of_ones+value;
  if _n_of_ones>0;
run;

This assumes that data are already sorted by sub/category (and presumably you require the data to be sorted by date within each sub/category). So assume data sorted by sub/category/date.

What you haven't explained is what you want to do if a given sub/category has no observations with value^=0. In such a case, the program above will drop all obs for that sub/category.

Additional comment:

You haven't answer @Kurt_Bremser 's question (or my comment above). I.e. what do you want to do if a given sub/category has only zeroes?

If you want to delete them all, then above program works. If you want to keep them all, then the following program will do:

data want (drop=_:);
  set have (where=(value=1) in=firstpass) have (in=secondpass);
  by sub category;
  if first.category then _cutoff_date=date;
  retain _cutoff_date;
  if secondpass and date>=_cutoff_date;
run;

If there are no 1's, then the _cutoff_date is the first value=0 date. But if there are some 1's, then it is the first value=1 date.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Kurt_Bremser · Posted 08-14-2022 10:42 AM

What if a sub has only zeroes?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

KalaBhairava · Posted 08-14-2022 11:56 AM

It should be there, I just want to delete 0 before 1 per each subject per category.

i.e. if subject has 1 then only I want to delete 0 which are prior to 1, and 0 value at post 1 or only 0 shouldn't be affect.

Kurt_Bremser · Posted 08-14-2022 12:39 PM

You have not answered my question: what if a sub has only 0's and no 1 at all? Should all observations of such a group be deleted or kept?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

KalaBhairava · Posted 08-15-2022 12:26 AM

It should be there

Kurt_Bremser · Posted 08-15-2022 12:57 AM

So we need an additional check for the "non-1" cases.

data want;
set have;
by sub;
if _n_ = 1
then do;
  declare hash w1 (dataset:"have (where=(value = 1))");
  w1.definekey("sub");
  w1.definedone(),
end;
retain keep_flag;
if first.sub
then if w1.check() ne 0
  then keep_flag = 1;
  else keep_flag = 0;
if value = 1 then keep_flag = 1;
if keep_flag;
drop keep_flag;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Ksharp · Posted 08-15-2022 08:45 AM

data have;
input Sub  category  $ date   : $20.        value;
cards;
1          a            3/8/2022        0
1          a            3/9/2022        0
1          a            3/10/2022      1
1          c            3/11/2022      0
1          c            3/12/2022      1
2          a            3/8/2022        0
2          a           3/9/2022        1
2          a            3/10/2022      0
2          b            3/11/2022     1
2          b            3/12/2022      0
; 

data want;
 set have;
 by  Sub  category;
 retain found;
 if first.category then found=0;
 if value then found=1;
 if  found;
 drop found;
run;

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away