Hello all,
I have a dataset with multple rows per a user id and I would like to delete rows that are less than the date that a specific conditionstart for each id.
This is what it looks like before:
ID | Start_of_Event | Collection_Date |
1 | 0 | 15-Jan |
1 | 1 | 16-Jan |
1 | 0 | 17-Jan |
2 | 1 | 20-Feb |
2 | 0 | 28-Feb |
3 | 0 | 30-Oct |
3 | 1 | 31-Oct |
3 | 0 | 1-Nov |
4 | 1 | 15-Jul |
4 | 0 | 16-Jul |
4 | 0 | 17-Jul |
4 | 0 | 18-Jul |
This is what it looks like after:
ID | Start_of_Event | Collection_Date |
1 | 1 | 16-Jan |
1 | 0 | 17-Jan |
2 | 1 | 20-Feb |
2 | 0 | 28-Feb |
3 | 1 | 31-Oct |
3 | 0 | 1-Nov |
4 | 1 | 15-Jul |
4 | 0 | 16-Jul |
4 | 0 | 17-Jul |
4 | 0 | 18-Jul |
I have tried using logic from this, this, and this, but I have not had much luck. I was trying the lag function, but from what I have read it doesn't work well in conditionals or by id. I have also tried using first.id, but not all ids have dates that are less than the start date.
Any help would be much appreciated.
EDIT: To expand
To expand: I would like like to delete observations that are less than the collection date of the condition (Start_of_Event = 1). For example: ID = 1 has a single observation that less than the (15-Jan < 16-Jan), only one observation will be removed. ID = 2 does not have any events prior, so no rows will be removed for that ID. I have sorted the data by id and collection date.
I sthis what you need? I get the results you have listed:
data have;
input ID Start_of_Event Collection_Date date9.;
format Collection_Date date9.;
cards;
1 0 15Jan2017
1 1 16Jan2017
1 0 17Jan2017
2 1 20Feb2017
2 0 28Feb2017
3 0 30Oct2017
3 1 31Oct2017
3 0 1Nov2017
4 1 15Jul2017
4 0 16Jul2017
4 0 17Jul2017
4 0 18Jul2017
;
run;
proc sql;
create table want as
select a.*
from have a, have b
where a.id=b.id
and a.Collection_Date>=b.Collection_Date
and b.Start_of_Event=1;
quit;
Hello,
data want;
set have;
by ID;
retain start;
if first.ID then start=0;
if start=0 then start=Start_of_Event;
if start;
run;
If your requirement is more complex then "delet if the first id and the start_event is 0" the please expand on you condtion.
If not then perhaps
data want; set have; by id; if first.id and start_event=0 then delete; run;
assuming the data is sorted by id. If not you would NOTSORTED after ID in the by statement.
I sthis what you need? I get the results you have listed:
data have;
input ID Start_of_Event Collection_Date date9.;
format Collection_Date date9.;
cards;
1 0 15Jan2017
1 1 16Jan2017
1 0 17Jan2017
2 1 20Feb2017
2 0 28Feb2017
3 0 30Oct2017
3 1 31Oct2017
3 0 1Nov2017
4 1 15Jul2017
4 0 16Jul2017
4 0 17Jul2017
4 0 18Jul2017
;
run;
proc sql;
create table want as
select a.*
from have a, have b
where a.id=b.id
and a.Collection_Date>=b.Collection_Date
and b.Start_of_Event=1;
quit;
data have; input ID Start_of_Event Collection_Date date9.; format Collection_Date date9.; cards; 1 0 15Jan2017 1 1 16Jan2017 1 0 17Jan2017 2 1 20Feb2017 2 0 28Feb2017 3 0 30Oct2017 3 1 31Oct2017 3 0 1Nov2017 4 1 15Jul2017 4 0 16Jul2017 4 0 17Jul2017 4 0 18Jul2017 ; run; data want; set have; by ID; retain found; if first.id then found=0; if start_of_event=1 then found=1; if found; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.