Hello all,
I have a dataset with multple rows per a user id and I would like to delete rows that are less than the date that a specific conditionstart for each id.
This is what it looks like before:
ID | Start_of_Event | Collection_Date |
1 | 0 | 15-Jan |
1 | 1 | 16-Jan |
1 | 0 | 17-Jan |
2 | 1 | 20-Feb |
2 | 0 | 28-Feb |
3 | 0 | 30-Oct |
3 | 1 | 31-Oct |
3 | 0 | 1-Nov |
4 | 1 | 15-Jul |
4 | 0 | 16-Jul |
4 | 0 | 17-Jul |
4 | 0 | 18-Jul |
This is what it looks like after:
ID | Start_of_Event | Collection_Date |
1 | 1 | 16-Jan |
1 | 0 | 17-Jan |
2 | 1 | 20-Feb |
2 | 0 | 28-Feb |
3 | 1 | 31-Oct |
3 | 0 | 1-Nov |
4 | 1 | 15-Jul |
4 | 0 | 16-Jul |
4 | 0 | 17-Jul |
4 | 0 | 18-Jul |
I have tried using logic from this, this, and this, but I have not had much luck. I was trying the lag function, but from what I have read it doesn't work well in conditionals or by id. I have also tried using first.id, but not all ids have dates that are less than the start date.
Any help would be much appreciated.
EDIT: To expand
To expand: I would like like to delete observations that are less than the collection date of the condition (Start_of_Event = 1). For example: ID = 1 has a single observation that less than the (15-Jan < 16-Jan), only one observation will be removed. ID = 2 does not have any events prior, so no rows will be removed for that ID. I have sorted the data by id and collection date.
I sthis what you need? I get the results you have listed:
data have;
input ID Start_of_Event Collection_Date date9.;
format Collection_Date date9.;
cards;
1 0 15Jan2017
1 1 16Jan2017
1 0 17Jan2017
2 1 20Feb2017
2 0 28Feb2017
3 0 30Oct2017
3 1 31Oct2017
3 0 1Nov2017
4 1 15Jul2017
4 0 16Jul2017
4 0 17Jul2017
4 0 18Jul2017
;
run;
proc sql;
create table want as
select a.*
from have a, have b
where a.id=b.id
and a.Collection_Date>=b.Collection_Date
and b.Start_of_Event=1;
quit;
Hello,
data want;
set have;
by ID;
retain start;
if first.ID then start=0;
if start=0 then start=Start_of_Event;
if start;
run;
If your requirement is more complex then "delet if the first id and the start_event is 0" the please expand on you condtion.
If not then perhaps
data want; set have; by id; if first.id and start_event=0 then delete; run;
assuming the data is sorted by id. If not you would NOTSORTED after ID in the by statement.
I sthis what you need? I get the results you have listed:
data have;
input ID Start_of_Event Collection_Date date9.;
format Collection_Date date9.;
cards;
1 0 15Jan2017
1 1 16Jan2017
1 0 17Jan2017
2 1 20Feb2017
2 0 28Feb2017
3 0 30Oct2017
3 1 31Oct2017
3 0 1Nov2017
4 1 15Jul2017
4 0 16Jul2017
4 0 17Jul2017
4 0 18Jul2017
;
run;
proc sql;
create table want as
select a.*
from have a, have b
where a.id=b.id
and a.Collection_Date>=b.Collection_Date
and b.Start_of_Event=1;
quit;
data have; input ID Start_of_Event Collection_Date date9.; format Collection_Date date9.; cards; 1 0 15Jan2017 1 1 16Jan2017 1 0 17Jan2017 2 1 20Feb2017 2 0 28Feb2017 3 0 30Oct2017 3 1 31Oct2017 3 0 1Nov2017 4 1 15Jul2017 4 0 16Jul2017 4 0 17Jul2017 4 0 18Jul2017 ; run; data want; set have; by ID; retain found; if first.id then found=0; if start_of_event=1 then found=1; if found; run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.