Hello all,
I have a dataset with multple rows per a user id and I would like to delete rows that are less than the date that a specific conditionstart for each id.
This is what it looks like before:
ID | Start_of_Event | Collection_Date |
1 | 0 | 15-Jan |
1 | 1 | 16-Jan |
1 | 0 | 17-Jan |
2 | 1 | 20-Feb |
2 | 0 | 28-Feb |
3 | 0 | 30-Oct |
3 | 1 | 31-Oct |
3 | 0 | 1-Nov |
4 | 1 | 15-Jul |
4 | 0 | 16-Jul |
4 | 0 | 17-Jul |
4 | 0 | 18-Jul |
This is what it looks like after:
ID | Start_of_Event | Collection_Date |
1 | 1 | 16-Jan |
1 | 0 | 17-Jan |
2 | 1 | 20-Feb |
2 | 0 | 28-Feb |
3 | 1 | 31-Oct |
3 | 0 | 1-Nov |
4 | 1 | 15-Jul |
4 | 0 | 16-Jul |
4 | 0 | 17-Jul |
4 | 0 | 18-Jul |
I have tried using logic from this, this, and this, but I have not had much luck. I was trying the lag function, but from what I have read it doesn't work well in conditionals or by id. I have also tried using first.id, but not all ids have dates that are less than the start date.
Any help would be much appreciated.
EDIT: To expand
To expand: I would like like to delete observations that are less than the collection date of the condition (Start_of_Event = 1). For example: ID = 1 has a single observation that less than the (15-Jan < 16-Jan), only one observation will be removed. ID = 2 does not have any events prior, so no rows will be removed for that ID. I have sorted the data by id and collection date.
I sthis what you need? I get the results you have listed:
data have;
input ID Start_of_Event Collection_Date date9.;
format Collection_Date date9.;
cards;
1 0 15Jan2017
1 1 16Jan2017
1 0 17Jan2017
2 1 20Feb2017
2 0 28Feb2017
3 0 30Oct2017
3 1 31Oct2017
3 0 1Nov2017
4 1 15Jul2017
4 0 16Jul2017
4 0 17Jul2017
4 0 18Jul2017
;
run;
proc sql;
create table want as
select a.*
from have a, have b
where a.id=b.id
and a.Collection_Date>=b.Collection_Date
and b.Start_of_Event=1;
quit;
Hello,
data want;
set have;
by ID;
retain start;
if first.ID then start=0;
if start=0 then start=Start_of_Event;
if start;
run;
If your requirement is more complex then "delet if the first id and the start_event is 0" the please expand on you condtion.
If not then perhaps
data want; set have; by id; if first.id and start_event=0 then delete; run;
assuming the data is sorted by id. If not you would NOTSORTED after ID in the by statement.
I sthis what you need? I get the results you have listed:
data have;
input ID Start_of_Event Collection_Date date9.;
format Collection_Date date9.;
cards;
1 0 15Jan2017
1 1 16Jan2017
1 0 17Jan2017
2 1 20Feb2017
2 0 28Feb2017
3 0 30Oct2017
3 1 31Oct2017
3 0 1Nov2017
4 1 15Jul2017
4 0 16Jul2017
4 0 17Jul2017
4 0 18Jul2017
;
run;
proc sql;
create table want as
select a.*
from have a, have b
where a.id=b.id
and a.Collection_Date>=b.Collection_Date
and b.Start_of_Event=1;
quit;
data have; input ID Start_of_Event Collection_Date date9.; format Collection_Date date9.; cards; 1 0 15Jan2017 1 1 16Jan2017 1 0 17Jan2017 2 1 20Feb2017 2 0 28Feb2017 3 0 30Oct2017 3 1 31Oct2017 3 0 1Nov2017 4 1 15Jul2017 4 0 16Jul2017 4 0 17Jul2017 4 0 18Jul2017 ; run; data want; set have; by ID; retain found; if first.id then found=0; if start_of_event=1 then found=1; if found; run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.