Hi
I have retrieved data on abortions (date of abortions=d_inddto)) from a hospital discharge registry. Sometimes there are several abortions (dates) per person within a short period of time (for example 4 dates in 10 days), which I know is not right. On the other hand, one person can have multiple abortions - but there should be at least 12 weeks between each abortion. So, I want to make a rule that says that in a period of 12 weeks or 84 days, only 1 abortion (that is, only on date) should remain in my dataset per person (PNR). And it should only be the first date. How do I do that? My data looks like this:
PNR d_inddto
1 13/11/2013
1 14/11/2013
1 20/11/2013
1 22/11/2013
2 24/05/2015
3 01/09/2006
data want;
set have;
by PNR;
retain last_date;
if first.PNR
then last_date = d_inddto;
else if intck('week',last_date,d_inddto) le 12
then delete;
else last_date = d_inddto;
run;
Untested. For tested code, supply example data in a data step.
data want;
set have;
by PNR;
retain last_date;
if first.PNR
then last_date = d_inddto;
else if intck('week',last_date,d_inddto) le 12
then delete;
else last_date = d_inddto;
run;
Untested. For tested code, supply example data in a data step.
Thank you very much. It seems to work perfectly:-)
How about something like this?
data want;
set have;
by pnr;
_prevdate = lag(d_inddto);
if first.pnr or (d_inddto - _prevdate) >= 84;
drop _prevdate;
run;
This is just based on 84 days, without regard to week boundaries. It assumes the input dataset is already sorted by ascending values of pnr and d_inddto.
Josh
EDIT: syntax correction to code above
Thank you for your answer! Unfortunately, it removes all datelines except for the first one. Or maybe I haven't used it correctly..... Nevertheless, I got an answer by KurtBremser that seems to work better on my data. Again, thank you!
My mistake - I made a slight in error in the code. I've corrected it above. My apologies - I was in too much of a hurry to be the first responder and earn precious badges. 🙂
Josh
It is correct that in the very small dataset, I would expect only 3 records. However, if the dataset looked like this:
PNR d_inddto
1 13/11/2013
1 14/11/2013
1 20/11/2013
1 22/11/2013
1 03/05/2017
2 24/05/2015
3 01/09/2006
there should be 4 records left. I haven't tried it myself since I'm just trying to help a colleague. She said it didn't work on her data. But as I wrote, we may have used it incorrectly.
Ok - that makes sense:-) Thanks again!
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.