Hello Colleagues, I am new to SAS and I would like help with some data manipulation from someone here.
Question: I have data as shown below, but I would like to collapse some rows with unreasonable time interval (job durations). I would like to assume that durations of less than 5 days are unlikely to be true and hwere such exists, the next job with longer duration should be assumed to have been began on the start date of the overridden job.
Data I have:
id | Start_date | End-date | JOB |
1 | 12JUN2015 | 14JUN2016 | CHEF |
1 | 14JUN2016 | 15JUN2016 | MANAGER |
1 | 15JUN2016 | 17JAN2018 | DRIVER |
1 | 18JAN2015 | 19JAN2018 | ADMIN |
1 | 19JAN2018 | 19OCT2019 | FIRE OFFICER |
2 |
|
|
|
2 |
|
|
|
2 |
|
|
|
Data I want
id | Start_date | End-date | JOB |
1 | 12JUN2016 | 19JAN2018 | DRIVER |
1 | 19JAN2018 | 19OCT2019 | FIRE OFFICER |
2 |
|
|
|
2 |
|
|
|
2 |
|
|
|
Thanks
Dathan Byonanebye
Hi @DathanMD I read this post of yours 3 times and still not sure of the logic. So I am gonna take a guess
data have;
input id (Start_date End_date)(:date9.) JOB & $20.;
format Start_date End_date date9.;
cards;
1
12JUN2015
14JUN2016
CHEF
1
14JUN2016
15JUN2016
MANAGER
1
15JUN2016
17JAN2018
DRIVER
1
18JAN2015
19JAN2018
ADMIN
1
19JAN2018
19OCT2019
FIRE OFFICER
;
data want;
do until(last.id);
set have;
by id;
if t=start_date and intck('days',start_date,end_date)>5 then output;
t=end_date;
end;
drop t;
run;
id | Start_date | End_date | JOB |
---|---|---|---|
1 | 15JUN2016 | 17JAN2018 | DRIVER |
1 | 19JAN2018 | 19OCT2019 | FIRE OFFICER |
Hi @DathanMD I read this post of yours 3 times and still not sure of the logic. So I am gonna take a guess
data have;
input id (Start_date End_date)(:date9.) JOB & $20.;
format Start_date End_date date9.;
cards;
1
12JUN2015
14JUN2016
CHEF
1
14JUN2016
15JUN2016
MANAGER
1
15JUN2016
17JAN2018
DRIVER
1
18JAN2015
19JAN2018
ADMIN
1
19JAN2018
19OCT2019
FIRE OFFICER
;
data want;
do until(last.id);
set have;
by id;
if t=start_date and intck('days',start_date,end_date)>5 then output;
t=end_date;
end;
drop t;
run;
id | Start_date | End_date | JOB |
---|---|---|---|
1 | 15JUN2016 | 17JAN2018 | DRIVER |
1 | 19JAN2018 | 19OCT2019 | FIRE OFFICER |
Thank you Novinosrin.
My issue main issue is to override dates (rows) where the difference between start and end_dates is less than five days.
I will explain a little better (I hope I do!). Let’s assume I have data below regarding patient treatment with variables patient, start_date, end_date and antibiotic.
Patient start_date end_date antibiotic
101 26DEC1994 31JAN1995 Ceftriaxone
101 25FEB1997 06OCT1997 ciprofloxacin
101 07OCT1997 28JAN1998 doxycycline
101 28JAN1998 09DEC1998 amoxacillin
101 09DEC1998 14DEC1999 cefixime
101 15DEC1999 14AUG2000 azithromycin
101 23APR2001 25APR2001 cefamor
101 25APR2001 31JUL2002 cefaclor
101 31JUL2002 01AUG2002 zedex
101 01AUG2002 13JAN2003 bromoxin
101 14JAN2003 14MAY2005 cedrol
101 14MAY2005 30MAY2005 kabryl
101 27NOV2007 02NOV2010 benedryl
What this data shows is that patient 101 too ceftriaxone for 37 days then ciprofloxacin from feb-oct1997 . But took cefamor for only 2 days. In that case I would like the data to look like the patient took cefaclor from 23APR2001, there by eliminating Cefamor from the dataset. In the same way the end_date for kabryl should become 02NOV2010, thereby eliminating the row here treatment-benedryl.
This procedure is aimed at eliminating rows where (end_date-startdate)<5 days but restore continuity in dates.
Is a patient only ever going to use one antibiotic at a time? If not then how to treat overlapping dates or a case where patient 101 start taking a 2nd antibiotic on 25APR2001 (or overlapping on 24APR2001? or uses another med from 01APR2001 to 31APR2001?, or if there is a gap and the new med only starts at 27APR2001? ...or...).
hI patrick. I would like a patient taking one antibiotic at a time (although I know in reeality patients may take more than one). But, in this case, i would like to retain only the antibiotiv with the longest durationa nd its dates whould overlap the antibiotics whose duration<5 days.
Dathan
@DathanMD wrote:
hI patrick. I would like a patient taking one antibiotic at a time (although I know in reeality patients may take more than one). But, in this case, i would like to retain only the antibiotiv with the longest durationa nd its dates whould overlap the antibiotics whose duration<5 days.
Dathan
The question is: Does your actual data support the simple logic you describe or can there be multiple medications with overlapping date ranges or gaps? If your actual data is not as simple as the sample data you've posted then you need to define the transformation logic which is suitable for your actual data - and you should post sample data which is representative for your actual data.
Sooo.... what is it?
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.