Thanks all in advance of helping.
The dataset I want would look like this:
data want; input ID $ program $ admit $ discharge $ dif_days $ episode; cards; 1 a 31DEC2017 31DEC2017 . 1 1 a 31DEC2017 01JAN2018 0 1 1 a 16APR2020 16APR2020 836 2 1 a 16APR2020 16APR2020 0 2 1 b 28MAR2020 28MAR2020 -19 3 1 b 28MAR2020 31MAR2020 0 3 1 b 29MAR2020 29MAR2020 -2 3 1 b 30MAR2020 30MAR2020 1 3 2 c 16JUN2015 16JUN2016 -1766 1 2 a 03UL2018 03JUL2018 1113 2 2 a 06JUL2018 06JUL2018 3 3 ;run;
What I have is essentially absent the column dif_days and episode.
I want a new episode assignment (within each ID, resets with each new ID) with the following logic;
1) First id gets episode 1
2) If the difference in days is greater than 1 absolute difference, episode value + 1
3) If program name is different (all within one id), episode value + 1
The dataset is currently sorted by id program admit
However, with my current code its giving an output of the following
data want; set have; by id program; format dif_days episode 8.; dif_days = admit- lag(discharge); retain episode; if first.id then episode=1; else if id ne lag(id) then episode+1; else if dif_days >1 then episode+1; else episode = lag(episode); run;
and what I get is this:
data have; input ID $ program $ admit $ discharge $ dif_days $ episode; cards; 1 a 31DEC2017 31DEC2017 . 1 1 a 31DEC2017 01JAN2018 0 2 1 a 16APR2020 16APR2020 836 3 1 a 16APR2020 16APR2020 0 . 1 b 28MAR2020 28MAR2020 -19 3 1 b 28MAR2020 31MAR2020 0 . 1 b 29MAR2020 29MAR2020 -2 3 1 b 30MAR2020 30MAR2020 1 . 2 c 16JUN2015 16JUN2016 -1766 1 2 a 03UL2018 03JUL2018 1113 2 2 a 06JUL2018 06JUL2018 3 3 ;run;
I'm not sure why the second row is getting a 2 (none of the episode + 1 conditions are met?)
Ultimately, I'm trying to come up with a beginning and end date for all episodes and this is just the first step...
data want;
input ID $ program $ admit_final $ discharge_final $ episode;
cards;
1 a 31DEC2017 01DEC2018 1
1 a 16APR2020 16APR2020 2
1 b 28MAR2020 31MAR2020 3
2 c 16JUN2015 16JUN2016 1
2 a 03UL2018 03JUL2018 2
2 a 06JUL2018 06JUL2018 3 3
;run;
I've tried applying some of the logic here but not working...
Thank you all for helping this SAS beginner!
2) If the difference in days is greater than 1 absolute difference, episode value + 1
This requires more explanation. What difference are you referring to?
Thank you! Apologies, I was trying to create a model dataset. What I actually "have" is what is displayed above MINUS the "episode" and "Dif_days" columns.
And regarding the sorting, I was manually populating an example so accidentally did have c next to a program a! So in the dataset I am working with that is actually sorted. My apologies!
Hello @cdubs,
Assuming a slightly modified HAVE dataset ...
data have; input ID $ program $ admit :date. discharge :date.; format admit discharge date9.; cards; 1 a 31DEC2017 31DEC2017 1 a 31DEC2017 01JAN2018 1 a 16APR2020 16APR2020 1 a 16APR2020 16APR2020 1 b 28MAR2020 28MAR2020 1 b 28MAR2020 31MAR2020 1 b 29MAR2020 29MAR2020 1 b 30MAR2020 30MAR2020 2 c 16JUN2015 16JUN2016 2 d 03JUL2018 03JUL2018 2 d 06JUL2018 06JUL2018 ;
... and that the only purpose of variable dif_days would be to compute episode, I suggest this:
data want(drop=prev_dis);
set have;
by id program;
prev_dis=lag(discharge);
if first.id then episode=1;
else if first.program | admit-prev_dis>1 then episode+1;
run;
proc summary data=want;
by id program episode;
var admit discharge;
output out=final(drop=_:) min(admit)=admit_final max(discharge)=discharge_final;
run;
@cdubs wrote:
data want; set have; by id program; format dif_days episode 8.; dif_days = admit- lag(discharge); retain episode; if first.id then episode=1; else if id ne lag(id) then episode+1; else if dif_days >1 then episode+1; else episode = lag(episode); run;and what I get is this:
data have; input ID $ program $ admit $ discharge $ dif_days $ episode; cards; 1 a 31DEC2017 31DEC2017 . 1 1 a 31DEC2017 01JAN2018 0 2 1 a 16APR2020 16APR2020 836 3 1 a 16APR2020 16APR2020 0 . 1 b 28MAR2020 28MAR2020 -19 3 1 b 28MAR2020 31MAR2020 0 . 1 b 29MAR2020 29MAR2020 -2 3 1 b 30MAR2020 30MAR2020 1 . 2 c 16JUN2015 16JUN2016 -1766 1 2 a 03UL2018 03JUL2018 1113 2 2 a 06JUL2018 06JUL2018 3 3 ;run;I'm not sure why the second row is getting a 2 (none of the episode + 1 conditions are met?)
This is because lag(id) is evaluated in the second observation for the first time (the ELSE condition was not met in the first observation!). It returns a missing value, which is different from the current id value, hence episode is incremented. Note the unconditional use of the LAG function in my suggested DATA step.
Your code can't work at all, as you can't make calculations with those strings that look like dates. After converting the strings to dates, do this:
data want;
set have;
by id program;
retain episode;
if first.id then episode = 0;
if first.program or admit - lag(discharge) > 0 then episode + 1;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.