BookmarkSubscribeRSS Feed
cdubs
Quartz | Level 8

Thanks all in advance of helping. 

 

The dataset I want would look like this: 

 

 

data want;
input ID $ program $ admit $ discharge $ dif_days $ episode;
cards;
1 a 31DEC2017 31DEC2017 . 1
1 a 31DEC2017 01JAN2018 0 1
1 a 16APR2020 16APR2020 836 2
1 a 16APR2020 16APR2020 0 2
1 b 28MAR2020 28MAR2020 -19 3
1 b 28MAR2020 31MAR2020 0 3
1 b 29MAR2020 29MAR2020 -2 3
1 b 30MAR2020 30MAR2020 1 3
2 c 16JUN2015 16JUN2016 -1766 1
2 a 03UL2018 03JUL2018 1113 2
2 a 06JUL2018 06JUL2018 3 3
;run;

 

 

What I have is essentially absent the column dif_days and episode. 

 

I want a new episode assignment (within each ID, resets with each new ID) with the following logic; 

1) First id gets episode 1 

2) If the difference in days is greater than 1 absolute difference, episode value + 1 

3) If program name is different (all within one id), episode value + 1 

 

The dataset is currently sorted by id program admit

 

However, with my current code its giving an output of the following 

 

 

	data want;
		set have;
		by id program; 
		format dif_days episode 8.;
		dif_days = admit- lag(discharge);
	        retain episode;
		if first.id then episode=1; 
		   else if id ne lag(id) then episode+1; 
	           else if dif_days >1 then episode+1;
		   else episode = lag(episode);
         run;

and what I get is this: 

 

 

 

data have;
  input ID $ program $ admit $ discharge $ dif_days $ episode;
cards;
1 a 31DEC2017 31DEC2017 . 1
1 a 31DEC2017 01JAN2018 0 2 
1 a 16APR2020 16APR2020 836 3
1 a 16APR2020 16APR2020 0 .
1 b 28MAR2020 28MAR2020 -19 3 
1 b 28MAR2020 31MAR2020 0 . 
1 b 29MAR2020 29MAR2020 -2 3
1 b 30MAR2020 30MAR2020 1 . 
2 c 16JUN2015 16JUN2016 -1766 1 
2 a 03UL2018 03JUL2018 1113 2 
2 a 06JUL2018 06JUL2018 3 3
;run;

I'm not sure why the second row is getting a 2 (none of the episode + 1 conditions are met?) 

 

Ultimately, I'm trying to come up with a beginning and end date for all episodes and this is just the first step... 

 

data want;
input ID $ program $ admit_final $ discharge_final $ episode;
cards;
1 a 31DEC2017 01DEC2018 1
1 a 16APR2020 16APR2020 2
1 b 28MAR2020 31MAR2020 3
2 c 16JUN2015 16JUN2016 1
2 a 03UL2018 03JUL2018 2
2 a 06JUL2018 06JUL2018 3 3
;run;

I've tried applying some of the logic here but not working... 

 

Thank you all for helping this SAS beginner! 

5 REPLIES 5
PaigeMiller
Diamond | Level 26

2) If the difference in days is greater than 1 absolute difference, episode value + 1

 

This requires more explanation. What difference are you referring to?

--
Paige Miller
JosvanderVelden
SAS Super FREQ
Your code is confusing. In both the have and want datasets you have the column/variable 'episode'! In the have dataset you have the column 'dif_days' but you also calculate 'dif_days' for the want dataset. In the code for the want you use 'by id program' but that seems incorrect since the first observation for ID has program c and the next has program a. Please revisit your code snippets and revise them so that 'have' and 'want' have the columns you need and rethink the sort condition.
cdubs
Quartz | Level 8

Thank you! Apologies, I was trying to create a model dataset. What I actually "have" is what is displayed above MINUS the "episode" and "Dif_days" columns. 

 

And regarding the sorting, I was manually populating an example so accidentally did have c next to a program a! So in the dataset I am working with that is actually sorted. My apologies! 

 

 

FreelanceReinh
Jade | Level 19

Hello @cdubs,

 

Assuming a slightly modified HAVE dataset ...

data have;
input ID $ program $ admit :date. discharge :date.;
format admit discharge date9.;
cards;
1 a 31DEC2017 31DEC2017
1 a 31DEC2017 01JAN2018
1 a 16APR2020 16APR2020
1 a 16APR2020 16APR2020
1 b 28MAR2020 28MAR2020
1 b 28MAR2020 31MAR2020
1 b 29MAR2020 29MAR2020
1 b 30MAR2020 30MAR2020
2 c 16JUN2015 16JUN2016
2 d 03JUL2018 03JUL2018
2 d 06JUL2018 06JUL2018
;

 ... and that the only purpose of variable dif_days would be to compute episode, I suggest this:

 

data want(drop=prev_dis);
set have;
by id program;
prev_dis=lag(discharge);
if first.id then episode=1;
else if first.program | admit-prev_dis>1 then episode+1;
run;

proc summary data=want;
by id program episode;
var admit discharge;
output out=final(drop=_:) min(admit)=admit_final max(discharge)=discharge_final;
run;

 

 


@cdubs wrote:
	data want;
		set have;
		by id program; 
		format dif_days episode 8.;
		dif_days = admit- lag(discharge);
	        retain episode;
		if first.id then episode=1; 
		   else if id ne lag(id) then episode+1; 
	           else if dif_days >1 then episode+1;
		   else episode = lag(episode);
         run;

and what I get is this: 

data have;
  input ID $ program $ admit $ discharge $ dif_days $ episode;
cards;
1 a 31DEC2017 31DEC2017 . 1
1 a 31DEC2017 01JAN2018 0 2 
1 a 16APR2020 16APR2020 836 3
1 a 16APR2020 16APR2020 0 .
1 b 28MAR2020 28MAR2020 -19 3 
1 b 28MAR2020 31MAR2020 0 . 
1 b 29MAR2020 29MAR2020 -2 3
1 b 30MAR2020 30MAR2020 1 . 
2 c 16JUN2015 16JUN2016 -1766 1 
2 a 03UL2018 03JUL2018 1113 2 
2 a 06JUL2018 06JUL2018 3 3
;run;

I'm not sure why the second row is getting a 2 (none of the episode + 1 conditions are met?)  


This is because lag(id) is evaluated in the second observation for the first time (the ELSE condition was not met in the first observation!). It returns a missing value, which is different from the current id value, hence episode is incremented. Note the unconditional use of the LAG function in my suggested DATA step.

Kurt_Bremser
Super User

Your code can't work at all, as you can't make calculations with those strings that look like dates. After converting the strings to dates, do this:

data want;
set have;
by id program;
retain episode;
if first.id then episode = 0;
if first.program or admit - lag(discharge) > 0 then episode + 1;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 502 views
  • 2 likes
  • 5 in conversation