DATA Step, Macro, Functions and more

Code sequential observations into episodes, characterize episodes

Accepted Solution Solved
Reply
New Contributor ck3
New Contributor
Posts: 4
Accepted Solution

Code sequential observations into episodes, characterize episodes

Hi, in the following data I have the first seven variables on the left from ID to FLAGSPEC. I want to make the last four variables on the right starting with EP_N. The records are sorted by ascending ID and DATE. One row per ID-DATE is permitted. A person does not have multiple rows for the same date.

 

The data are event rows for coding into episodes. People can have more than one episode.

A new episode starts,
(1) when a person's very first data row is encountered, or
(2) when, for the same person, there are >60 days between one row's date and the next row's date.

 
 

IDDATEDIF_DATEA_TYPEFLAGANYFLAGREGFLAGSPECEP_NEP_TYPEEP_TIMEEP_FREQ
106Jan2017.011010403
110Feb201735000010403
115Feb20175000010403
111May20178501102011
118Sep201713001103172
125Sep2017710003172
104Dec20177011014111
202Jul2016.01101172
209Jul2016710001172

 

 

I have these:
ID: identifies people, two here.
DATE: is the service date.
DIF_DATE: is the interval from one date to the next, for the same person.
A_TYPE: is a dichotomous qualitative variable.
FLAGANY: codes an episode's first row. If the episode only has one row then it's also the last row.
FLAGREG: if it's the episode's first row and A_TYPE=0, then FLAGREG=1, otherwise FLAGREG=0.
FLAGSPEC: if it's the episode's first row and A_TYPE=1, then FLAGSPEC=1, otherwise FLAGSPEC=0.

I want to make these:
EP_N: an episode counter. It writes the same number to each episode's row(s); increments on the row when the person's next episode starts (when FLAGANY=1); and resets to 1 when encountering a new person.


EP_TYPE: characterizes the episode as regular or special. If all rows in the episode have A_TYPE=0, then EP_TYPE=0. If any row in the episode has A_TYPE=1, then EP_TYPE=1.

 

EP_TIME: the interval between the date of the episode's first and last rows, in days. If the episode is one row then EP_TIME=1.


EP_FREQ: the number of rows in the episode, written to each episode row.

 

I think that making EP_N is a key step because it would be a BY-group variable with ID. Thanks for any suggestions.


Accepted Solutions
Solution
2 weeks ago
Super User
Posts: 13,064

Re: Code sequential observations into episodes, characterize episodes

Posted in reply to novinosrin

@novinosrin

 

See what happens in your code if you replace

select *,max(A_TYPE=1) as EP_TYPE,ifn(intck('days',min(date),max(date))=0,1,intck('days',min(date),max(date))) as EP_TIME,count(EP_N) as EP_FREQ

with

 

select *,max(A_TYPE=1) as EP_TYPE,Range(date) as EP_TIME,count(EP_N) as EP_FREQ

 

Range is max(value) - min(value) and returns 0 if all the values are the same (one date)

View solution in original post


All Replies
Super User
Posts: 6,542

Re: Code sequential observations into episodes, characterize episodes

You are absolutely on the right path ... create EP_N first, then go back and create the rest.  Here's an easy way.

 

data want;

set have;

by id;

if first.id then ep_n = 1;

else if dif_date > 60 then ep_n + 1;

run;

 

Technically, you could compute DIF_DATE as you go ... it wouldn't need to be part of the incoming data.  But it's easy when DIF_DATE is already there.

New Contributor ck3
New Contributor
Posts: 4

Re: Code sequential observations into episodes, characterize episodes

Posted in reply to Astounding

Thank you!

PROC Star
Posts: 1,338

Re: Code sequential observations into episodes, characterize episodes

data have;
input ID	DATE :date7.	DIF_DATE	A_TYPE	FLAGANY	FLAGREG	FLAGSPEC;
format date date7.;
datalines;
1	6-Jan-17	.	0	1	1	0
1	10-Feb-17	35	0	0	0	0
1	15-Feb-17	5	0	0	0	0
1	11-May-17	85	0	1	1	0
1	18-Sep-17	130	0	1	1	0
1	25-Sep-17	7	1	0	0	0
1	4-Dec-17	70	1	1	0	1
2	2-Jul-16	.	0	1	1	0
2	9-Jul-16	7	1	0	0	0
;

data temp;
if 0 then set have;
EP_N=0;
do until(last.id);
set have;
by id date;
if FLAGANY then EP_N+1;
output;
end;
run;

proc sql;
create table want as
select *,max(A_TYPE=1) as EP_TYPE,ifn(intck('days',min(date),max(date))=0,1,intck('days',min(date),max(date))) as EP_TIME,count(EP_N) as EP_FREQ
from temp
group by id,EP_N;
quit;
Solution
2 weeks ago
Super User
Posts: 13,064

Re: Code sequential observations into episodes, characterize episodes

Posted in reply to novinosrin

@novinosrin

 

See what happens in your code if you replace

select *,max(A_TYPE=1) as EP_TYPE,ifn(intck('days',min(date),max(date))=0,1,intck('days',min(date),max(date))) as EP_TIME,count(EP_N) as EP_FREQ

with

 

select *,max(A_TYPE=1) as EP_TYPE,Range(date) as EP_TIME,count(EP_N) as EP_FREQ

 

Range is max(value) - min(value) and returns 0 if all the values are the same (one date)

PROC Star
Posts: 1,338

Re: Code sequential observations into episodes, characterize episodes

[ Edited ]

@ballardw My oh My, Thank you sir. I envy your striking presence of mind. Jeez, why didn't i think of it. Thanks again for touch class catch. Simply neat......Cheers!

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 118 views
  • 0 likes
  • 4 in conversation