Solved: Re: Code sequential observations into episodes, characterize episodes

ck3 · Posted 05-02-2018 12:41 PM

Hi, in the following data I have the first seven variables on the left from ID to FLAGSPEC. I want to make the last four variables on the right starting with EP_N. The records are sorted by ascending ID and DATE. One row per ID-DATE is permitted. A person does not have multiple rows for the same date.

The data are event rows for coding into episodes. People can have more than one episode.

A new episode starts,
(1) when a person's very first data row is encountered, or
(2) when, for the same person, there are >60 days between one row's date and the next row's date.

ID	DATE	DIF_DATE	A_TYPE	FLAGANY	FLAGREG	FLAGSPEC	EP_N	EP_TYPE	EP_TIME	EP_FREQ
1	06Jan2017	.	0	1	1	0	1	0	40	3
1	10Feb2017	35	0	0	0	0	1	0	40	3
1	15Feb2017	5	0	0	0	0	1	0	40	3
1	11May2017	85	0	1	1	0	2	0	1	1
1	18Sep2017	130	0	1	1	0	3	1	7	2
1	25Sep2017	7	1	0	0	0	3	1	7	2
1	04Dec2017	70	1	1	0	1	4	1	1	1
2	02Jul2016	.	0	1	1	0	1	1	7	2
2	09Jul2016	7	1	0	0	0	1	1	7	2

I have these:
ID: identifies people, two here.
DATE: is the service date.
DIF_DATE: is the interval from one date to the next, for the same person.
A_TYPE: is a dichotomous qualitative variable.
FLAGANY: codes an episode's first row. If the episode only has one row then it's also the last row.
FLAGREG: if it's the episode's first row and A_TYPE=0, then FLAGREG=1, otherwise FLAGREG=0.
FLAGSPEC: if it's the episode's first row and A_TYPE=1, then FLAGSPEC=1, otherwise FLAGSPEC=0.

I want to make these:
EP_N: an episode counter. It writes the same number to each episode's row(s); increments on the row when the person's next episode starts (when FLAGANY=1); and resets to 1 when encountering a new person.

EP_TYPE: characterizes the episode as regular or special. If all rows in the episode have A_TYPE=0, then EP_TYPE=0. If any row in the episode has A_TYPE=1, then EP_TYPE=1.

EP_TIME: the interval between the date of the episode's first and last rows, in days. If the episode is one row then EP_TIME=1.

EP_FREQ: the number of rows in the episode, written to each episode row.

I think that making EP_N is a key step because it would be a BY-group variable with ID. Thanks for any suggestions.

ballardw · Posted 05-02-2018 02:52 PM

@novinosrin

See what happens in your code if you replace

select *,max(A_TYPE=1) as EP_TYPE,ifn(intck('days',min(date),max(date))=0,1,intck('days',min(date),max(date))) as EP_TIME,count(EP_N) as EP_FREQ

with

select *,max(A_TYPE=1) as EP_TYPE,Range(date) as EP_TIME,count(EP_N) as EP_FREQ

Range is max(value) - min(value) and returns 0 if all the values are the same (one date)

View solution in original post

Astounding · Posted 05-02-2018 01:17 PM

You are absolutely on the right path ... create EP_N first, then go back and create the rest. Here's an easy way.

data want;

set have;

by id;

if first.id then ep_n = 1;

else if dif_date > 60 then ep_n + 1;

run;

Technically, you could compute DIF_DATE as you go ... it wouldn't need to be part of the incoming data. But it's easy when DIF_DATE is already there.

ck3 · Posted 05-07-2018 10:31 AM

Thank you!

novinosrin · Posted 05-02-2018 02:14 PM

data have;
input ID	DATE :date7.	DIF_DATE	A_TYPE	FLAGANY	FLAGREG	FLAGSPEC;
format date date7.;
datalines;
1	6-Jan-17	.	0	1	1	0
1	10-Feb-17	35	0	0	0	0
1	15-Feb-17	5	0	0	0	0
1	11-May-17	85	0	1	1	0
1	18-Sep-17	130	0	1	1	0
1	25-Sep-17	7	1	0	0	0
1	4-Dec-17	70	1	1	0	1
2	2-Jul-16	.	0	1	1	0
2	9-Jul-16	7	1	0	0	0
;

data temp;
if 0 then set have;
EP_N=0;
do until(last.id);
set have;
by id date;
if FLAGANY then EP_N+1;
output;
end;
run;

proc sql;
create table want as
select *,max(A_TYPE=1) as EP_TYPE,ifn(intck('days',min(date),max(date))=0,1,intck('days',min(date),max(date))) as EP_TIME,count(EP_N) as EP_FREQ
from temp
group by id,EP_N;
quit;

ballardw · Posted 05-02-2018 02:52 PM

@novinosrin

See what happens in your code if you replace

select *,max(A_TYPE=1) as EP_TYPE,ifn(intck('days',min(date),max(date))=0,1,intck('days',min(date),max(date))) as EP_TIME,count(EP_N) as EP_FREQ

with

select *,max(A_TYPE=1) as EP_TYPE,Range(date) as EP_TIME,count(EP_N) as EP_FREQ

Range is max(value) - min(value) and returns 0 if all the values are the same (one date)

novinosrin · Posted 05-02-2018 02:56 PM

@ballardw My oh My, Thank you sir. I envy your striking presence of mind. Jeez, why didn't i think of it. Thanks again for touch class catch. Simply neat......Cheers!

Code sequential observations into episodes, characterize episodes

Re: Code sequential observations into episodes, characterize episodes

Re: Code sequential observations into episodes, characterize episodes

Re: Code sequential observations into episodes, characterize episodes

Re: Code sequential observations into episodes, characterize episodes

Re: Code sequential observations into episodes, characterize episodes

Re: Code sequential observations into episodes, characterize episodes

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away