Hi, in the following data I have the first seven variables on the left from ID to FLAGSPEC. I want to make the last four variables on the right starting with EP_N. The records are sorted by ascending ID and DATE. One row per ID-DATE is permitted. A person does not have multiple rows for the same date.
The data are event rows for coding into episodes. People can have more than one episode.
A new episode starts,
(1) when a person's very first data row is encountered, or
(2) when, for the same person, there are >60 days between one row's date and the next row's date.
ID | DATE | DIF_DATE | A_TYPE | FLAGANY | FLAGREG | FLAGSPEC | EP_N | EP_TYPE | EP_TIME | EP_FREQ |
1 | 06Jan2017 | . | 0 | 1 | 1 | 0 | 1 | 0 | 40 | 3 |
1 | 10Feb2017 | 35 | 0 | 0 | 0 | 0 | 1 | 0 | 40 | 3 |
1 | 15Feb2017 | 5 | 0 | 0 | 0 | 0 | 1 | 0 | 40 | 3 |
1 | 11May2017 | 85 | 0 | 1 | 1 | 0 | 2 | 0 | 1 | 1 |
1 | 18Sep2017 | 130 | 0 | 1 | 1 | 0 | 3 | 1 | 7 | 2 |
1 | 25Sep2017 | 7 | 1 | 0 | 0 | 0 | 3 | 1 | 7 | 2 |
1 | 04Dec2017 | 70 | 1 | 1 | 0 | 1 | 4 | 1 | 1 | 1 |
2 | 02Jul2016 | . | 0 | 1 | 1 | 0 | 1 | 1 | 7 | 2 |
2 | 09Jul2016 | 7 | 1 | 0 | 0 | 0 | 1 | 1 | 7 | 2 |
I have these:
ID: identifies people, two here.
DATE: is the service date.
DIF_DATE: is the interval from one date to the next, for the same person.
A_TYPE: is a dichotomous qualitative variable.
FLAGANY: codes an episode's first row. If the episode only has one row then it's also the last row.
FLAGREG: if it's the episode's first row and A_TYPE=0, then FLAGREG=1, otherwise FLAGREG=0.
FLAGSPEC: if it's the episode's first row and A_TYPE=1, then FLAGSPEC=1, otherwise FLAGSPEC=0.
I want to make these:
EP_N: an episode counter. It writes the same number to each episode's row(s); increments on the row when the person's next episode starts (when FLAGANY=1); and resets to 1 when encountering a new person.
EP_TYPE: characterizes the episode as regular or special. If all rows in the episode have A_TYPE=0, then EP_TYPE=0. If any row in the episode has A_TYPE=1, then EP_TYPE=1.
EP_TIME: the interval between the date of the episode's first and last rows, in days. If the episode is one row then EP_TIME=1.
EP_FREQ: the number of rows in the episode, written to each episode row.
I think that making EP_N is a key step because it would be a BY-group variable with ID. Thanks for any suggestions.
See what happens in your code if you replace
select *,max(A_TYPE=1) as EP_TYPE,ifn(intck('days',min(date),max(date))=0,1,intck('days',min(date),max(date))) as EP_TIME,count(EP_N) as EP_FREQ
with
select *,max(A_TYPE=1) as EP_TYPE,Range(date) as EP_TIME,count(EP_N) as EP_FREQ
Range is max(value) - min(value) and returns 0 if all the values are the same (one date)
You are absolutely on the right path ... create EP_N first, then go back and create the rest. Here's an easy way.
data want;
set have;
by id;
if first.id then ep_n = 1;
else if dif_date > 60 then ep_n + 1;
run;
Technically, you could compute DIF_DATE as you go ... it wouldn't need to be part of the incoming data. But it's easy when DIF_DATE is already there.
Thank you!
data have;
input ID DATE :date7. DIF_DATE A_TYPE FLAGANY FLAGREG FLAGSPEC;
format date date7.;
datalines;
1 6-Jan-17 . 0 1 1 0
1 10-Feb-17 35 0 0 0 0
1 15-Feb-17 5 0 0 0 0
1 11-May-17 85 0 1 1 0
1 18-Sep-17 130 0 1 1 0
1 25-Sep-17 7 1 0 0 0
1 4-Dec-17 70 1 1 0 1
2 2-Jul-16 . 0 1 1 0
2 9-Jul-16 7 1 0 0 0
;
data temp;
if 0 then set have;
EP_N=0;
do until(last.id);
set have;
by id date;
if FLAGANY then EP_N+1;
output;
end;
run;
proc sql;
create table want as
select *,max(A_TYPE=1) as EP_TYPE,ifn(intck('days',min(date),max(date))=0,1,intck('days',min(date),max(date))) as EP_TIME,count(EP_N) as EP_FREQ
from temp
group by id,EP_N;
quit;
See what happens in your code if you replace
select *,max(A_TYPE=1) as EP_TYPE,ifn(intck('days',min(date),max(date))=0,1,intck('days',min(date),max(date))) as EP_TIME,count(EP_N) as EP_FREQ
with
select *,max(A_TYPE=1) as EP_TYPE,Range(date) as EP_TIME,count(EP_N) as EP_FREQ
Range is max(value) - min(value) and returns 0 if all the values are the same (one date)
@ballardw My oh My, Thank you sir. I envy your striking presence of mind. Jeez, why didn't i think of it. Thanks again for touch class catch. Simply neat......Cheers!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.