Dear SAS community,
I am having the following problem:
My dataset has 3 variables: date, sample, episode. Now I want to reduce the dataset so that I only have the episodes when a sample was taken. A definition of an episode is that there are 3 days before and after without an Episode (E=0). So in the following example I only want the data from 05/01-09/01 because this will be my episode for the sample that was taken on 05/01/20. There is one day which is not qualifying for E=1 but that doesnt matter because it is within the episode. I hope it is somehow clear what I mean.
Thank you in advance for your help.
Bianca
data WORK.CLASS;
input date sample episode;
datalines;
010120 0 0
020120 0 0
030120 0 0
040120 0 0
050120 1 1
060120 0 1
070120 0 1
080120 0 0
090120 0 1
100120 0 0
110120 0 0
120120 0 0
130120 0 0
run;
Here I have code that flags an episode and gives it a number. You could write out the episode number into a second dataset whenever sample is true, and merge that back. Or do that in a SQL step.
data have;
input date :ddmmyy8. sample episode;
format date yymmddd10.;
datalines;
010120 0 0
020120 0 0
030120 0 0
040120 0 0
050120 1 1
060120 0 1
070120 0 1
080120 0 0
090120 0 1
100120 0 0
110120 0 0
120120 0 0
130120 0 0
;
data want;
merge
have
have(firstobs=2 keep=episode rename=(episode=_episode_f_1))
have(firstobs=3 keep=episode rename=(episode=_episode_f_2))
;
retain _episode_p_1 _episode_p_2 _episode_p_3;
array past {3} _episode_p_1 _episode_p_2 _episode_p_3 (0,0,0);
array future {2} _episode_f_1--_episode_f_2;
retain flag 0 _episode_number 0;
if flag = 0 and sum(of past{*}) = 0 and episode = 1
then do;
flag = 1;
_episode_number + 1;
end;
if episode = 0 and sum(of future{*}) = 0 then flag = 0;
if flag then episode_number = _episode_number;
output;
past{mod(_n_,3)+1} = episode;
drop _:;
run;
It's not clear to me how the logic works, and its also not clear why 090120 is included based on "definition of an episode is that there are 3 days before and after without an Episode (E=0)". Please explain in more detail.
For simplicity I skipped some columns in my dataset. So basically this is part of a symptom diary and I created episodes out of different Symptoms (e.g. runny nose + cough qualifies for a disaese day (E=1), but just runny nose does not qualify for a disease (E=0)). When the Patient was sick a sample was taken (e.g. nasal swab) and now I want to calculate how long he was sick and how severe. So the definition for 1 episode is that there are 3 days before and after without an E=1. After that it will be considered as new infection and does no longer belong to that sample. Like even if the Patient had 1 day whereby the symptoms were not that severe, it doesnt matter because after that there are 2 more days with severe symptoms counting for that episode.
I still don't see why, given your logic, 090120 is included in the output.
It's also hard to generalize to code that will work in more than this one situation, but you only give us one situation to work with.
Because after 070120 there are not 3 days with E=0 which will define the end of my episode. As long as there is E=1 and there are not 3 days with E=0 it still belongs to 1 episode
Then why is 100120 not requested to be in the output? It seems that it should be in the output given this explanation: "Because after 070120 there are not 3 days with E=0 which will define the end of my episode. As long as there is E=1 and there are not 3 days with E=0 it still belongs to 1 episode". Does the three days with E=0 have to be consecutive days?
Here I have code that flags an episode and gives it a number. You could write out the episode number into a second dataset whenever sample is true, and merge that back. Or do that in a SQL step.
data have;
input date :ddmmyy8. sample episode;
format date yymmddd10.;
datalines;
010120 0 0
020120 0 0
030120 0 0
040120 0 0
050120 1 1
060120 0 1
070120 0 1
080120 0 0
090120 0 1
100120 0 0
110120 0 0
120120 0 0
130120 0 0
;
data want;
merge
have
have(firstobs=2 keep=episode rename=(episode=_episode_f_1))
have(firstobs=3 keep=episode rename=(episode=_episode_f_2))
;
retain _episode_p_1 _episode_p_2 _episode_p_3;
array past {3} _episode_p_1 _episode_p_2 _episode_p_3 (0,0,0);
array future {2} _episode_f_1--_episode_f_2;
retain flag 0 _episode_number 0;
if flag = 0 and sum(of past{*}) = 0 and episode = 1
then do;
flag = 1;
_episode_number + 1;
end;
if episode = 0 and sum(of future{*}) = 0 then flag = 0;
if flag then episode_number = _episode_number;
output;
past{mod(_n_,3)+1} = episode;
drop _:;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.