BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Bianca1610
Calcite | Level 5

Dear SAS community, 

 

I am having the following problem: 

My dataset has 3 variables: date, sample, episode. Now I want to reduce the dataset so that I only have the episodes when a sample was taken. A definition of an episode is that there are 3 days before and after without an Episode (E=0). So in the following example I only want the data from 05/01-09/01 because this will be my episode for the sample that was taken on 05/01/20. There is one day which is not qualifying for E=1 but that doesnt matter because it is within the episode. I hope it is somehow clear what I mean.

 

Thank you in advance for your help.

Bianca

 

data WORK.CLASS;
input date sample episode;
datalines;
010120 0 0
020120 0 0
030120 0 0
040120 0 0
050120 1 1
060120 0 1
070120 0 1
080120 0 0
090120 0 1
100120 0 0
110120 0 0
120120 0 0
130120 0 0
run;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Here I have code that flags an episode and gives it a number. You could write out the episode number into a second dataset whenever sample is true, and merge that back. Or do that in a SQL step.

data have;
input date :ddmmyy8. sample episode;
format date yymmddd10.;
datalines;
010120 0 0
020120 0 0
030120 0 0
040120 0 0
050120 1 1
060120 0 1
070120 0 1
080120 0 0
090120 0 1
100120 0 0
110120 0 0
120120 0 0
130120 0 0
;

data want;
merge
  have
  have(firstobs=2 keep=episode rename=(episode=_episode_f_1))
  have(firstobs=3 keep=episode rename=(episode=_episode_f_2))
;
retain _episode_p_1 _episode_p_2 _episode_p_3;
array past {3} _episode_p_1 _episode_p_2 _episode_p_3 (0,0,0);
array future {2} _episode_f_1--_episode_f_2;
retain flag 0 _episode_number 0;
if flag = 0 and sum(of past{*}) = 0 and episode = 1
then do;
  flag = 1;
  _episode_number + 1;
end;
if episode = 0 and sum(of future{*}) = 0 then flag = 0;
if flag then episode_number = _episode_number;
output;
past{mod(_n_,3)+1} = episode;
drop _:;
run; 

 

View solution in original post

6 REPLIES 6
PaigeMiller
Diamond | Level 26

It's not clear to me how the logic works, and its also not clear why 090120 is included based on "definition of an episode is that there are 3 days before and after without an Episode (E=0)". Please explain in more detail.

--
Paige Miller
Bianca1610
Calcite | Level 5

For simplicity I skipped some columns in my dataset. So basically this is part of a symptom diary and I created episodes out of different Symptoms (e.g. runny nose + cough qualifies for a disaese day (E=1), but just runny nose does not qualify for a disease (E=0)). When the Patient was sick a sample was taken (e.g. nasal swab) and now I want to calculate how long he was sick and how severe. So the definition for 1 episode is that there are 3 days before and after without an E=1. After that it will be considered as new infection and does no longer belong to that sample. Like even if the Patient had 1 day whereby the symptoms were not that severe, it doesnt matter because after that there are 2 more days with severe symptoms counting for that episode. 

PaigeMiller
Diamond | Level 26

I still don't see why, given your logic, 090120 is included in the output.

 

It's also hard to generalize to code that will work in more than this one situation, but you only give us one situation to work with.

--
Paige Miller
Bianca1610
Calcite | Level 5

Because after 070120 there are not 3 days with E=0 which will define the end of my episode. As long as there is E=1 and there are not 3 days with E=0 it still belongs to 1 episode

PaigeMiller
Diamond | Level 26

Then why is 100120 not requested to be in the output? It seems that it should be in the output given this explanation: "Because after 070120 there are not 3 days with E=0 which will define the end of my episode. As long as there is E=1 and there are not 3 days with E=0 it still belongs to 1 episode". Does the three days with E=0 have to be consecutive days?

--
Paige Miller
Kurt_Bremser
Super User

Here I have code that flags an episode and gives it a number. You could write out the episode number into a second dataset whenever sample is true, and merge that back. Or do that in a SQL step.

data have;
input date :ddmmyy8. sample episode;
format date yymmddd10.;
datalines;
010120 0 0
020120 0 0
030120 0 0
040120 0 0
050120 1 1
060120 0 1
070120 0 1
080120 0 0
090120 0 1
100120 0 0
110120 0 0
120120 0 0
130120 0 0
;

data want;
merge
  have
  have(firstobs=2 keep=episode rename=(episode=_episode_f_1))
  have(firstobs=3 keep=episode rename=(episode=_episode_f_2))
;
retain _episode_p_1 _episode_p_2 _episode_p_3;
array past {3} _episode_p_1 _episode_p_2 _episode_p_3 (0,0,0);
array future {2} _episode_f_1--_episode_f_2;
retain flag 0 _episode_number 0;
if flag = 0 and sum(of past{*}) = 0 and episode = 1
then do;
  flag = 1;
  _episode_number + 1;
end;
if episode = 0 and sum(of future{*}) = 0 then flag = 0;
if flag then episode_number = _episode_number;
output;
past{mod(_n_,3)+1} = episode;
drop _:;
run; 

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1577 views
  • 1 like
  • 3 in conversation