BookmarkSubscribeRSS Feed
Walternate
Obsidian | Level 7

Hi everyone,

 

I have a data file at the person-week level which I am trying to convert to an episode file. 

 

I have some key variables: 

                  1. ID and week for each record

                  2. Flag1 - when this is 1, that should be the start of an episode

                  3. Flag2 - when this is 1, that should be the end of an episode

 

ID       Week          Flag1 (=1 for episode start)               Flag2 (=1 for episode end)

1           1                         0                                                                0

1           2                         1                                                                0

1           3                         0                                                                1

1           4                         1                                                                0

1           5                         0                                                                0

1           6                         0                                                                1

1           7                         0                                                                0

1           8                         0                                                                0

1          9                          1                                                                0

1         10                         0                                                                1

2          1                          1                                                                0

2          2                          0                                                                1

2          3                          0                                                                0

2          4                          0                                                                0

2          5                          1                                                                0

3          1                          1                                                               1

4          1                          0                                                               0

 

My output file should have one record per episode:

ID        episode_counter

1                        1

1                        2     

1                        3

2                        1

2                        2

3                        1

 

There are a couple of factors that complicate this:

               1. If an episode has begun but not concluded, I would like that to be output to the episode file as well (see bold row above)

               2. As for all the other variables (of which, as I mentioned, there are hundreds), they will be non-missing on one and only one row for each episode. There's no way to know which row and different episodes will                           have the values on different rows. I want to output whatever the non-missing values are for each of those variables for each episode, regardless of which row they are on. For example:

 

ID       Week          Flag1 (=1 for episode start)               Flag2 (=1 for episode end)             var1

2          1                         1                                                                 0                                      

2          2                         0                                                                 1                                     b

2          3                         0                                                                0

2          4                         0                                                                0

2          5                        1                                                                 0                                        a

 

I would want var1 = b as the value in the first episode for ID 2, and var1=a for the value in the second episode. 

 

Any help is much appreciated!

 

 

2 REPLIES 2
MarkDawson
SAS Employee

I'm not sure exactly what you wanted for the last part, i.e. var1 values, but the code below certainly seems to give the results for the first part of your question.  It may also help for the var1 part, however I suspect that needs more work with real data.

data sample ;
   infile datalines missover ; 
   input id week flag1 flag2 var1 $ ;
datalines ;
1     1      0    0
1     2      1    0
1     3      0    1
1     4      1    0
1     5      0    0
1     6      0    1
1     7      0    0
1     8      0    0
1     9      1    0
1    10      0    1
2     1      1    0
2     2      0    1
2     3      0    0
2     4      0    0
2     5      1    0
3     1      1    1
4     1      0    0
5     1      1    0    
5     2      0    1   b
5     3      0    0
5     4      0    0
5     5      1    0   a
; run ; 
data results ; 
   set sample ; 
   by id week ;
   if first.id then call missing (episode_counter) ;
   if last.id or flag2 then 
      do ; 
         episode_counter + 1 ;
         output ;
      end ; 
   drop week flag1 flag2 ;
run ;  

Maybe if you can give some more input data and what you expect as the result, we might be able to advise further.  However, I can't say how quickly I'll be able to get back to you.

Please Note:  I did not see any need for flag1 (start of episode), maybe you missed some requirement, or it is simply not needed.

mkeintz
PROC Star

You want to keep all records in which the auxiliary variables are not missing (there is one such record per episode).  But you also want each distinct episode identified:

 

data want;
  set have;
  by id ;
  if first.id then episode=0;
  if flag1=1 then episode+1;  
  if var1^=' ';
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 338 views
  • 0 likes
  • 3 in conversation