BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
hellorc
Obsidian | Level 7

Hello SAS community,

 

I am relatively new to SAS. Currently I am taking survival analysis course, and I am really stuck on reformatting a data into 'counting process' (start, stop) form for fitting a Cox regression model with time-varying covariate (https://support.sas.com/resources/papers/proceedings12/168-2012.pdf). To keep the question short, below please find the much simplified version of the data I have now, and the format I want it to be:

 

 

data have;
input id site $ a1-a5 outcome $@@;
datalines;
1 a 1 1 0 1 1 y
1 b 0 0 1 0 0 n
2 b 1 1 0 0 0 n
2 c 0 0 1 0 1 n
2 d 0 0 0 1 0 n
3 a 1 0 0 1 0 y
;
run;

id: unique identification for each person

 

site: working site

a1-a5: attendance for week 1-5 (1=person works at the site in that week, 0 =otherwise)

outcome: binary outcome of interest (y/n, if the person has y, it means the event happens in his/her last attendance week)

 

There are actual dates for week 1-5, for example:

week 1: 2022/07/31 - 2022/08/06

week 2: 2022/08/07 - 2022/08/13

...

week 5: 2022/08/28 - 2022/09/03

 

Note that not everyone works every week, and some people work at different sites. I am not too sure if it is necessary but I would like to introduce a week indicator variable named 'week', as well as the start and stop variables so that the data format becomes:

data want;
input id site start $ stop $ week outcome @@;
datalines;
1 a 2022/07/31 2022/08/06 1 n
1 a 2022/08/07 2022/08/13 2 n
1 b 2022/08/14 2022/08/20 3 n
1 a 2022/08/21 2022/08/27 4 n
1 a 2022/08/28 2022/09/03 5 y
2 b 2022/07/31 2022/08/06 1 n
2 b 2022/08/07 2022/08/13 2 n
2 c 2022/08/14 2022/08/20 3 n
2 d 2022/08/21 2022/08/27 4 n
2 c 2022/08/28 2022/09/03 5 n
3 a 2022/07/31 2022/08/06 1 n
3 a 2022/08/21 2022/08/27 4 y ; run;

My concern is that since some people stay in the same working site throughout the entire time, some switch sites, some switch sites multiple times, some switch back and forth, some don't show up in some weeks, I just could not find the proper logic that could apply to the data (over 1000 people). Might someone be willing to provide guidance?

 

Thank you,

rc

 

 

 

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Here is one way. Note minor change your input as that generated a run time error reading past the end of a data line.

data have;
input id site $ a1-a5 outcome $;
datalines;
1 a 1 1 0 1 1 y
1 b 0 0 1 0 0 n
2 b 1 1 0 0 0 n
2 c 0 0 1 0 1 n
2 d 0 0 0 1 0 n
3 a 1 0 0 1 0 y
;
run;

data want; 
   set have;
   Array a (*) a1-a5;
   Do Week=1 to dim(a);
      if a[week]=1 then do;
         select (week);
            when(1) do; start='31Jul2022'd; end='06Aug2022'd; end;
            when(2) do; start='07Aug2022'd; end='13Aug2022'd; end;
            when(3) do; start='14Aug2022'd; end='20Aug2022'd; end;
            when(4) do; start='21Aug2022'd; end='27Aug2022'd; end;
            when(5) do; start='28Aug2022'd; end='03Sep2022'd; end;
            otherwise put "WARNING: Unexpected week number: " week=;
         end;
         output;
      end;
   end;
   format start end yymmdd10.;
   drop a1-a5;
run;
Proc sort data=want;
  by id start;
run;

It is just asking for way more work to attempt to use date, time or datetime values in character form. So I used date literals above just incase you need to deal with a non-standard "week" duration definition such as might occur at the start/end of a year or some other reason.

Actually the above code could a single first day of the first week date literal value and as long as the offsets are standard could be calculated using the week number and adding days to the first date (which would not be possible with character valued dates) This calculation of offsets would be the way to go with something involving more weeks. Left as an exercise for the interested reader. (Note: the solution is going to shorter than the SELECT code. Note: End=start+7; ).

If you have not seen date literals before they must be in the form of the Date9 (or date7 or similar format) , quotes around ddMONyyyy followed by the d to indicate a date value is intended. The quotes can be single or double as desired, the month abbreviation can be upper, lower or mixed case but you should pick one just to make things easy to read. It is best to use a 4-digit year and 2-digit day of month just so there is no doubt what date is meant when reading the code.

 

Select is a way to write the equivalent of many "if/then do/else do" statements. The value in parentheses on the Select is compared to lists of values in the When clauses and branches to the one with the value of the parameter. If the value of the parameter is not found in the When lists then the Otherwise statement executes.

 

The placement of the OUTPUT statement is critical. If in the wrong place you may not get all the output you want.

 

You should explicitly state the order output requirement. That is why I have a proc sort to match the example provided.

View solution in original post

1 REPLY 1
ballardw
Super User

Here is one way. Note minor change your input as that generated a run time error reading past the end of a data line.

data have;
input id site $ a1-a5 outcome $;
datalines;
1 a 1 1 0 1 1 y
1 b 0 0 1 0 0 n
2 b 1 1 0 0 0 n
2 c 0 0 1 0 1 n
2 d 0 0 0 1 0 n
3 a 1 0 0 1 0 y
;
run;

data want; 
   set have;
   Array a (*) a1-a5;
   Do Week=1 to dim(a);
      if a[week]=1 then do;
         select (week);
            when(1) do; start='31Jul2022'd; end='06Aug2022'd; end;
            when(2) do; start='07Aug2022'd; end='13Aug2022'd; end;
            when(3) do; start='14Aug2022'd; end='20Aug2022'd; end;
            when(4) do; start='21Aug2022'd; end='27Aug2022'd; end;
            when(5) do; start='28Aug2022'd; end='03Sep2022'd; end;
            otherwise put "WARNING: Unexpected week number: " week=;
         end;
         output;
      end;
   end;
   format start end yymmdd10.;
   drop a1-a5;
run;
Proc sort data=want;
  by id start;
run;

It is just asking for way more work to attempt to use date, time or datetime values in character form. So I used date literals above just incase you need to deal with a non-standard "week" duration definition such as might occur at the start/end of a year or some other reason.

Actually the above code could a single first day of the first week date literal value and as long as the offsets are standard could be calculated using the week number and adding days to the first date (which would not be possible with character valued dates) This calculation of offsets would be the way to go with something involving more weeks. Left as an exercise for the interested reader. (Note: the solution is going to shorter than the SELECT code. Note: End=start+7; ).

If you have not seen date literals before they must be in the form of the Date9 (or date7 or similar format) , quotes around ddMONyyyy followed by the d to indicate a date value is intended. The quotes can be single or double as desired, the month abbreviation can be upper, lower or mixed case but you should pick one just to make things easy to read. It is best to use a 4-digit year and 2-digit day of month just so there is no doubt what date is meant when reading the code.

 

Select is a way to write the equivalent of many "if/then do/else do" statements. The value in parentheses on the Select is compared to lists of values in the When clauses and branches to the one with the value of the parameter. If the value of the parameter is not found in the When lists then the Otherwise statement executes.

 

The placement of the OUTPUT statement is critical. If in the wrong place you may not get all the output you want.

 

You should explicitly state the order output requirement. That is why I have a proc sort to match the example provided.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 1 reply
  • 303 views
  • 2 likes
  • 2 in conversation