Solved: Converting wide to long/weekly format for survival data

hellorc · Posted 11-23-2023 05:14 PM

Hello SAS community, happy Thanksgiving.

I have relatively complicated reformatting question about converting wide to long/weekly format for survival data. Currently I have sample data as follows

data have;
input id gender $ enrolldate : mmddyy. surgery1 : mmddyy. surgery2 : mmddyy. inflam_1 : mmddyy. inflam_2 : mmddyy. inflam_3 : mmddyy. lastfollow : mmddyy. reenroll : mmddyy. @@;
format enrolldate surgery1 surgery2 inflam_1 inflam_2 inflam_3 lastfollow reenroll mmddyy10.;
datalines;
1 M 02/01/2019 02/04/2019 04/16/2019 03/02/2019 04/03/2019 06/04/2019 05/01/2019 06/01/2019
2 M 02/02/2019 05/14/2019 . 06/25/2019 07/14/2019 . 07/25/2019 .
3 F 02/03/2019 . . 10/13/2019 11/13/2019 . . .
;
run;

Study period is the year 2019 (til 12/31/2019). In the wide format data, we have all these dates

enrolldate: enrollment date

surgery1: date of first surgery

surgery2: date of second surgery

inflam_1: date of first inflammation (event)

inflam_2: date of second inflammation (event)

inflam_3: date of third inflammation (event)

lastfollow: patient was lost to followup after this date

reenroll: reenrollment date only for patients who was lost to follow up

As illustration (for id=1), my goal is to reformat the data into long and weekly format such that

data want;
input id gender $ start : mmddyy. stop : mmddyy. week surgery inflam inflam_count @@;
format start stop mmddyy10.;
datalines;
1 M 02/01/2019 02/03/2019 5 0 0 0
1 M 02/03/2019 02/10/2019 6 1 0 0
1 M 02/10/2019 02/17/2019 7 1 0 0
1 M 02/17/2019 02/24/2019 8 1 0 0
1 M 02/24/2019 03/02/2019 9 1 1 1
1 M 03/02/2019 03/03/2019 9 1 0 1
1 M 03/03/2019 03/10/2019 10 1 0 1
1 M 03/10/2019 03/17/2019 11 1 0 1
1 M 03/17/2019 03/24/2019 12 1 0 1
1 M 03/24/2019 03/31/2019 13 1 0 1
1 M 03/31/2019 04/03/2019 14 1 1 2
1 M 04/03/2019 04/07/2019 14 1 0 2
1 M 04/07/2019 04/14/2019 15 1 0 2
1 M 04/14/2019 04/16/2019 16 1 0 2
1 M 04/16/2019 04/21/2019 16 2 0 2
1 M 04/21/2019 04/28/2019 17 2 0 2
1 M 04/28/2019 05/01/2019 18 2 0 2
1 M 06/01/2019 06/02/2019 23 2 0 2
1 M 06/02/2019 06/04/2019 23 2 1 3
1 M 06/04/2019 06/09/2019 23 2 0 3
1 M 06/09/2019 06/16/2019 24 2 0 3
1 M 06/16/2019 06/23/2019 25 2 0 3
1 M 06/23/2019 06/30/2019 26 2 0 3
;
run;
*more weekly rows until 12/31/2019;

The idea is to begin with the enrollment date.
week is the week number of year 2019, begins with Sunday
After surgery date, the 'surgery' group variable will be +1, and it stays unless more surgery was done
'Inflam' is the event indicator
'inflam_count' keeps the ongoing count of inflammation events.
No more record or rows needed after lastfollow date, unless patient had reenroll date
The rows go all the way to 12/31/2019, unless there's lastfollow date without renroll date

I managed to get the surgery correctly, but the event, event count, lastfollow and reenroll keep messing up. Might someone please shed some light on how to do this reformatting? Thank you in advance!

RC

Tom · Posted 11-25-2023 10:50 PM

The first thing to do is transpose the data so you have separate observations for each event you want to count.

data have;
  input id gender $ (enrolldate surgery1 surgery2 inflam_1-inflam_3 lastfollow reenroll) (:mmddyy.);
  format enrolldate surgery1 surgery2 inflam_1-inflam_3 lastfollow reenroll mmddyy10.;
datalines;
1 M 02/01/2019 02/04/2019 04/16/2019 03/02/2019 04/03/2019 06/04/2019 05/01/2019 06/01/2019
2 M 02/02/2019 05/14/2019 . 06/25/2019 07/14/2019 . 07/25/2019 .
3 F 02/03/2019 . . 10/13/2019 11/13/2019 . . .
;

proc transpose data=have out=tall(rename=(col1=date) where=(not missing(date)));
  by id  ;
  var surgery1 surgery2 inflam_1-inflam_3 ;
run;

proc sort;
  by id date;
run;

Which will get you something like this:

Obs    id     _NAME_           date

  1     1    surgery1    02/04/2019
  2     1    inflam_1    03/02/2019
  3     1    inflam_2    04/03/2019
  4     1    surgery2    04/16/2019
  5     1    inflam_3    06/04/2019
  6     2    surgery1    05/14/2019
  7     2    inflam_1    06/25/2019
  8     2    inflam_2    07/14/2019
  9     3    inflam_1    10/13/2019
 10     3    inflam_2    11/13/2019

Now if you blow-up your list of ID's to have one observation per week. Perhaps something like this:

data weeks;
  set have (drop=surgery1 surgery2 inflam_1-inflam_3);
  do date=nwkdom(1,1,1,2019) by 7 to mdy(12,31,2019);
    week=week(date);
    format date mmddyy10.;
    output;
  end;
run;

Then you can interleave the two by ID and DATE and count how many of the different type of events there are. So perhaps you meant something like:

data want;
  set tall weeks;
  by id date ;
  retain surgery inflam inflam_count;
  if first.id then call missing(surgery,inflam_count);
  if first.id then inflam=0;
  surgery + (_name_=:'surgery');
  inflam_count + (_name_=:'inflam');
  if (_name_=:'inflam') then inflam=1;
  if not missing(week) then output;
  drop _name_;
run;

Which will get counts like:

                                                                                                     inflam_
id       date       gender    enrolldate    lastfollow     reenroll     week    surgery    inflam     count

 1    01/06/2019      M       02/01/2019    05/01/2019    06/01/2019      1        0          0         0
 1    01/13/2019      M       02/01/2019    05/01/2019    06/01/2019      2        0          0         0
 1    01/20/2019      M       02/01/2019    05/01/2019    06/01/2019      3        0          0         0
 1    01/27/2019      M       02/01/2019    05/01/2019    06/01/2019      4        0          0         0
 1    02/03/2019      M       02/01/2019    05/01/2019    06/01/2019      5        0          0         0
 1    02/10/2019      M       02/01/2019    05/01/2019    06/01/2019      6        1          0         0
 1    02/17/2019      M       02/01/2019    05/01/2019    06/01/2019      7        1          0         0
 1    02/24/2019      M       02/01/2019    05/01/2019    06/01/2019      8        1          0         0
 1    03/03/2019      M       02/01/2019    05/01/2019    06/01/2019      9        1          1         1
 1    03/10/2019      M       02/01/2019    05/01/2019    06/01/2019     10        1          1         1
 1    03/17/2019      M       02/01/2019    05/01/2019    06/01/2019     11        1          1         1
 1    03/24/2019      M       02/01/2019    05/01/2019    06/01/2019     12        1          1         1
 1    03/31/2019      M       02/01/2019    05/01/2019    06/01/2019     13        1          1         1
 1    04/07/2019      M       02/01/2019    05/01/2019    06/01/2019     14        1          1         2
 1    04/14/2019      M       02/01/2019    05/01/2019    06/01/2019     15        1          1         2
 1    04/21/2019      M       02/01/2019    05/01/2019    06/01/2019     16        2          1         2
 1    04/28/2019      M       02/01/2019    05/01/2019    06/01/2019     17        2          1         2
 1    05/05/2019      M       02/01/2019    05/01/2019    06/01/2019     18        2          1         2
 1    05/12/2019      M       02/01/2019    05/01/2019    06/01/2019     19        2          1         2
 1    05/19/2019      M       02/01/2019    05/01/2019    06/01/2019     20        2          1         2
...

View solution in original post

Patrick · Posted 11-25-2023 06:47 PM

What have you tried so far and where do you get stuck? Please share the code you've created already even if it's not yet fully returning the desired result.

If providing an answer can we trust that you're not going to delete your question again?

Tom · Posted 11-25-2023 10:50 PM

The first thing to do is transpose the data so you have separate observations for each event you want to count.

data have;
  input id gender $ (enrolldate surgery1 surgery2 inflam_1-inflam_3 lastfollow reenroll) (:mmddyy.);
  format enrolldate surgery1 surgery2 inflam_1-inflam_3 lastfollow reenroll mmddyy10.;
datalines;
1 M 02/01/2019 02/04/2019 04/16/2019 03/02/2019 04/03/2019 06/04/2019 05/01/2019 06/01/2019
2 M 02/02/2019 05/14/2019 . 06/25/2019 07/14/2019 . 07/25/2019 .
3 F 02/03/2019 . . 10/13/2019 11/13/2019 . . .
;

proc transpose data=have out=tall(rename=(col1=date) where=(not missing(date)));
  by id  ;
  var surgery1 surgery2 inflam_1-inflam_3 ;
run;

proc sort;
  by id date;
run;

Which will get you something like this:

Obs    id     _NAME_           date

  1     1    surgery1    02/04/2019
  2     1    inflam_1    03/02/2019
  3     1    inflam_2    04/03/2019
  4     1    surgery2    04/16/2019
  5     1    inflam_3    06/04/2019
  6     2    surgery1    05/14/2019
  7     2    inflam_1    06/25/2019
  8     2    inflam_2    07/14/2019
  9     3    inflam_1    10/13/2019
 10     3    inflam_2    11/13/2019

Now if you blow-up your list of ID's to have one observation per week. Perhaps something like this:

data weeks;
  set have (drop=surgery1 surgery2 inflam_1-inflam_3);
  do date=nwkdom(1,1,1,2019) by 7 to mdy(12,31,2019);
    week=week(date);
    format date mmddyy10.;
    output;
  end;
run;

Then you can interleave the two by ID and DATE and count how many of the different type of events there are. So perhaps you meant something like:

data want;
  set tall weeks;
  by id date ;
  retain surgery inflam inflam_count;
  if first.id then call missing(surgery,inflam_count);
  if first.id then inflam=0;
  surgery + (_name_=:'surgery');
  inflam_count + (_name_=:'inflam');
  if (_name_=:'inflam') then inflam=1;
  if not missing(week) then output;
  drop _name_;
run;

Which will get counts like:

                                                                                                     inflam_
id       date       gender    enrolldate    lastfollow     reenroll     week    surgery    inflam     count

 1    01/06/2019      M       02/01/2019    05/01/2019    06/01/2019      1        0          0         0
 1    01/13/2019      M       02/01/2019    05/01/2019    06/01/2019      2        0          0         0
 1    01/20/2019      M       02/01/2019    05/01/2019    06/01/2019      3        0          0         0
 1    01/27/2019      M       02/01/2019    05/01/2019    06/01/2019      4        0          0         0
 1    02/03/2019      M       02/01/2019    05/01/2019    06/01/2019      5        0          0         0
 1    02/10/2019      M       02/01/2019    05/01/2019    06/01/2019      6        1          0         0
 1    02/17/2019      M       02/01/2019    05/01/2019    06/01/2019      7        1          0         0
 1    02/24/2019      M       02/01/2019    05/01/2019    06/01/2019      8        1          0         0
 1    03/03/2019      M       02/01/2019    05/01/2019    06/01/2019      9        1          1         1
 1    03/10/2019      M       02/01/2019    05/01/2019    06/01/2019     10        1          1         1
 1    03/17/2019      M       02/01/2019    05/01/2019    06/01/2019     11        1          1         1
 1    03/24/2019      M       02/01/2019    05/01/2019    06/01/2019     12        1          1         1
 1    03/31/2019      M       02/01/2019    05/01/2019    06/01/2019     13        1          1         1
 1    04/07/2019      M       02/01/2019    05/01/2019    06/01/2019     14        1          1         2
 1    04/14/2019      M       02/01/2019    05/01/2019    06/01/2019     15        1          1         2
 1    04/21/2019      M       02/01/2019    05/01/2019    06/01/2019     16        2          1         2
 1    04/28/2019      M       02/01/2019    05/01/2019    06/01/2019     17        2          1         2
 1    05/05/2019      M       02/01/2019    05/01/2019    06/01/2019     18        2          1         2
 1    05/12/2019      M       02/01/2019    05/01/2019    06/01/2019     19        2          1         2
 1    05/19/2019      M       02/01/2019    05/01/2019    06/01/2019     20        2          1         2
...

Converting wide to long/weekly format for survival data

Re: Converting wide to long/weekly format for survival data

Re: Converting wide to long/weekly format for survival data

Re: Converting wide to long/weekly format for survival data