Hello SAS community, happy Thanksgiving.
I have relatively complicated reformatting question about converting wide to long/weekly format for survival data. Currently I have sample data as follows
data have;
input id gender $ enrolldate : mmddyy. surgery1 : mmddyy. surgery2 : mmddyy. inflam_1 : mmddyy. inflam_2 : mmddyy. inflam_3 : mmddyy. lastfollow : mmddyy. reenroll : mmddyy. @@;
format enrolldate surgery1 surgery2 inflam_1 inflam_2 inflam_3 lastfollow reenroll mmddyy10.;
datalines;
1 M 02/01/2019 02/04/2019 04/16/2019 03/02/2019 04/03/2019 06/04/2019 05/01/2019 06/01/2019
2 M 02/02/2019 05/14/2019 . 06/25/2019 07/14/2019 . 07/25/2019 .
3 F 02/03/2019 . . 10/13/2019 11/13/2019 . . .
;
run;
Study period is the year 2019 (til 12/31/2019). In the wide format data, we have all these dates
enrolldate: enrollment date
surgery1: date of first surgery
surgery2: date of second surgery
inflam_1: date of first inflammation (event)
inflam_2: date of second inflammation (event)
inflam_3: date of third inflammation (event)
lastfollow: patient was lost to followup after this date
reenroll: reenrollment date only for patients who was lost to follow up
As illustration (for id=1), my goal is to reformat the data into long and weekly format such that
data want;
input id gender $ start : mmddyy. stop : mmddyy. week surgery inflam inflam_count @@;
format start stop mmddyy10.;
datalines;
1 M 02/01/2019 02/03/2019 5 0 0 0
1 M 02/03/2019 02/10/2019 6 1 0 0
1 M 02/10/2019 02/17/2019 7 1 0 0
1 M 02/17/2019 02/24/2019 8 1 0 0
1 M 02/24/2019 03/02/2019 9 1 1 1
1 M 03/02/2019 03/03/2019 9 1 0 1
1 M 03/03/2019 03/10/2019 10 1 0 1
1 M 03/10/2019 03/17/2019 11 1 0 1
1 M 03/17/2019 03/24/2019 12 1 0 1
1 M 03/24/2019 03/31/2019 13 1 0 1
1 M 03/31/2019 04/03/2019 14 1 1 2
1 M 04/03/2019 04/07/2019 14 1 0 2
1 M 04/07/2019 04/14/2019 15 1 0 2
1 M 04/14/2019 04/16/2019 16 1 0 2
1 M 04/16/2019 04/21/2019 16 2 0 2
1 M 04/21/2019 04/28/2019 17 2 0 2
1 M 04/28/2019 05/01/2019 18 2 0 2
1 M 06/01/2019 06/02/2019 23 2 0 2
1 M 06/02/2019 06/04/2019 23 2 1 3
1 M 06/04/2019 06/09/2019 23 2 0 3
1 M 06/09/2019 06/16/2019 24 2 0 3
1 M 06/16/2019 06/23/2019 25 2 0 3
1 M 06/23/2019 06/30/2019 26 2 0 3
;
run;
*more weekly rows until 12/31/2019;
I managed to get the surgery correctly, but the event, event count, lastfollow and reenroll keep messing up. Might someone please shed some light on how to do this reformatting? Thank you in advance!
RC
The first thing to do is transpose the data so you have separate observations for each event you want to count.
data have;
input id gender $ (enrolldate surgery1 surgery2 inflam_1-inflam_3 lastfollow reenroll) (:mmddyy.);
format enrolldate surgery1 surgery2 inflam_1-inflam_3 lastfollow reenroll mmddyy10.;
datalines;
1 M 02/01/2019 02/04/2019 04/16/2019 03/02/2019 04/03/2019 06/04/2019 05/01/2019 06/01/2019
2 M 02/02/2019 05/14/2019 . 06/25/2019 07/14/2019 . 07/25/2019 .
3 F 02/03/2019 . . 10/13/2019 11/13/2019 . . .
;
proc transpose data=have out=tall(rename=(col1=date) where=(not missing(date)));
by id ;
var surgery1 surgery2 inflam_1-inflam_3 ;
run;
proc sort;
by id date;
run;
Which will get you something like this:
Obs id _NAME_ date 1 1 surgery1 02/04/2019 2 1 inflam_1 03/02/2019 3 1 inflam_2 04/03/2019 4 1 surgery2 04/16/2019 5 1 inflam_3 06/04/2019 6 2 surgery1 05/14/2019 7 2 inflam_1 06/25/2019 8 2 inflam_2 07/14/2019 9 3 inflam_1 10/13/2019 10 3 inflam_2 11/13/2019
Now if you blow-up your list of ID's to have one observation per week. Perhaps something like this:
data weeks;
set have (drop=surgery1 surgery2 inflam_1-inflam_3);
do date=nwkdom(1,1,1,2019) by 7 to mdy(12,31,2019);
week=week(date);
format date mmddyy10.;
output;
end;
run;
Then you can interleave the two by ID and DATE and count how many of the different type of events there are. So perhaps you meant something like:
data want;
set tall weeks;
by id date ;
retain surgery inflam inflam_count;
if first.id then call missing(surgery,inflam_count);
if first.id then inflam=0;
surgery + (_name_=:'surgery');
inflam_count + (_name_=:'inflam');
if (_name_=:'inflam') then inflam=1;
if not missing(week) then output;
drop _name_;
run;
Which will get counts like:
inflam_ id date gender enrolldate lastfollow reenroll week surgery inflam count 1 01/06/2019 M 02/01/2019 05/01/2019 06/01/2019 1 0 0 0 1 01/13/2019 M 02/01/2019 05/01/2019 06/01/2019 2 0 0 0 1 01/20/2019 M 02/01/2019 05/01/2019 06/01/2019 3 0 0 0 1 01/27/2019 M 02/01/2019 05/01/2019 06/01/2019 4 0 0 0 1 02/03/2019 M 02/01/2019 05/01/2019 06/01/2019 5 0 0 0 1 02/10/2019 M 02/01/2019 05/01/2019 06/01/2019 6 1 0 0 1 02/17/2019 M 02/01/2019 05/01/2019 06/01/2019 7 1 0 0 1 02/24/2019 M 02/01/2019 05/01/2019 06/01/2019 8 1 0 0 1 03/03/2019 M 02/01/2019 05/01/2019 06/01/2019 9 1 1 1 1 03/10/2019 M 02/01/2019 05/01/2019 06/01/2019 10 1 1 1 1 03/17/2019 M 02/01/2019 05/01/2019 06/01/2019 11 1 1 1 1 03/24/2019 M 02/01/2019 05/01/2019 06/01/2019 12 1 1 1 1 03/31/2019 M 02/01/2019 05/01/2019 06/01/2019 13 1 1 1 1 04/07/2019 M 02/01/2019 05/01/2019 06/01/2019 14 1 1 2 1 04/14/2019 M 02/01/2019 05/01/2019 06/01/2019 15 1 1 2 1 04/21/2019 M 02/01/2019 05/01/2019 06/01/2019 16 2 1 2 1 04/28/2019 M 02/01/2019 05/01/2019 06/01/2019 17 2 1 2 1 05/05/2019 M 02/01/2019 05/01/2019 06/01/2019 18 2 1 2 1 05/12/2019 M 02/01/2019 05/01/2019 06/01/2019 19 2 1 2 1 05/19/2019 M 02/01/2019 05/01/2019 06/01/2019 20 2 1 2 ...
What have you tried so far and where do you get stuck? Please share the code you've created already even if it's not yet fully returning the desired result.
If providing an answer can we trust that you're not going to delete your question again?
The first thing to do is transpose the data so you have separate observations for each event you want to count.
data have;
input id gender $ (enrolldate surgery1 surgery2 inflam_1-inflam_3 lastfollow reenroll) (:mmddyy.);
format enrolldate surgery1 surgery2 inflam_1-inflam_3 lastfollow reenroll mmddyy10.;
datalines;
1 M 02/01/2019 02/04/2019 04/16/2019 03/02/2019 04/03/2019 06/04/2019 05/01/2019 06/01/2019
2 M 02/02/2019 05/14/2019 . 06/25/2019 07/14/2019 . 07/25/2019 .
3 F 02/03/2019 . . 10/13/2019 11/13/2019 . . .
;
proc transpose data=have out=tall(rename=(col1=date) where=(not missing(date)));
by id ;
var surgery1 surgery2 inflam_1-inflam_3 ;
run;
proc sort;
by id date;
run;
Which will get you something like this:
Obs id _NAME_ date 1 1 surgery1 02/04/2019 2 1 inflam_1 03/02/2019 3 1 inflam_2 04/03/2019 4 1 surgery2 04/16/2019 5 1 inflam_3 06/04/2019 6 2 surgery1 05/14/2019 7 2 inflam_1 06/25/2019 8 2 inflam_2 07/14/2019 9 3 inflam_1 10/13/2019 10 3 inflam_2 11/13/2019
Now if you blow-up your list of ID's to have one observation per week. Perhaps something like this:
data weeks;
set have (drop=surgery1 surgery2 inflam_1-inflam_3);
do date=nwkdom(1,1,1,2019) by 7 to mdy(12,31,2019);
week=week(date);
format date mmddyy10.;
output;
end;
run;
Then you can interleave the two by ID and DATE and count how many of the different type of events there are. So perhaps you meant something like:
data want;
set tall weeks;
by id date ;
retain surgery inflam inflam_count;
if first.id then call missing(surgery,inflam_count);
if first.id then inflam=0;
surgery + (_name_=:'surgery');
inflam_count + (_name_=:'inflam');
if (_name_=:'inflam') then inflam=1;
if not missing(week) then output;
drop _name_;
run;
Which will get counts like:
inflam_ id date gender enrolldate lastfollow reenroll week surgery inflam count 1 01/06/2019 M 02/01/2019 05/01/2019 06/01/2019 1 0 0 0 1 01/13/2019 M 02/01/2019 05/01/2019 06/01/2019 2 0 0 0 1 01/20/2019 M 02/01/2019 05/01/2019 06/01/2019 3 0 0 0 1 01/27/2019 M 02/01/2019 05/01/2019 06/01/2019 4 0 0 0 1 02/03/2019 M 02/01/2019 05/01/2019 06/01/2019 5 0 0 0 1 02/10/2019 M 02/01/2019 05/01/2019 06/01/2019 6 1 0 0 1 02/17/2019 M 02/01/2019 05/01/2019 06/01/2019 7 1 0 0 1 02/24/2019 M 02/01/2019 05/01/2019 06/01/2019 8 1 0 0 1 03/03/2019 M 02/01/2019 05/01/2019 06/01/2019 9 1 1 1 1 03/10/2019 M 02/01/2019 05/01/2019 06/01/2019 10 1 1 1 1 03/17/2019 M 02/01/2019 05/01/2019 06/01/2019 11 1 1 1 1 03/24/2019 M 02/01/2019 05/01/2019 06/01/2019 12 1 1 1 1 03/31/2019 M 02/01/2019 05/01/2019 06/01/2019 13 1 1 1 1 04/07/2019 M 02/01/2019 05/01/2019 06/01/2019 14 1 1 2 1 04/14/2019 M 02/01/2019 05/01/2019 06/01/2019 15 1 1 2 1 04/21/2019 M 02/01/2019 05/01/2019 06/01/2019 16 2 1 2 1 04/28/2019 M 02/01/2019 05/01/2019 06/01/2019 17 2 1 2 1 05/05/2019 M 02/01/2019 05/01/2019 06/01/2019 18 2 1 2 1 05/12/2019 M 02/01/2019 05/01/2019 06/01/2019 19 2 1 2 1 05/19/2019 M 02/01/2019 05/01/2019 06/01/2019 20 2 1 2 ...
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.