Hello,
I am working with longitudinal data with several events of interest and would like to calculate the average duration of each event, with recurrence of the event being possible. I've tried a number of things but have spent several days being unsuccessful.
An example of the data set that I have is below, where study_id is the unique participant identifier, visit date is the date of the recurring visit, and event1 and event2 are the events of interest.
study_id | visitdate | event1 | event2 |
1001 | 1-Jan | 0 | 1 |
1001 | 17-Jan | 0 | 1 |
1001 | 12-Feb | 1 | 0 |
1001 | 26-Feb | 1 | 0 |
1001 | 4-Mar | 1 | 0 |
1001 | 18-Mar | 0 | 0 |
1002 | 1-Jan | 0 | 0 |
1002 | 19-Jan | 1 | 0 |
1002 | 30-Jan | 1 | 0 |
1002 | 7-Feb | 0 | 1 |
1002 | 19-Feb | 0 | 1 |
1002 | 4-Mar | 0 | 1 |
1003 | 1-Jan | 1 | 0 |
1003 | 22-Jan | 1 | 0 |
1003 | 10-Feb | 1 | 0 |
1003 | 22-Feb | 0 | 0 |
1003 | 3-Mar | 0 | 1 |
1003 | 19-Mar | 1 | 1 |
1003 | 2-Apr | 1 | 1 |
1003 | 9-Apr | 1 | 0 |
1003 | 27-Apr | 1 | 1 |
1003 | 8-May | 0 | 1 |
The data set I want is below, where e1_count is the number of events per study ID, e1_tot_dur is the total duration of all events (i.e. the sum of days of discreet events), and e1_ave_dur is the average duration of each event (variable of interest).
Study ID | e1_count | e1_tot_dur | e1_ave_dur | e2_count | e2_tot_dur | e2_ave_dur |
1001 | 1 | 20 | 20 | 1 | 16 | 16 |
1002 | 1 | 11 | 11 | 1 | 25 | 25 |
1003 | 2 | 79 | 39.5 | 2 | 41 | 20.5 |
Any advice would be much appreciated!
Jonathan
Simpler to do this with a long data organisation which can accomodate any number of events:
data have;
input study_id vd$ event1 event2;
visitDate = input(cats(vd,"-2017"), anydtdte11.);
format visitDate yymmdd10.;
drop vd;
datalines;
1001 1-Jan 0 1
1001 17-Jan 0 1
1001 12-Feb 1 0
1001 26-Feb 1 0
1001 4-Mar 1 0
1001 18-Mar 0 0
1002 1-Jan 0 0
1002 19-Jan 1 0
1002 30-Jan 1 0
1002 7-Feb 0 1
1002 19-Feb 0 1
1002 4-Mar 0 1
1003 1-Jan 1 0
1003 22-Jan 1 0
1003 10-Feb 1 0
1003 22-Feb 0 0
1003 3-Mar 0 1
1003 19-Mar 1 1
1003 2-Apr 1 1
1003 9-Apr 1 0
1003 27-Apr 1 1
1003 8-May 0 1
;
data events;
set have(rename=event1=event) have(rename=event2=event in=in2);
eventNo = 1 + in2;
drop event1 event2;
run;
data want;
tot_dur = 0;
do until(last.study_id);
lastVisit = visitDate;
set events; by eventNo study_id;
if in then
if event then
if lastVisit > 0 then
tot_dur = sum(tot_dur, intck("day", lastVisit, visitDate));
else;
else in = 0;
else
if event then do;
in = 1;
count = sum(count, 1);
end;
end;
if count > 0 then ave_dur = tot_dur / count;
keep eventNo study_id tot_dur count ave_dur;
run;
proc print noobs; var eventNo study_id tot_dur count ave_dur; run;
Simpler to do this with a long data organisation which can accomodate any number of events:
data have;
input study_id vd$ event1 event2;
visitDate = input(cats(vd,"-2017"), anydtdte11.);
format visitDate yymmdd10.;
drop vd;
datalines;
1001 1-Jan 0 1
1001 17-Jan 0 1
1001 12-Feb 1 0
1001 26-Feb 1 0
1001 4-Mar 1 0
1001 18-Mar 0 0
1002 1-Jan 0 0
1002 19-Jan 1 0
1002 30-Jan 1 0
1002 7-Feb 0 1
1002 19-Feb 0 1
1002 4-Mar 0 1
1003 1-Jan 1 0
1003 22-Jan 1 0
1003 10-Feb 1 0
1003 22-Feb 0 0
1003 3-Mar 0 1
1003 19-Mar 1 1
1003 2-Apr 1 1
1003 9-Apr 1 0
1003 27-Apr 1 1
1003 8-May 0 1
;
data events;
set have(rename=event1=event) have(rename=event2=event in=in2);
eventNo = 1 + in2;
drop event1 event2;
run;
data want;
tot_dur = 0;
do until(last.study_id);
lastVisit = visitDate;
set events; by eventNo study_id;
if in then
if event then
if lastVisit > 0 then
tot_dur = sum(tot_dur, intck("day", lastVisit, visitDate));
else;
else in = 0;
else
if event then do;
in = 1;
count = sum(count, 1);
end;
end;
if count > 0 then ave_dur = tot_dur / count;
keep eventNo study_id tot_dur count ave_dur;
run;
proc print noobs; var eventNo study_id tot_dur count ave_dur; run;
Incredible! Thank you so much for your assistance. You have no idea how much this has helped.
data have;
input study_id vd$ event1 event2;
visitDate = input(cats(vd,"-2017"), anydtdte11.);
format visitDate yymmdd10.;
drop vd;
datalines;
1001 1-Jan 0 1
1001 17-Jan 0 1
1001 12-Feb 1 0
1001 26-Feb 1 0
1001 4-Mar 1 0
1001 18-Mar 0 0
1002 1-Jan 0 0
1002 19-Jan 1 0
1002 30-Jan 1 0
1002 7-Feb 0 1
1002 19-Feb 0 1
1002 4-Mar 0 1
1003 1-Jan 1 0
1003 22-Jan 1 0
1003 10-Feb 1 0
1003 22-Feb 0 0
1003 3-Mar 0 1
1003 19-Mar 1 1
1003 2-Apr 1 1
1003 9-Apr 1 0
1003 27-Apr 1 1
1003 8-May 0 1
;
data temp;
set have;
by study_id event1 notsorted;
group+first.event1;
if event1=1;
run;
proc sql;
create table want1 as
select study_id,count(distinct group) as e1_count,sum(range) as e1_tot_dur,
calculated e1_tot_dur/calculated e1_count as e1_ave_dur
from (select study_id,group,range(visitDate) as range from temp group by study_id,group)
group by study_id;
quit;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.