Hello,
I am working with longitudinal data with several events of interest and would like to calculate the average duration of each event, with recurrence of the event being possible. I've tried a number of things but have spent several days being unsuccessful.
An example of the data set that I have is below, where study_id is the unique participant identifier, visit date is the date of the recurring visit, and event1 and event2 are the events of interest.
study_id | visitdate | event1 | event2 |
1001 | 1-Jan | 0 | 1 |
1001 | 17-Jan | 0 | 1 |
1001 | 12-Feb | 1 | 0 |
1001 | 26-Feb | 1 | 0 |
1001 | 4-Mar | 1 | 0 |
1001 | 18-Mar | 0 | 0 |
1002 | 1-Jan | 0 | 0 |
1002 | 19-Jan | 1 | 0 |
1002 | 30-Jan | 1 | 0 |
1002 | 7-Feb | 0 | 1 |
1002 | 19-Feb | 0 | 1 |
1002 | 4-Mar | 0 | 1 |
1003 | 1-Jan | 1 | 0 |
1003 | 22-Jan | 1 | 0 |
1003 | 10-Feb | 1 | 0 |
1003 | 22-Feb | 0 | 0 |
1003 | 3-Mar | 0 | 1 |
1003 | 19-Mar | 1 | 1 |
1003 | 2-Apr | 1 | 1 |
1003 | 9-Apr | 1 | 0 |
1003 | 27-Apr | 1 | 1 |
1003 | 8-May | 0 | 1 |
The data set I want is below, where e1_count is the number of events per study ID, e1_tot_dur is the total duration of all events (i.e. the sum of days of discreet events), and e1_ave_dur is the average duration of each event (variable of interest).
Study ID | e1_count | e1_tot_dur | e1_ave_dur | e2_count | e2_tot_dur | e2_ave_dur |
1001 | 1 | 20 | 20 | 1 | 16 | 16 |
1002 | 1 | 11 | 11 | 1 | 25 | 25 |
1003 | 2 | 79 | 39.5 | 2 | 41 | 20.5 |
Any advice would be much appreciated!
Jonathan
Simpler to do this with a long data organisation which can accomodate any number of events:
data have;
input study_id vd$ event1 event2;
visitDate = input(cats(vd,"-2017"), anydtdte11.);
format visitDate yymmdd10.;
drop vd;
datalines;
1001 1-Jan 0 1
1001 17-Jan 0 1
1001 12-Feb 1 0
1001 26-Feb 1 0
1001 4-Mar 1 0
1001 18-Mar 0 0
1002 1-Jan 0 0
1002 19-Jan 1 0
1002 30-Jan 1 0
1002 7-Feb 0 1
1002 19-Feb 0 1
1002 4-Mar 0 1
1003 1-Jan 1 0
1003 22-Jan 1 0
1003 10-Feb 1 0
1003 22-Feb 0 0
1003 3-Mar 0 1
1003 19-Mar 1 1
1003 2-Apr 1 1
1003 9-Apr 1 0
1003 27-Apr 1 1
1003 8-May 0 1
;
data events;
set have(rename=event1=event) have(rename=event2=event in=in2);
eventNo = 1 + in2;
drop event1 event2;
run;
data want;
tot_dur = 0;
do until(last.study_id);
lastVisit = visitDate;
set events; by eventNo study_id;
if in then
if event then
if lastVisit > 0 then
tot_dur = sum(tot_dur, intck("day", lastVisit, visitDate));
else;
else in = 0;
else
if event then do;
in = 1;
count = sum(count, 1);
end;
end;
if count > 0 then ave_dur = tot_dur / count;
keep eventNo study_id tot_dur count ave_dur;
run;
proc print noobs; var eventNo study_id tot_dur count ave_dur; run;
Simpler to do this with a long data organisation which can accomodate any number of events:
data have;
input study_id vd$ event1 event2;
visitDate = input(cats(vd,"-2017"), anydtdte11.);
format visitDate yymmdd10.;
drop vd;
datalines;
1001 1-Jan 0 1
1001 17-Jan 0 1
1001 12-Feb 1 0
1001 26-Feb 1 0
1001 4-Mar 1 0
1001 18-Mar 0 0
1002 1-Jan 0 0
1002 19-Jan 1 0
1002 30-Jan 1 0
1002 7-Feb 0 1
1002 19-Feb 0 1
1002 4-Mar 0 1
1003 1-Jan 1 0
1003 22-Jan 1 0
1003 10-Feb 1 0
1003 22-Feb 0 0
1003 3-Mar 0 1
1003 19-Mar 1 1
1003 2-Apr 1 1
1003 9-Apr 1 0
1003 27-Apr 1 1
1003 8-May 0 1
;
data events;
set have(rename=event1=event) have(rename=event2=event in=in2);
eventNo = 1 + in2;
drop event1 event2;
run;
data want;
tot_dur = 0;
do until(last.study_id);
lastVisit = visitDate;
set events; by eventNo study_id;
if in then
if event then
if lastVisit > 0 then
tot_dur = sum(tot_dur, intck("day", lastVisit, visitDate));
else;
else in = 0;
else
if event then do;
in = 1;
count = sum(count, 1);
end;
end;
if count > 0 then ave_dur = tot_dur / count;
keep eventNo study_id tot_dur count ave_dur;
run;
proc print noobs; var eventNo study_id tot_dur count ave_dur; run;
Incredible! Thank you so much for your assistance. You have no idea how much this has helped.
data have;
input study_id vd$ event1 event2;
visitDate = input(cats(vd,"-2017"), anydtdte11.);
format visitDate yymmdd10.;
drop vd;
datalines;
1001 1-Jan 0 1
1001 17-Jan 0 1
1001 12-Feb 1 0
1001 26-Feb 1 0
1001 4-Mar 1 0
1001 18-Mar 0 0
1002 1-Jan 0 0
1002 19-Jan 1 0
1002 30-Jan 1 0
1002 7-Feb 0 1
1002 19-Feb 0 1
1002 4-Mar 0 1
1003 1-Jan 1 0
1003 22-Jan 1 0
1003 10-Feb 1 0
1003 22-Feb 0 0
1003 3-Mar 0 1
1003 19-Mar 1 1
1003 2-Apr 1 1
1003 9-Apr 1 0
1003 27-Apr 1 1
1003 8-May 0 1
;
data temp;
set have;
by study_id event1 notsorted;
group+first.event1;
if event1=1;
run;
proc sql;
create table want1 as
select study_id,count(distinct group) as e1_count,sum(range) as e1_tot_dur,
calculated e1_tot_dur/calculated e1_count as e1_ave_dur
from (select study_id,group,range(visitDate) as range from temp group by study_id,group)
group by study_id;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.