BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jpsmith
Fluorite | Level 6

Hello,

 

I am working with longitudinal data with several events of interest and would like to calculate the average duration of each event, with recurrence of the event being possible. I've tried a number of things but have spent several days being unsuccessful.

 

An example of the data set that I have is below, where study_id is the unique participant identifier, visit date is the date of the recurring visit, and event1 and event2 are the events of interest.

study_idvisitdateevent1event2
10011-Jan01
100117-Jan01
100112-Feb10
100126-Feb10
10014-Mar10
100118-Mar00
10021-Jan00
100219-Jan10
100230-Jan10
10027-Feb01
100219-Feb01
10024-Mar01
10031-Jan10
100322-Jan10
100310-Feb10
100322-Feb00
10033-Mar01
100319-Mar11
10032-Apr11
10039-Apr10
100327-Apr11
10038-May01

 

The data set I want is below, where e1_count is the number of events per study ID, e1_tot_dur is the total duration of all events (i.e. the sum of days of discreet events), and e1_ave_dur is the average duration of each event (variable of interest).

Study IDe1_counte1_tot_dure1_ave_dure2_counte2_tot_dure2_ave_dur
10011202011616
10021111112525
100327939.524120.5

 

Any advice would be much appreciated!

Jonathan 

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Simpler to do this with a long data organisation which can accomodate any number of events:

 

data have;
input study_id vd$ event1 event2;
visitDate = input(cats(vd,"-2017"), anydtdte11.);
format visitDate yymmdd10.;
drop vd;
datalines; 
1001 1-Jan 0 1 
1001 17-Jan 0 1 
1001 12-Feb 1 0 
1001 26-Feb 1 0 
1001 4-Mar 1 0 
1001 18-Mar 0 0 
1002 1-Jan 0 0 
1002 19-Jan 1 0 
1002 30-Jan 1 0 
1002 7-Feb 0 1 
1002 19-Feb 0 1 
1002 4-Mar 0 1 
1003 1-Jan 1 0 
1003 22-Jan 1 0 
1003 10-Feb 1 0 
1003 22-Feb 0 0 
1003 3-Mar 0 1 
1003 19-Mar 1 1 
1003 2-Apr 1 1 
1003 9-Apr 1 0 
1003 27-Apr 1 1 
1003 8-May 0 1 
;

data events;
set have(rename=event1=event) have(rename=event2=event in=in2);
eventNo = 1 + in2;
drop event1 event2;
run;


data want;
tot_dur = 0;
do until(last.study_id);
    lastVisit = visitDate;
    set events; by eventNo study_id;
    if in then 
        if event then 
            if lastVisit > 0 then 
                tot_dur = sum(tot_dur, intck("day", lastVisit, visitDate));
            else;
        else in = 0;
    else
        if event then do;
            in = 1;
            count = sum(count, 1);
            end;
    end;
if count > 0 then ave_dur = tot_dur / count;
keep eventNo study_id tot_dur count ave_dur;
run;
            
proc print noobs; var eventNo study_id tot_dur count ave_dur; run;

 

 

PG

View solution in original post

3 REPLIES 3
PGStats
Opal | Level 21

Simpler to do this with a long data organisation which can accomodate any number of events:

 

data have;
input study_id vd$ event1 event2;
visitDate = input(cats(vd,"-2017"), anydtdte11.);
format visitDate yymmdd10.;
drop vd;
datalines; 
1001 1-Jan 0 1 
1001 17-Jan 0 1 
1001 12-Feb 1 0 
1001 26-Feb 1 0 
1001 4-Mar 1 0 
1001 18-Mar 0 0 
1002 1-Jan 0 0 
1002 19-Jan 1 0 
1002 30-Jan 1 0 
1002 7-Feb 0 1 
1002 19-Feb 0 1 
1002 4-Mar 0 1 
1003 1-Jan 1 0 
1003 22-Jan 1 0 
1003 10-Feb 1 0 
1003 22-Feb 0 0 
1003 3-Mar 0 1 
1003 19-Mar 1 1 
1003 2-Apr 1 1 
1003 9-Apr 1 0 
1003 27-Apr 1 1 
1003 8-May 0 1 
;

data events;
set have(rename=event1=event) have(rename=event2=event in=in2);
eventNo = 1 + in2;
drop event1 event2;
run;


data want;
tot_dur = 0;
do until(last.study_id);
    lastVisit = visitDate;
    set events; by eventNo study_id;
    if in then 
        if event then 
            if lastVisit > 0 then 
                tot_dur = sum(tot_dur, intck("day", lastVisit, visitDate));
            else;
        else in = 0;
    else
        if event then do;
            in = 1;
            count = sum(count, 1);
            end;
    end;
if count > 0 then ave_dur = tot_dur / count;
keep eventNo study_id tot_dur count ave_dur;
run;
            
proc print noobs; var eventNo study_id tot_dur count ave_dur; run;

 

 

PG
jpsmith
Fluorite | Level 6

Incredible! Thank you so much for your assistance. You have no idea how much this has helped.

Ksharp
Super User
data have;
input study_id vd$ event1 event2;
visitDate = input(cats(vd,"-2017"), anydtdte11.);
format visitDate yymmdd10.;
drop vd;
datalines; 
1001 1-Jan 0 1 
1001 17-Jan 0 1 
1001 12-Feb 1 0 
1001 26-Feb 1 0 
1001 4-Mar 1 0 
1001 18-Mar 0 0 
1002 1-Jan 0 0 
1002 19-Jan 1 0 
1002 30-Jan 1 0 
1002 7-Feb 0 1 
1002 19-Feb 0 1 
1002 4-Mar 0 1 
1003 1-Jan 1 0 
1003 22-Jan 1 0 
1003 10-Feb 1 0 
1003 22-Feb 0 0 
1003 3-Mar 0 1 
1003 19-Mar 1 1 
1003 2-Apr 1 1 
1003 9-Apr 1 0 
1003 27-Apr 1 1 
1003 8-May 0 1 
;

data temp;
 set have;
 by study_id  event1 notsorted;
 group+first.event1;
 if event1=1;
run;
proc sql;
create table want1 as
 select  study_id,count(distinct group) as e1_count,sum(range) as e1_tot_dur,
  calculated e1_tot_dur/calculated e1_count as e1_ave_dur
  from (select study_id,group,range(visitDate) as range from temp group by study_id,group)
   group by study_id;
quit;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 2084 views
  • 1 like
  • 3 in conversation