BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jpsmith
Fluorite | Level 6

Hello,

 

I am working with longitudinal data with several events of interest and would like to calculate the average duration of each event, with recurrence of the event being possible. I've tried a number of things but have spent several days being unsuccessful.

 

An example of the data set that I have is below, where study_id is the unique participant identifier, visit date is the date of the recurring visit, and event1 and event2 are the events of interest.

study_idvisitdateevent1event2
10011-Jan01
100117-Jan01
100112-Feb10
100126-Feb10
10014-Mar10
100118-Mar00
10021-Jan00
100219-Jan10
100230-Jan10
10027-Feb01
100219-Feb01
10024-Mar01
10031-Jan10
100322-Jan10
100310-Feb10
100322-Feb00
10033-Mar01
100319-Mar11
10032-Apr11
10039-Apr10
100327-Apr11
10038-May01

 

The data set I want is below, where e1_count is the number of events per study ID, e1_tot_dur is the total duration of all events (i.e. the sum of days of discreet events), and e1_ave_dur is the average duration of each event (variable of interest).

Study IDe1_counte1_tot_dure1_ave_dure2_counte2_tot_dure2_ave_dur
10011202011616
10021111112525
100327939.524120.5

 

Any advice would be much appreciated!

Jonathan 

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Simpler to do this with a long data organisation which can accomodate any number of events:

 

data have;
input study_id vd$ event1 event2;
visitDate = input(cats(vd,"-2017"), anydtdte11.);
format visitDate yymmdd10.;
drop vd;
datalines; 
1001 1-Jan 0 1 
1001 17-Jan 0 1 
1001 12-Feb 1 0 
1001 26-Feb 1 0 
1001 4-Mar 1 0 
1001 18-Mar 0 0 
1002 1-Jan 0 0 
1002 19-Jan 1 0 
1002 30-Jan 1 0 
1002 7-Feb 0 1 
1002 19-Feb 0 1 
1002 4-Mar 0 1 
1003 1-Jan 1 0 
1003 22-Jan 1 0 
1003 10-Feb 1 0 
1003 22-Feb 0 0 
1003 3-Mar 0 1 
1003 19-Mar 1 1 
1003 2-Apr 1 1 
1003 9-Apr 1 0 
1003 27-Apr 1 1 
1003 8-May 0 1 
;

data events;
set have(rename=event1=event) have(rename=event2=event in=in2);
eventNo = 1 + in2;
drop event1 event2;
run;


data want;
tot_dur = 0;
do until(last.study_id);
    lastVisit = visitDate;
    set events; by eventNo study_id;
    if in then 
        if event then 
            if lastVisit > 0 then 
                tot_dur = sum(tot_dur, intck("day", lastVisit, visitDate));
            else;
        else in = 0;
    else
        if event then do;
            in = 1;
            count = sum(count, 1);
            end;
    end;
if count > 0 then ave_dur = tot_dur / count;
keep eventNo study_id tot_dur count ave_dur;
run;
            
proc print noobs; var eventNo study_id tot_dur count ave_dur; run;

 

 

PG

View solution in original post

3 REPLIES 3
PGStats
Opal | Level 21

Simpler to do this with a long data organisation which can accomodate any number of events:

 

data have;
input study_id vd$ event1 event2;
visitDate = input(cats(vd,"-2017"), anydtdte11.);
format visitDate yymmdd10.;
drop vd;
datalines; 
1001 1-Jan 0 1 
1001 17-Jan 0 1 
1001 12-Feb 1 0 
1001 26-Feb 1 0 
1001 4-Mar 1 0 
1001 18-Mar 0 0 
1002 1-Jan 0 0 
1002 19-Jan 1 0 
1002 30-Jan 1 0 
1002 7-Feb 0 1 
1002 19-Feb 0 1 
1002 4-Mar 0 1 
1003 1-Jan 1 0 
1003 22-Jan 1 0 
1003 10-Feb 1 0 
1003 22-Feb 0 0 
1003 3-Mar 0 1 
1003 19-Mar 1 1 
1003 2-Apr 1 1 
1003 9-Apr 1 0 
1003 27-Apr 1 1 
1003 8-May 0 1 
;

data events;
set have(rename=event1=event) have(rename=event2=event in=in2);
eventNo = 1 + in2;
drop event1 event2;
run;


data want;
tot_dur = 0;
do until(last.study_id);
    lastVisit = visitDate;
    set events; by eventNo study_id;
    if in then 
        if event then 
            if lastVisit > 0 then 
                tot_dur = sum(tot_dur, intck("day", lastVisit, visitDate));
            else;
        else in = 0;
    else
        if event then do;
            in = 1;
            count = sum(count, 1);
            end;
    end;
if count > 0 then ave_dur = tot_dur / count;
keep eventNo study_id tot_dur count ave_dur;
run;
            
proc print noobs; var eventNo study_id tot_dur count ave_dur; run;

 

 

PG
jpsmith
Fluorite | Level 6

Incredible! Thank you so much for your assistance. You have no idea how much this has helped.

Ksharp
Super User
data have;
input study_id vd$ event1 event2;
visitDate = input(cats(vd,"-2017"), anydtdte11.);
format visitDate yymmdd10.;
drop vd;
datalines; 
1001 1-Jan 0 1 
1001 17-Jan 0 1 
1001 12-Feb 1 0 
1001 26-Feb 1 0 
1001 4-Mar 1 0 
1001 18-Mar 0 0 
1002 1-Jan 0 0 
1002 19-Jan 1 0 
1002 30-Jan 1 0 
1002 7-Feb 0 1 
1002 19-Feb 0 1 
1002 4-Mar 0 1 
1003 1-Jan 1 0 
1003 22-Jan 1 0 
1003 10-Feb 1 0 
1003 22-Feb 0 0 
1003 3-Mar 0 1 
1003 19-Mar 1 1 
1003 2-Apr 1 1 
1003 9-Apr 1 0 
1003 27-Apr 1 1 
1003 8-May 0 1 
;

data temp;
 set have;
 by study_id  event1 notsorted;
 group+first.event1;
 if event1=1;
run;
proc sql;
create table want1 as
 select  study_id,count(distinct group) as e1_count,sum(range) as e1_tot_dur,
  calculated e1_tot_dur/calculated e1_count as e1_ave_dur
  from (select study_id,group,range(visitDate) as range from temp group by study_id,group)
   group by study_id;
quit;

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 1909 views
  • 1 like
  • 3 in conversation