BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
DavidsM
Calcite | Level 5

 

 

Hello community members,

 

I am working with longitudinal data with several events of interest and would like to calculate the duration between recurrent events, with recurrence of the event being possible several times per individual. Although some individuals have many recurrent events, the standard number of days must elapse before a next event can be considered a true recurrent event (i.e. some events are just continuation of a previous event if they occur within 30 days after the previous one and I want to turn them into non-events in my data. I have calculated the time that has elapsed between events but since some individuals have many non-events and also many events with different intervals in between, It has become hard for me get the exact data I want for the recurrent variable (Status).

 

I've tried a number of expressions that can bring out my desired variable but not yet successful.

 

An example of the data set that I have is below, where study_id is the unique participant identifier, Startdate is the date of visit, and event1 is the events of interest.

 

 

study_id

Startdate

Event1

1001

1-Jan

0

1001

17-Jan

1

1001

17-Feb

1

1001

26-Feb

1

1001

4-Mar

1

1001

18-Mar

0

1001

31-Dec

1

1001

31-Jan

1

1001

4-Feb

0

1001

24-April

1

 

 

 

 

The variable I want for recurrence is as follows;

 

I created the variable StopDate using the StartDate variable (i.e using Date of visit data) with purpose of calculating the time interval between the visits (Variable TimeDiff below in the table is interval between visits). The problem is that the time difference must be at least 30 days from a given observation (event=1) to another (event=1) in order for the second to be a recurrent event. As you see below the 2nd and 3rd events occur before 30 days elapse (i.e occur at 9 days and 8 days, respectively) and are therefore not considered as recurrent but continuation of the first one. But event on 4th March occurs after more than 40 days from the first event and I consider it as a true recurrent event. The study ends on 31-Dec for all individuals.

See desired Table below.

 

 

 

study_id

Startdate

StopDate

Event    TimeDiff     Recurrence

1001

1-Jan

17-Jan

0             17              0

1001

17-Jan

17-Feb

1             30              1

1001

17-Feb

26-Feb

1             9                0

1001

26-Feb

4-Mar

1             8                0

1001

4-Mar

18-Mar

1            14               1

1001

18-Mar

31-Dec

0           305              0

1001

31-Dec

31-Jan

1           30                1

1001

31-Jan

4-Feb

1            4                 0

1001

4-Feb

24-April

0            20               0

1001

24-April

31-Dec

1            365               1

 

I request to be assisted on how to deal with those events occurring after a cumulative period of more than 40 days from the previous (event=1) so that I can selectively turn them into (Recurrence=1) while keeping both (events=1 and Event=0) occurring before 30 days elapse as (Recurrent=0).

 

I will be glad to receive your assistance.

 

Thanks in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @DavidsM,

 

You're welcome. I'm happy that I could help you.

 

The solution disregarding the extra rule about the last event is exactly that in my earlier post. To check this assertion, you can compare the result of that shorter data step (using a large input dataset HAVE) with the result of the extended data step, but with one line commented out (and, of course, a different output dataset name, e.g. WANT0):

  *if date=evdate_last & event then new_event=1;

PROC COMPARE should then confirm the equality of WANT and WANT0.

 

Please note that the shorter data step is more efficient than the extended one because for the extra rule (whether applied or not, i.e. with the above IF/THEN statement active or commented out) I used a second SET statement, i.e., dataset HAVE is read twice.

View solution in original post

16 REPLIES 16
ballardw
Super User

Are those dates actual SAS date values?

Also what would you expect the resulting data set to look like?

And by "The study ends on 31-Dec for all individuals" do you mean within a calendar year? I ask because the data shown seems to strongly imply that you have multiple years of data when a start date is "31 Dec" and end date is "31 Jan". I also wonder how you are calculating the TIMEDIFF as generally we get 31 days between 31 Dec and the following 31 Jan, and $ Feb to 24 Apr would be more like 79 than 20.

DavidsM
Calcite | Level 5

Hello ballardw;

 

Thank you for your comments;

 

I had not used the real dates in the data but I have attached the word document containing screenshots of the variables I have so far created and their description as well as the kind of variable I want to create.

 

 

I will be glad for your assistance.

 

Thanks in advance.

FreelanceReinh
Jade | Level 19

@DavidsM wrote:

 

study_id

Startdate

StopDate

Event    TimeDiff     Recurrence

1001

1-Jan

17-Jan

0             17              0

1001

17-Jan

17-Feb

1             30              1

1001

17-Feb

26-Feb

1             9                0

 

Just to add: These first three rows alone suggest three different formulas for TimeDiff.

DavidsM
Calcite | Level 5

 

 

Thank you for your comments;

 

I have attached a word document with screenshots of the variables I have so far created and their description as well as the kind of variable I want to create.

 

 

I will be glad for your assistance.

FreelanceReinh
Jade | Level 19

Hello @DavidsM,

 

Thanks for the attached material.

 

Let's try to simplify things a bit and focus on events and the dates on which events occur (or start).

 

It's not impossible, but very difficult for SAS to read data from screenshots. So, let's create some test data from scratch: 

/* Create test data */

data have;
call streaminit(2718281);
do id=1 to 2;
  date='01JAN06'd;
  do _n_=1 to 15;
    date+rand('integer',40); /* use "date+ceil(rand('uniform',40));" with older SAS releases */
    event=rand('bern',0.6);
    output;
  end;
end;
format date yymmdd10.;
run;

/* Add variable NEW_EVENT */

data want(drop=evdate);
retain evdate;
set have;
by id date;
if first.id then evdate=-99999;
new_event=event;
if event then do;
  if .<date-evdate<=30 then new_event=0;
  else evdate=date;
end;
run;

The second data step above creates a variable NEW_EVENT, which equals EVENT except in cases where a new event had occurred <=30 days before. In these cases NEW_EVENT is set to 0 (while EVENT=1).

 

Maybe you can apply a similar logic to your real data.

 

Please note that you can create much larger test datasets by increasing the end values of the two DO loops. You can also play around with the parameters of the RAND function calls and the seed value 2718281. Large test datasets can be useful to compare the results of different algorithms (using PROC COMPARE with the corresponding WANT datasets) because they are likely to cover many if not all relevant cases.

DavidsM
Calcite | Level 5

 

Thank you FreelanceReinhard for your assistance,

 

The code has helped me to convert some of the observations into what I want for example (events=1) and time less than 30 days have been converted to (new-event=0) very well.

 

However, one more problem remains. Some observations are following each other with (event=1)  but each having less than 30 days, some with (event=1) like that immediately following the true recurrent event qualifying for conversion into (new_vent=0) because they have total time elapsed less than 30 days from the previous true recurrence while one qualifying to remain as (new_event=1), because it happens after cumulative time of (10+10+10+10=40 days) even though it as an individual observation it has has 10 days.  It was also turned into (new_event=0) because it alone has 10 days BUT since its cumulative days from the first (new_event=1) is 40 days it should be (new-event=1)

 

see an example of this scenario below

 

                             event        Days      new-event

 

                              1                    37               1

                              1                   10                0

                              1                   10                0 

                              0                   10                0

                              1               10            0

                              1                   50               1

                              0                   10               0

                              0                   10               0

                              0                   10               0

                             1                10            0

 

 

The highlighted observations have (event=1) and were also turned into (event=0) because they less than 30 days of duration. But because they recur after more than 40 days from the previously true event i.e (new-event=1), I should convert them also into (new_event=1).

 

In brief the conversion should be based on any of the two; 1) either time elapsed between events or on the duration of that single event in this case the days for each observation.  Or based on days between observations in the column per subject. 

 

 

I really have a problem with those events in between which have shorter duration (less than 30 days but the intervals between them if added eg 10+10+10+10 =40) the fourth one should be turned to (new_event=1) and those (events=1) eg 10+10=20 days) should remain (events=0) till another one follows them to make total elapsed time since the true (new_event=1) more than 30 days.

 

I will be glad for receive your assistance on this issue once more.

 

Thanks in advance.

FreelanceReinh
Jade | Level 19

The algorithm I proposed does not use a concept of event duration. It is basically aimed at point events. An application to extended events (like adverse events in a clinical trial) is possible, but only the onset date would be used, disregarding duration.

 

Therefore, your interpretations of NEW_EVENT involving durations (e.g. "It was also turned into (new_event=0) because it alone has 10 days" or "turned into (event=0) because they less than 30 days of duration") are not supported by the program code.

 

So, let's try and introduce the concept of event duration into the algorithm:

Ideally, each event should have a start date and an end date, i.e. the first and last date the condition defining the event (e.g. headache) was present. These dates would be stored as SAS date values in numeric variables, e.g. STARTDT and ENDDT.

 

Do you have (or can you derive) start and end dates for each event?

 

Can we assume that two events (for the same patient) don't overlap? Otherwise, wouldn't it make sense to merge overlapping and also adjacent events into one?

 

If so, the first three events of a patient would be characterized by six SAS date values:

STARTDT ENDDT NEW_EVENT
  s1      e1      1
  s2      e2      ?
  s3      e3      ?

with s1<=e1<s2<=e2<s3<=e3.

 

Clearly, the first of these three events qualifies for NEW_EVENT=1. Now it should be possible to define the value of NEW_EVENT for the second event in terms of s1, e1, s2 and (perhaps) e2 (e.g. "if s2-e1>30 then new_event=1; else new_event=0;").
Similarly, the value of NEW_EVENT for the third event should be definable in terms of s1, e1, s2, e2, s3 and (perhaps) e3.

 

Only you know the exact criteria. So, please provide these definitions. Then we can move on to the implementation.

DavidsM
Calcite | Level 5

Thank you so much FreelanceReinhard'

 

Yo have described exactly the situation I have. I have copied part of your text to answer the questions you asked.

 

 

 

Do you have (or can you derive) start and end dates for each event? Yes I have derived those StartDate and EndDate variables and they are in SAS dates format.

 

 

Can we assume that two events (for the same patient) don't overlap? Otherwise, wouldn't it make sense to merge overlapping and also adjacent events into one?

 

The events don't overlap for example using the headache example you gave, the person cannot experience the next headache event before the first one ends BUT the second may start immediately after end of the first one like may be after a day. So in this case the expression you gave (s1<=e1<s2<=e2<s3<=e3) is exactly what I have.

 

If so, the first three events of a patient would be characterized by six SAS date values:

STARTDT ENDDT NEW_EVENT
  s1      e1      1
  s2      e2      ?
  s3      e3      ?

with s1<=e1<s2<=e2<s3<=e3.

 

Clearly, the first of these three events qualifies for NEW_EVENT=1. Now it should be possible to define the value of NEW_EVENT for the second event in terms of s1, e1, s2 and (perhaps) e2 (e.g. "if s2-e1>30 then new_event=1; else new_event=0;").
Similarly, the value of NEW_EVENT for the third event should be definable in terms of s1, e1, s2, e2, s3 and (perhaps) e3

 

The first event must always be as you stated above (NEW_EVENT=1) even if the person had only one event in his life. The following events are the ones I want to categorize again  based on the criteria of 30 days apart from each other.

 

Note: The last event should also be (NEW_EVENT=1) because its possible for the StartDate  and EndDate of the last observation to be the same (I.e if the participant is not followed up beyond the last recorded date, we consider that last event as a true event=1) and assume that same date as the Endate according to the date variables. 

 

 

 

Thank you very much FreelanceReinhard for introducing this concept which is exactly what I wanted. 

 

I will glad to receive your assistance.

FreelanceReinh
Jade | Level 19

Thanks @DavidsM for answering some of my questions.

 

You've introduced one extra rule: NEW_EVENT=1 for the last event of each patient. This will be very easy to implement using BY-group processing.

 

Also the first event is settled. Good.

 

So, we're left with the intermediate events of a patient, but once we have defined the rules for the second and third, chances are that they apply to the subsequent events (except the last) as well. My example, "if s2-e1>30 then new_event=1; else new_event=0;", was meant to help you formulate these rules.

 

I can just repeat: Only you know the exact criteria. Please provide the IF conditions in terms of s1, e1, s2 etc. that determine the value of NEW_EVENT for the second and third event (where it is assumed that there are at least four events in total -- so the third is not the last). Then we can move on to the implementation.

DavidsM
Calcite | Level 5

 

Thank you FreelanceReinhard for your reply.

 

I think  ("if s2-e1>30 then new_event=1; else new_event=0;") and the subsequent  ("if s3-e2>30) then new_event=1; else new_event=0;") ,  ("if s4-e3>30 then new_event=1; else new_event=0;"), etc will be the IF conditions required.   However, in case the first (event=1) is followed by for example two observations when the (event=0) , then the third evaluation should be ("if s2-e4>30 then new_event=1; else new_event=0;"). 

 

The IF contrition should evaluate from the EndDate of the previous (new_event=1)  to the  start date of next (event=1)  and ignore  (event=0) but  make them remain as (new_event=0).  That is to say even if the first (new_event=1) is followed by ten observations that have (event=0) and then the Twelfth observation has (event=1), then the IF condition should be able to evaluate ("if s12-e1>30 then new_event=1; else new_event=0;"). 

 

Note: some observations for a given participant can start with (event=0) then in the subsequent observations we encounter that scenario above that am  targeting to work on and convert into appropriate event description. 

 

 

I think that will bring out the conversion of events i desire to have. The relationship between events' EndDate and StartDate  is the main focus.

 

 

Thank you for your continued support.

 

 

 

 

 

FreelanceReinh
Jade | Level 19

Thanks @DavidsM for the clarification.

 

Obviously, observations with EVENT=0 will have NEW_EVENT=0. Moreover, you've confirmed that they are to be ignored in the definition of NEW_EVENT for observations with EVENT=1 (which we call "events"). Actually, the values s1, e1 etc. that I introduced referred to events only.

 

We have to resolve a possible contradiction: For the third event you specify:



"if s3-e2>30) then new_event=1; else new_event=0;"

Later you formulate the general rule



 

The IF contrition should evaluate from the EndDate of the previous (new_event=1)  to the  start date of next (event=1) ...


Now consider a patient with s2-e1<=30 (hence new_event=0 for the second event), s3-e2<=30 and s3-e1>30.

 

The first rule implies new_event=0 for the third event, whereas the second rule (which is a bit vague) seems to suggest new_event=1.

 

How would you decide this case?

 

(Your other suggestion involving "s2-e4>30" is questionable because this condition can never be true, given that always s2<e4.)

 

DavidsM
Calcite | Level 5

Thank you FreelanceReinhard for the response  about the clarification. 

 

I have pasted your text to clarify more about the desired variable,

 

Obviously, observations with EVENT=0 will have NEW_EVENT=0. Moreover, you've confirmed that they are to be ignored in the definition of NEW_EVENT for observations with EVENT=1 (which we call "events"). Actually, the values s1, e1 etc. that I introduced referred to events only.

 

If  the values s1, e1 etc. that you introduced referred to events only and therefore evaluating observations of (event=1) then I think the formula should evaluate ("if s3-e2>30) then new_event=1; else new_event=0;"), Where  the StartDate (s3) pertains to the following (events=1) to be evaluated and the EndDate (e2) pertains to the converted (new_event=1). 

 

In case the evaluation returns (new_event=0) the the following evaluation should disregard it and evaluates the next (event=1) this time again in relation to the last known (new_event=1). This is because all (events=1) that are turned into (new_event=0) are assumed to have been also (event=0) in real settings.

 

 

We want to assume the (event=1) observations that turn into (new_event=0) were actually (event=0) in real sense and therefore our evaluations targeting the observations that turned into (new_event=1) and the subsequent (event=1) that remain (new_event=1), then subsequent (event=1) that (remain new_event=1) etc. Assumption is that those (event=1) that turn into (new_event=0) were actually (event=0), then If the condition can evaluate this it will give the desired output.

 

 

 

We have to resolve a possible contradiction: For the third event you specify:



"if s3-e2>30) then new_event=1; else new_event=0;"

Later you formulate the general rule



 

The IF contrition should evaluate from the EndDate of the previous (new_event=1)  to the  start date of next (event=1) ...

Now consider a patient with s2-e1<=30 (hence new_event=0 for the second event), s3-e2<=30 and s3-e1>30.

 

The first rule implies new_event=0 for the third event, whereas the second rule (which is a bit vague) seems to suggest new_event=1.

 

The second rule may give what I expect to come out basing on your example above, because taking considering this example you have given here below

 

Now consider a patient with s2-e1<=30 (hence new_event=0 for the second event), s3-e2<=30 and (s3-e1>30)

 

This is what i desire for the third event (s3-e1>30)if the second event(s2-e1<=30)turned out to be (new_event=0)

 

In this case the third observation (s3-e1>30) occurs after more than 30 days from the last known (new_event=1) and therefore its a true (new_event=1) unlike the second event. This is what am desiring to get.  And the evaluation should start again from that event afresh to evaluate the following observation in the same manner. Eg

 

If s2-e1<=30 (hence new_event=0 for the second event), then next "If s3-e1<=30 (hence new_event=0 for the third event), then if next evaluation "If s4-e1<=30 (hence new_event=0 for the fourth event), and Assume the following "If s5-  e1>30 (hence new_event=1 for the firth event). Then if the next evaluation "If s6-e5<=30 (hence new_event=0 for the sixth event). This kind of evaluation is what I desire. That is the rule.

 

 

How would you decide this case?

 

(Your other suggestion involving "s2-e4>30" is questionable because this condition can never be true, given that always s2<e4.)

 

The above expression was a typing Error, its impossible. 

 

 

The first rule is likely to encounter many observations in between with less than 30 days and leave them as (new_event=0) Eg assume there are ten (event=1) separated from each other by 20 days which totals to 200 days. The First rule will leave them as (new_event=0) which is the problem am having while the second rule will partition them into may be five episodes.

 

 

I will be glad to receive your assistance for this rule to be implemented. 

FreelanceReinh
Jade | Level 19

Thanks for your decision and the detailed explanations.

 

Then I think we're back at my original solution, amended by the additional rule "NEW_EVENT=1 for the last event of each patient," as we've discussed earlier:

/* Create test data */

data have;
call streaminit(2718281);
do id=1 to 7;
  date='01JAN06'd;
  do _n_=1 to 15;
    date+rand('integer',40);
    event=rand('bern',0.6);
    output;
  end;
end;
format date yymmdd10.;
run;

/* Add variable NEW_EVENT */

data want(drop=evdate:);
do until(last.id);
  set have;
  by id date;
  if event then evdate_last=date;
end;
evdate=-99999;
do until(last.id);
  set have;
  by id date;
  new_event=event;
  if event then do;
    if .<date-evdate<=30 then new_event=0;
    else evdate=date;
  end;
  if date=evdate_last & event then new_event=1;
  output;
end;
run;

In the above test data IDs 3, 4, 5 and 7 are affected by the newly introduced extra rule.

 

Please feel free to extend dataset HAVE (e.g. do id=1 to 1000;) and perform checks of the resulting dataset WANT. If you find an observation where you're not satisfied with the value of NEW_EVENT, simply state the ID and DATE values and I'll have a look at it.

DavidsM
Calcite | Level 5

Thank you 

 

 

 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 1565 views
  • 2 likes
  • 3 in conversation