## flagging a start date when difference between two consecutive dates is greater than certain value

Frequent Contributor
Posts: 78

# flagging a start date when difference between two consecutive dates is greater than certain value

Data have;

Input id \$ date1;

Datalines;

101  20-Oct-2016

101 15-Nov-2016

101 25-Jan-2017

101 6-May-2017

102  10-Oct-2016

102 25-Nov-2016

102 15-Jan-2017

102 26-May-2017

103  27-Oct-2016

103 05-Nov-2016

103 25-Jan-2017

103 16-Feb-2017

;

run;

I want achieve two things:

• Calculate the number of days between consecutive dates (each dates and previous date within each id). For example, the number of days between 15-Nov-2016 and 20-Oct-2016; 25-Jan-2017 and 15-Nov-2016; 6-May-2017 and 25-Jan-2017, etc

• If the number of days is >=46 I want to count number of id that has any difference in date at least 46 days. And the I want to flag the date the counting began. For example, for id=102, the number of days between 26-May-2017 and 15-Jan-2017, is more than 46 days, therefore I want to flag 15-Jan-2017 since that was the date the counting began.
Super User
Posts: 6,622

## Re: flagging a start date when difference between two consecutive dates is greater than certain valu

A slight variation on what you ask turns a difficult problem into an easy one.  Consider this approach and whether it would suit your needs.  The set-up is up to you:  your data set must be sorted by ID and DATE1, and DATE1 must actually be a SAS date and not a character string.  From that point:

data want;

set have;

by id;

prior_date = lag(date);

interval = date - prior_date;

if first.id then do;

interval = .;

prior_date = .;

interval_count = 0;

end;

if interval >= 46 and interval_count=0 then do;

interval_count + 1;

flag = 1;

end;

format prior_date date11.;

drop interval_count;

run;

The difference between what you ask and what the program does:  the flag appears on the SECOND date of the interval you are seeking.  However, the observation contains the variable PRIOR_DATE, which holds the first date of the interval.  So it should support what you need to do.  For your second bullet point, SAS has many ways to count.  This would be one of them:

proc freq data=want;

tables ID;

where flag = 1;

run;

Discussion stats