Solved: Re: How to Count Dates within Certain Range

A_SAS_Man · Posted 05-20-2021 01:11 PM

I am trying to look at different lines of data and count how many times a line falls within 30 days of previous lines for the same id and code. I have been trying to do this with a data step but haven't been able to figure it out and am open to any solutions.

Here is my starting data.

data have;
input id code date :date9.;
format date date9.;
datalines;
4350 1 '01JUN2021'd
4350 1 '13JUN2021'd
4456 3 '27JUL2021'd
4456 4 '27NOV2021'd
1234 1 '01JAN2021'd
1234 1 '10JAN2021'd
1234 1 '21JAN2021'd
;
run;

Here is what I'm trying to generate.

data want;
input id code date :date9. count;
format date date9.;
datalines;
4350 1 '01JUN2021'd 0
4350 1 '13JUN2021'd 1
4456 3 '27JUL2021'd 0
4456 4 '27NOV2021'd 0
1234 1 '01JAN2021'd 0
1234 1 '10JAN2021'd 1
1234 1 '21JAN2021'd 2
;
run;

Kurt_Bremser · Posted 05-20-2021 02:02 PM

So if I interpret you right, we need to keep a starting date for the group, and start a new group when we go past 30 days of that:

data want;
set have;
by id notsorted code;
retain start_date;
l_date = lag(date);
if first.code then start_date = date;
if not first.code and date - l_date le 30
then do;
  if date - start_date le 30
  then count + 1;
  else do;
    count = 1;
    start_date = l_date;
  end;
end;
else count = 0;
drop l_date start_date;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Kurt_Bremser · Posted 05-20-2021 01:30 PM

In a data step, use BY processing and LAG to compare with the previous value.

data want;
set have;
by id notsorted code;
if not first.code and date - lag(date) le 30
then count + 1;
else count = 0;
run;

The SUM statement (count + 1) implies an automatic RETAIN.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

A_SAS_Man · Posted 05-20-2021 01:48 PM

This is very close, but it seems to "overcount" in some instances. See below modified example to demonstrate.

data have;
input id code date :date9.;
format date date9.;
datalines;
4350 1 '01JUN2021'd
4350 1 '13JUN2021'd
4456 3 '27JUL2021'd
4456 4 '27NOV2021'd
1234 1 '01JAN2021'd
1234 1 '10JAN2021'd
1234 1 '21JAN2021'd
1234 1 '15FEB2021'd
;
run;

Your code would produce this:

data code_output;
input id code date :date9. count;
format date date9.;
datalines;
4350 1 '01JUN2021'd 0
4350 1 '13JUN2021'd 1
4456 3 '27JUL2021'd 0
4456 4 '27NOV2021'd 0
1234 1 '01JAN2021'd 0
1234 1 '10JAN2021'd 1
1234 1 '21JAN2021'd 2
1234 1 '15FEB2021'd 3
;
run;

What I'm trying to get is this:

data want;
input id code date :date9. count;
format date date9.;
datalines;
4350 1 '01JUN2021'd 0
4350 1 '13JUN2021'd 1
4456 3 '27JUL2021'd 0
4456 4 '27NOV2021'd 0
1234 1 '01JAN2021'd 0
1234 1 '10JAN2021'd 1
1234 1 '21JAN2021'd 2
1234 1 '15FEB2021'd 1
;
run;

This is because the last line is only within 30 days of one of the other lines with matching codes and ids.

Kurt_Bremser · Posted 05-20-2021 02:02 PM

So if I interpret you right, we need to keep a starting date for the group, and start a new group when we go past 30 days of that:

data want;
set have;
by id notsorted code;
retain start_date;
l_date = lag(date);
if first.code then start_date = date;
if not first.code and date - l_date le 30
then do;
  if date - start_date le 30
  then count + 1;
  else do;
    count = 1;
    start_date = l_date;
  end;
end;
else count = 0;
drop l_date start_date;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ballardw · Posted 05-20-2021 01:46 PM

You can save yourself a bit of typing for the data lines in your code:

data have;
input id code date :date9.;
format date date9.;
datalines;
4350 1 01JUN2021
4350 1 13JUN2021
4456 3 27JUL2021
4456 4 27NOV2021
1234 1 01JAN2021
1234 1 10JAN2021
1234 1 21JAN2021
;
run;

the quotes and d are needed to define a literal date in code and not needed, and potentially can cause errors when reading the data.

How to Count Dates within Certain Range

Re: How to Count Dates within Certain Range

Re: How to Count Dates within Certain Range

Re: How to Count Dates within Certain Range

Re: How to Count Dates within Certain Range

Re: How to Count Dates within Certain Range

Registration is open

SAS Training: Just a Click Away