I am attempting to repeat events within a certain time. I asked a similar question at the link below, and received an answer that worked for that problem. I'm trying to edit it for a slightly different issue and I'm running into problems.
I want to count the number of "repeat events" for a certain line of information within 30 days. For example here is the data I have and the data I want.
data have;
input id code date1 :date9. date2 :date9.;
format date1 date9. date2 date9.;
datalines;
4350 1 '01JUN2021'd '03JUN2021'd
4350 1 '02JUl2021'd '14JUl2021'd
4350 1 '16JUl2021'd '17JUl2021'd
4350 1 '20JUl2021'd '22JUl2021'd
4456 3 '20AUG2020'd '21OCT2020'd
4456 4 '19MAR2020'd '23MAR2020'd
1234 1 '01JAN2021'd '02JAN2021'd
1234 1 '10JAN2021'd '21JAN2021'd
1234 1 '24FEB2021'd '25FEB2021'd
1234 1 '01MAR2021'd '11MAR2021'd
;
run;
data want;
input id code date1 :date9. date2 :date9. count;
format date date9.;
datalines;
4350 1 '01JUN2021'd '03JUN2021'd 0
4350 1 '02JUl2021'd '14JUl2021'd 1
4350 1 '16JUl2021'd '17JUl2021'd 1
4350 1 '20JUl2021'd '22JUl2021'd 2
4456 3 '20AUG2020'd '21OCT2020'd 0
4456 4 '19MAR2020'd '23MAR2020'd 0
1234 1 '01JAN2021'd '02JAN2021'd 0
1234 1 '10JAN2021'd '21JAN2021'd 1
1234 1 '24FEB2021'd '25FEB2021'd 0
1234 1 '01MAR2021'd '11MAR2021'd 1
;
run;
For every line, I want to see how many previous lines date2 value is within 30 days of the current lines date1 for the same ID and code. I can't figure out how to alter the previous solution to account for comparing between two different date fields now.
Edit: Below is the edited code I've been unable to get to work.
data want_test;
set have;
by id notsorted code;
retain start_date;
if not first.code then l_date = lag(date2);
if first.code then start_date = date1;
if not first.code and date1 - l_date le 30
then do;
test_date=date1 - l_date;
if date1 - l_date le 30
then count + 1;
else do;
count = 1;
start_date = date1;
end;
end;
else count = 0;
format l_date date9. start_date date9.;
run;
Hi again @A_SAS_Man Sorry I failed to notice your response to @PaigeMiller explaining the logic. I read it just now. Right, I think I should be close this time around? Please test-
data have;
input id code date1 :date9. date2 :date9.;
format date1 date9. date2 date9.;
datalines;
4350 1 '01JUN2021'd '03JUN2021'd
4350 1 '02JUl2021'd '14JUl2021'd
4350 1 '16JUl2021'd '17JUl2021'd
4350 1 '20JUl2021'd '22JUl2021'd
4456 3 '20AUG2020'd '21OCT2020'd
4456 4 '19MAR2020'd '23MAR2020'd
1234 1 '01JAN2021'd '02JAN2021'd
1234 1 '10JAN2021'd '21JAN2021'd
1234 1 '24FEB2021'd '25FEB2021'd
1234 1 '01MAR2021'd '11MAR2021'd
;
run;
data want ;
do _n_=1 by 1 until(last.code) ;
set have ;
by id code notsorted ;
array t(999) _temporary_;
if _n_>1 then do __n=1 to _n_-1 ;
if intck('day',t(__n),date1)<30 then count+1 ;
end ;
output ;
count=0 ;
t(_n_)=date2 ;
end ;
call missing(of t(*)) ;
drop _: ;
run ;
proc print noobs ; run ;
id | code | date1 | date2 | count |
---|---|---|---|---|
4350 | 1 | 01JUN2021 | 03JUN2021 | 0 |
4350 | 1 | 02JUL2021 | 14JUL2021 | 1 |
4350 | 1 | 16JUL2021 | 17JUL2021 | 1 |
4350 | 1 | 20JUL2021 | 22JUL2021 | 2 |
4456 | 3 | 20AUG2020 | 21OCT2020 | 0 |
4456 | 4 | 19MAR2020 | 23MAR2020 | 0 |
1234 | 1 | 01JAN2021 | 02JAN2021 | 0 |
1234 | 1 | 10JAN2021 | 21JAN2021 | 1 |
1234 | 1 | 24FEB2021 | 25FEB2021 | 0 |
1234 | 1 | 01MAR2021 | 11MAR2021 | 1 |
Could you explain how to go from the input data to the output data?
Why is COUNT a 0 on the first line, a 1 on lines 2 and 3, and a two on line 4?
It is a count of how many previous lines had a date2 within 30 days of the current lines date1. It is kind of confusing so if I've made an error somewhere let me know and I can correct that.
To add a little more detail, it is 0 on the first one because it is the first line with no previous lines. It is 1 on the second because it was within range of the first. The third is 1 because it was within range of the second, but not the first. The fourth was within range of 2 and 3, but not 1.
Hi @A_SAS_Man Methinks @PaigeMiller 's point is spot on in catching the anomaly in your WANT for the ID= 4350 code=1. Please review the Count in your WANT for the 1st ID group. Your description of logic makes me lean to the following results if I understand you which I believe what Paige is thinking too-
data have;
input id code date1 :date9. date2 :date9.;
format date1 date9. date2 date9.;
datalines;
4350 1 '01JUN2021'd '03JUN2021'd
4350 1 '02JUl2021'd '14JUl2021'd
4350 1 '16JUl2021'd '17JUl2021'd
4350 1 '20JUl2021'd '22JUl2021'd
4456 3 '20AUG2020'd '21OCT2020'd
4456 4 '19MAR2020'd '23MAR2020'd
1234 1 '01JAN2021'd '02JAN2021'd
1234 1 '10JAN2021'd '21JAN2021'd
1234 1 '24FEB2021'd '25FEB2021'd
1234 1 '01MAR2021'd '11MAR2021'd
;
run;
data want ;
do until(last.code) ;
set have ;
by id code notsorted ;
if _d then if intck('day', _d, date1)>30 then count=0 ;
else count+1 ;
output;
_d=date2 ;
end ;
count=0 ;
drop _d ;
run ;
proc print noobs ; run ;
Results;
id | code | date1 | date2 | count |
---|---|---|---|---|
4350 | 1 | 01JUN2021 | 03JUN2021 | 0 |
4350 | 1 | 02JUL2021 | 14JUL2021 | 1 |
4350 | 1 | 16JUL2021 | 17JUL2021 | 2 |
4350 | 1 | 20JUL2021 | 22JUL2021 | 3 |
4456 | 3 | 20AUG2020 | 21OCT2020 | 0 |
4456 | 4 | 19MAR2020 | 23MAR2020 | 0 |
1234 | 1 | 01JAN2021 | 02JAN2021 | 0 |
1234 | 1 | 10JAN2021 | 21JAN2021 | 1 |
1234 | 1 | 24FEB2021 | 25FEB2021 | 0 |
1234 | 1 | 01MAR2021 | 11MAR2021 | 1 |
So this isn't quite what I need. The third line should be 1 because July 16 and June 3 are more than 30 days apart and July 16 and July 14 are less than 30 days apart. Similarly, line 4 should be 2 because July 20 is less than 30 days from July 17 and July 14, but more than 30 from June 3.
Hi again @A_SAS_Man Sorry I failed to notice your response to @PaigeMiller explaining the logic. I read it just now. Right, I think I should be close this time around? Please test-
data have;
input id code date1 :date9. date2 :date9.;
format date1 date9. date2 date9.;
datalines;
4350 1 '01JUN2021'd '03JUN2021'd
4350 1 '02JUl2021'd '14JUl2021'd
4350 1 '16JUl2021'd '17JUl2021'd
4350 1 '20JUl2021'd '22JUl2021'd
4456 3 '20AUG2020'd '21OCT2020'd
4456 4 '19MAR2020'd '23MAR2020'd
1234 1 '01JAN2021'd '02JAN2021'd
1234 1 '10JAN2021'd '21JAN2021'd
1234 1 '24FEB2021'd '25FEB2021'd
1234 1 '01MAR2021'd '11MAR2021'd
;
run;
data want ;
do _n_=1 by 1 until(last.code) ;
set have ;
by id code notsorted ;
array t(999) _temporary_;
if _n_>1 then do __n=1 to _n_-1 ;
if intck('day',t(__n),date1)<30 then count+1 ;
end ;
output ;
count=0 ;
t(_n_)=date2 ;
end ;
call missing(of t(*)) ;
drop _: ;
run ;
proc print noobs ; run ;
id | code | date1 | date2 | count |
---|---|---|---|---|
4350 | 1 | 01JUN2021 | 03JUN2021 | 0 |
4350 | 1 | 02JUL2021 | 14JUL2021 | 1 |
4350 | 1 | 16JUL2021 | 17JUL2021 | 1 |
4350 | 1 | 20JUL2021 | 22JUL2021 | 2 |
4456 | 3 | 20AUG2020 | 21OCT2020 | 0 |
4456 | 4 | 19MAR2020 | 23MAR2020 | 0 |
1234 | 1 | 01JAN2021 | 02JAN2021 | 0 |
1234 | 1 | 10JAN2021 | 21JAN2021 | 1 |
1234 | 1 | 24FEB2021 | 25FEB2021 | 0 |
1234 | 1 | 01MAR2021 | 11MAR2021 | 1 |
No biggie, Should you come across a situation where you have more than 999 records for a single Id Code group combination, please make sure you resize the array something bigger like-
array t(999) _temporary_; /*existing 999*/
array t(9999) _temporary_; /*make it larger 9999*/
Test thoroughly and then accept the answer.
data have; input id code date1 :date9. date2 :date9.; format date1 date9. date2 date9.; datalines; 4350 1 '01JUN2021'd '03JUN2021'd 4350 1 '02JUl2021'd '14JUl2021'd 4350 1 '16JUl2021'd '17JUl2021'd 4350 1 '20JUl2021'd '22JUl2021'd 4456 3 '20AUG2020'd '21OCT2020'd 4456 4 '19MAR2020'd '23MAR2020'd 1234 1 '01JAN2021'd '02JAN2021'd 1234 1 '10JAN2021'd '21JAN2021'd 1234 1 '24FEB2021'd '25FEB2021'd 1234 1 '01MAR2021'd '11MAR2021'd ; run; proc sql; create table want as select a.*,(select count(*) from have where id=a.id and code=a.code and date1<a.date1 and date2 between a.date1-30 and a.date1) as count from have as a; quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.