I am attempting to repeat events within a certain time. I asked a similar question at the link below, and received an answer that worked for that problem. I'm trying to edit it for a slightly different issue and I'm running into problems.
I want to count the number of "repeat events" for a certain line of information within 30 days. For example here is the data I have and the data I want.
data have;
input id code date1 :date9. date2 :date9.;
format date1 date9. date2 date9.;
datalines;
4350 1 '01JUN2021'd '03JUN2021'd
4350 1 '02JUl2021'd '14JUl2021'd
4350 1 '16JUl2021'd '17JUl2021'd
4350 1 '20JUl2021'd '22JUl2021'd
4456 3 '20AUG2020'd '21OCT2020'd
4456 4 '19MAR2020'd '23MAR2020'd
1234 1 '01JAN2021'd '02JAN2021'd
1234 1 '10JAN2021'd '21JAN2021'd
1234 1 '24FEB2021'd '25FEB2021'd
1234 1 '01MAR2021'd '11MAR2021'd
;
run;
data want;
input id code date1 :date9. date2 :date9. count;
format date date9.;
datalines;
4350 1 '01JUN2021'd '03JUN2021'd 0
4350 1 '02JUl2021'd '14JUl2021'd 1
4350 1 '16JUl2021'd '17JUl2021'd 1
4350 1 '20JUl2021'd '22JUl2021'd 2
4456 3 '20AUG2020'd '21OCT2020'd 0
4456 4 '19MAR2020'd '23MAR2020'd 0
1234 1 '01JAN2021'd '02JAN2021'd 0
1234 1 '10JAN2021'd '21JAN2021'd 1
1234 1 '24FEB2021'd '25FEB2021'd 0
1234 1 '01MAR2021'd '11MAR2021'd 1
;
run;For every line, I want to see how many previous lines date2 value is within 30 days of the current lines date1 for the same ID and code. I can't figure out how to alter the previous solution to account for comparing between two different date fields now.
Edit: Below is the edited code I've been unable to get to work.
data want_test;
set have;
by id notsorted code;
retain start_date;
if not first.code then l_date = lag(date2);
if first.code then start_date = date1;
if not first.code and date1 - l_date le 30
then do;
test_date=date1 - l_date;
if date1 - l_date le 30
then count + 1;
else do;
count = 1;
start_date = date1;
end;
end;
else count = 0;
format l_date date9. start_date date9.;
run;
Hi again @A_SAS_Man Sorry I failed to notice your response to @PaigeMiller explaining the logic. I read it just now. Right, I think I should be close this time around? Please test-
data have;
input id code date1 :date9. date2 :date9.;
format date1 date9. date2 date9.;
datalines;
4350 1 '01JUN2021'd '03JUN2021'd
4350 1 '02JUl2021'd '14JUl2021'd
4350 1 '16JUl2021'd '17JUl2021'd
4350 1 '20JUl2021'd '22JUl2021'd
4456 3 '20AUG2020'd '21OCT2020'd
4456 4 '19MAR2020'd '23MAR2020'd
1234 1 '01JAN2021'd '02JAN2021'd
1234 1 '10JAN2021'd '21JAN2021'd
1234 1 '24FEB2021'd '25FEB2021'd
1234 1 '01MAR2021'd '11MAR2021'd
;
run;
data want ;
do _n_=1 by 1 until(last.code) ;
set have ;
by id code notsorted ;
array t(999) _temporary_;
if _n_>1 then do __n=1 to _n_-1 ;
if intck('day',t(__n),date1)<30 then count+1 ;
end ;
output ;
count=0 ;
t(_n_)=date2 ;
end ;
call missing(of t(*)) ;
drop _: ;
run ;
proc print noobs ; run ;
| id | code | date1 | date2 | count |
|---|---|---|---|---|
| 4350 | 1 | 01JUN2021 | 03JUN2021 | 0 |
| 4350 | 1 | 02JUL2021 | 14JUL2021 | 1 |
| 4350 | 1 | 16JUL2021 | 17JUL2021 | 1 |
| 4350 | 1 | 20JUL2021 | 22JUL2021 | 2 |
| 4456 | 3 | 20AUG2020 | 21OCT2020 | 0 |
| 4456 | 4 | 19MAR2020 | 23MAR2020 | 0 |
| 1234 | 1 | 01JAN2021 | 02JAN2021 | 0 |
| 1234 | 1 | 10JAN2021 | 21JAN2021 | 1 |
| 1234 | 1 | 24FEB2021 | 25FEB2021 | 0 |
| 1234 | 1 | 01MAR2021 | 11MAR2021 | 1 |
Could you explain how to go from the input data to the output data?
Why is COUNT a 0 on the first line, a 1 on lines 2 and 3, and a two on line 4?
It is a count of how many previous lines had a date2 within 30 days of the current lines date1. It is kind of confusing so if I've made an error somewhere let me know and I can correct that.
To add a little more detail, it is 0 on the first one because it is the first line with no previous lines. It is 1 on the second because it was within range of the first. The third is 1 because it was within range of the second, but not the first. The fourth was within range of 2 and 3, but not 1.
Hi @A_SAS_Man Methinks @PaigeMiller 's point is spot on in catching the anomaly in your WANT for the ID= 4350 code=1. Please review the Count in your WANT for the 1st ID group. Your description of logic makes me lean to the following results if I understand you which I believe what Paige is thinking too-
data have;
input id code date1 :date9. date2 :date9.;
format date1 date9. date2 date9.;
datalines;
4350 1 '01JUN2021'd '03JUN2021'd
4350 1 '02JUl2021'd '14JUl2021'd
4350 1 '16JUl2021'd '17JUl2021'd
4350 1 '20JUl2021'd '22JUl2021'd
4456 3 '20AUG2020'd '21OCT2020'd
4456 4 '19MAR2020'd '23MAR2020'd
1234 1 '01JAN2021'd '02JAN2021'd
1234 1 '10JAN2021'd '21JAN2021'd
1234 1 '24FEB2021'd '25FEB2021'd
1234 1 '01MAR2021'd '11MAR2021'd
;
run;
data want ;
do until(last.code) ;
set have ;
by id code notsorted ;
if _d then if intck('day', _d, date1)>30 then count=0 ;
else count+1 ;
output;
_d=date2 ;
end ;
count=0 ;
drop _d ;
run ;
proc print noobs ; run ;
Results;
| id | code | date1 | date2 | count |
|---|---|---|---|---|
| 4350 | 1 | 01JUN2021 | 03JUN2021 | 0 |
| 4350 | 1 | 02JUL2021 | 14JUL2021 | 1 |
| 4350 | 1 | 16JUL2021 | 17JUL2021 | 2 |
| 4350 | 1 | 20JUL2021 | 22JUL2021 | 3 |
| 4456 | 3 | 20AUG2020 | 21OCT2020 | 0 |
| 4456 | 4 | 19MAR2020 | 23MAR2020 | 0 |
| 1234 | 1 | 01JAN2021 | 02JAN2021 | 0 |
| 1234 | 1 | 10JAN2021 | 21JAN2021 | 1 |
| 1234 | 1 | 24FEB2021 | 25FEB2021 | 0 |
| 1234 | 1 | 01MAR2021 | 11MAR2021 | 1 |
So this isn't quite what I need. The third line should be 1 because July 16 and June 3 are more than 30 days apart and July 16 and July 14 are less than 30 days apart. Similarly, line 4 should be 2 because July 20 is less than 30 days from July 17 and July 14, but more than 30 from June 3.
Hi again @A_SAS_Man Sorry I failed to notice your response to @PaigeMiller explaining the logic. I read it just now. Right, I think I should be close this time around? Please test-
data have;
input id code date1 :date9. date2 :date9.;
format date1 date9. date2 date9.;
datalines;
4350 1 '01JUN2021'd '03JUN2021'd
4350 1 '02JUl2021'd '14JUl2021'd
4350 1 '16JUl2021'd '17JUl2021'd
4350 1 '20JUl2021'd '22JUl2021'd
4456 3 '20AUG2020'd '21OCT2020'd
4456 4 '19MAR2020'd '23MAR2020'd
1234 1 '01JAN2021'd '02JAN2021'd
1234 1 '10JAN2021'd '21JAN2021'd
1234 1 '24FEB2021'd '25FEB2021'd
1234 1 '01MAR2021'd '11MAR2021'd
;
run;
data want ;
do _n_=1 by 1 until(last.code) ;
set have ;
by id code notsorted ;
array t(999) _temporary_;
if _n_>1 then do __n=1 to _n_-1 ;
if intck('day',t(__n),date1)<30 then count+1 ;
end ;
output ;
count=0 ;
t(_n_)=date2 ;
end ;
call missing(of t(*)) ;
drop _: ;
run ;
proc print noobs ; run ;
| id | code | date1 | date2 | count |
|---|---|---|---|---|
| 4350 | 1 | 01JUN2021 | 03JUN2021 | 0 |
| 4350 | 1 | 02JUL2021 | 14JUL2021 | 1 |
| 4350 | 1 | 16JUL2021 | 17JUL2021 | 1 |
| 4350 | 1 | 20JUL2021 | 22JUL2021 | 2 |
| 4456 | 3 | 20AUG2020 | 21OCT2020 | 0 |
| 4456 | 4 | 19MAR2020 | 23MAR2020 | 0 |
| 1234 | 1 | 01JAN2021 | 02JAN2021 | 0 |
| 1234 | 1 | 10JAN2021 | 21JAN2021 | 1 |
| 1234 | 1 | 24FEB2021 | 25FEB2021 | 0 |
| 1234 | 1 | 01MAR2021 | 11MAR2021 | 1 |
No biggie, Should you come across a situation where you have more than 999 records for a single Id Code group combination, please make sure you resize the array something bigger like-
array t(999) _temporary_; /*existing 999*/
array t(9999) _temporary_; /*make it larger 9999*/
Test thoroughly and then accept the answer.
data have; input id code date1 :date9. date2 :date9.; format date1 date9. date2 date9.; datalines; 4350 1 '01JUN2021'd '03JUN2021'd 4350 1 '02JUl2021'd '14JUl2021'd 4350 1 '16JUl2021'd '17JUl2021'd 4350 1 '20JUl2021'd '22JUl2021'd 4456 3 '20AUG2020'd '21OCT2020'd 4456 4 '19MAR2020'd '23MAR2020'd 1234 1 '01JAN2021'd '02JAN2021'd 1234 1 '10JAN2021'd '21JAN2021'd 1234 1 '24FEB2021'd '25FEB2021'd 1234 1 '01MAR2021'd '11MAR2021'd ; run; proc sql; create table want as select a.*,(select count(*) from have where id=a.id and code=a.code and date1<a.date1 and date2 between a.date1-30 and a.date1) as count from have as a; quit;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.