Hello, I need to find the number of repeat visits within a year, starting from the initial visit. Ex., data is as below. A has 0 repeat visits within a year, B has 6, C has 1. I tried the LAG and INTCK functions, but couldn't get the correct answer.
ID | TIME_IN |
A | 5/8/2015 |
A | 10/25/2016 |
B | 6/3/2016 |
B | 7/6/2016 |
B | 8/22/2016 |
B | 10/31/2016 |
B | 1/9/2017 |
B | 2/24/2017 |
B | 4/7/2017 |
C | 7/7/2014 |
C | 7/29/2014 |
C | 8/24/2015 |
C | 10/28/2015 |
C | 1/4/2016 |
Here's a simple way:
data want;
set have;
by id;
if first.id then do;
repeat_visits=0;
baseline_date = time_in;
end;
else if time_in <= baseline_date + 365 then repeat_visits + 1;
if last.id;
retain baseline_date;
keep id baseline_date repeat_visits;
run;
There are fancier ways (other than adding 365) to define "within a year" including ways that account for leap years. So it depends on how exact you want to be.
And what defines a repeat visit? Post test data in the form of a datastep and show what you want to have output. Then explain the logic of why. As for your problem, simply:
data want; set have; retain repeat; by id; if first.id then repeat=0; else repeat=repeat+1; run;
I'm not sure I follow your logic? How does B have 6 repeat visits? What is the exact definition for a repeat visit?
Here's a simple way:
data want;
set have;
by id;
if first.id then do;
repeat_visits=0;
baseline_date = time_in;
end;
else if time_in <= baseline_date + 365 then repeat_visits + 1;
if last.id;
retain baseline_date;
keep id baseline_date repeat_visits;
run;
There are fancier ways (other than adding 365) to define "within a year" including ways that account for leap years. So it depends on how exact you want to be.
Hi,
there are several ways to perform.
I have 4 in mind:
- use the classic retain
- use proc SQL update
- use call execute
- use the new function dosubl for a more straight forward approach which blows my mind 🙂
I've described the retain and dosubl in the code below:
data test;
informat ID $1. TIME_IN mmddyy10.;
input id time_in;
format TIME_IN mmddyy10.;
datalines;
A 5/8/2015
A 10/25/2016
B 6/3/2016
B 7/6/2016
B 8/22/2016
B 10/31/2016
B 1/9/2017
B 2/24/2017
B 4/7/2017
C 7/7/2014
C 7/29/2014
C 8/24/2015
C 10/28/2015
C 1/4/2016
;
run;
/* RETAIN approach */
data test1;
set test;
by id time_in;
retain min max h_repeatVisits .;
if first.id then do;
min=intnx("day", time_in, 1, "sameday");
max=intnx("year", time_in, 1, "sameday");
h_repeatVisits=0;
end;
if min<time_in<=max then h_repeatVisits=h_repeatVisits+1;
run;
proc sort data=test1; by id descending time_in;run;
data test2;
set test1;
by id descending time_in;
retain repeatVisits .;
if first.id then repeatVisits=h_repeatVisits;
drop min max h_repeatVisits;
run;
proc sort data=test2; by id time_in; run;
/* DOSUBL approach */
data test3;
set test;
by id time_in;
retain repeatVisits 0;
if first.id then do;
rc=dosubl('%LET n=0;
PROC SQL noprint;
SELECT count(*) INTO :n
FROM test
WHERE id EQ "'||strip(id)||'"
AND '||put(time_in,best8.)||'<time_in<= '||put(intnx("year", time_in, 1, "sameday"),best8.)
||';QUIT;');
repeatVisits=input(symget('n'),best8.);
end;
drop rc;
run;
- Cheers -
Thank you for the various methods and increasing my knowledge ... I really appreciate it !!
Count the occurrence of year within each ID and subtract 1?
proc summary data=have nway; class id date; format date year4.; output out=yearcount (drop=_type_); run;
will have a variable _freq_ with the count.
Assumes that your date is a SAS date value and NOT a character variable.
A data step could subtract the 1.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.