Hi everyone
I have a dataset that looks like this:
Patient_ID | Visit_Date |
1 | 13-08-2016 |
1 | 28-05-2013 |
1 | 24-12-2013 |
2 | 23-05-2012 |
2 | 11-01-2014 |
2 | 03-10-2017 |
2 | 16-02-2015 |
3 | 03-08-2013 |
3 | 28-10-2017 |
3 | 27-10-2013 |
3 | 18-04-2015 |
4 | 28-04-2016 |
4 | 07-08-2012 |
4 | 19-03-2013 |
4 | 25-05-2017 |
5 | 21-11-2014 |
5 | 20-03-2016 |
5 | 08-01-2015 |
5 | 21-09-2012 |
What i want to do is to extract a list of patients who have been seen in any 2 consecutive years.
Could anyone help please?
Kind regards
You don't really tell us how the desired result should look like - especially for cases where you have two visits in year 1 and two visits in year 2.
Below one way to go:
data have;
infile cards expandtabs truncover;
input Patient_ID Visit_Date : ddmmyy12.;
year=year(visit_date);
format visit_date ddmmyy10.;
cards;
1 13-08-2016
1 28-05-2013
1 24-12-2013
2 23-05-2012
2 11-01-2014
2 03-10-2017
2 16-02-2015
3 03-08-2013
3 28-10-2017
3 27-10-2013
3 18-04-2015
4 28-04-2016
4 07-08-2012
4 19-03-2013
4 25-05-2017
5 21-11-2014
5 20-03-2016
5 08-01-2015
5 21-09-2012
;
proc sql;
create table want as
select l.*, r.visit_date as next_years_date format=ddmmyy10.
from have as l inner join have as r
on
l.patient_id=r.patient_id
and intck('year',l.Visit_Date,r.Visit_Date)=1
;
quit;
Something like (assumes sorted):
data want; set have; retain lst_v; if _n_=1 then lst_v=visit_date; else do; if year(visit_date)=year(lst_v)+1 then output; end; lst_v=visit_date; run; proc sort data=want nodupkey; by patient_id; run;
You don't really tell us how the desired result should look like - especially for cases where you have two visits in year 1 and two visits in year 2.
Below one way to go:
data have;
infile cards expandtabs truncover;
input Patient_ID Visit_Date : ddmmyy12.;
year=year(visit_date);
format visit_date ddmmyy10.;
cards;
1 13-08-2016
1 28-05-2013
1 24-12-2013
2 23-05-2012
2 11-01-2014
2 03-10-2017
2 16-02-2015
3 03-08-2013
3 28-10-2017
3 27-10-2013
3 18-04-2015
4 28-04-2016
4 07-08-2012
4 19-03-2013
4 25-05-2017
5 21-11-2014
5 20-03-2016
5 08-01-2015
5 21-09-2012
;
proc sql;
create table want as
select l.*, r.visit_date as next_years_date format=ddmmyy10.
from have as l inner join have as r
on
l.patient_id=r.patient_id
and intck('year',l.Visit_Date,r.Visit_Date)=1
;
quit;
data have;
infile cards expandtabs truncover;
input Patient_ID Visit_Date : ddmmyy12.;
year=year(visit_date);
format visit_date ddmmyy10.;
cards;
1 13-08-2016
1 28-05-2013
1 24-12-2013
2 23-05-2012
2 11-01-2014
2 03-10-2017
2 16-02-2015
3 03-08-2013
3 28-10-2017
3 27-10-2013
3 18-04-2015
4 28-04-2016
4 07-08-2012
4 19-03-2013
4 25-05-2017
5 21-11-2014
5 20-03-2016
5 08-01-2015
5 21-09-2012
;
proc sort data=have out=temp nodupkey;
by Patient_ID year;
run;
data temp;
set temp;
by Patient_ID;
dif=dif(year);
if first.Patient_ID then call missing(dif);
run;
proc sql;
select distinct Patient_ID
from temp
group by Patient_ID
having sum(dif=1) ne 0;
quit;
I suggest a merge of HAVE with itself, in which one of the merged datasteams starts with firstobs=2.
data want (keep=patient year1 year2);
set have (keep=patient);
by patient;
merge have
have (firstobs=2 keep=date rename=(date=nxt_date));
year1=year(date);
year2=year(nxt_date);
if last.patient=0 and year2=year1+1;
run;
This writes out one record for each instance of two consecutive records belonging to 2 consecutive years.
data want;
used=0; /* this ID has not been output yet */
do until(last.id);
set have;
by id;
if used then continue;
if dif(year(visit_date)) ne 1 then continue;
if first.id then continue;
used=1;
output;
end;
keep id;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.