Solved: Extracting ID with consecutive dates

ammarhm · Posted 02-27-2018 06:29 AM

Hi everyone

I have a dataset that looks like this:

Patient_ID	Visit_Date
1	13-08-2016
1	28-05-2013
1	24-12-2013
2	23-05-2012
2	11-01-2014
2	03-10-2017
2	16-02-2015
3	03-08-2013
3	28-10-2017
3	27-10-2013
3	18-04-2015
4	28-04-2016
4	07-08-2012
4	19-03-2013
4	25-05-2017
5	21-11-2014
5	20-03-2016
5	08-01-2015
5	21-09-2012

What i want to do is to extract a list of patients who have been seen in any 2 consecutive years.

Could anyone help please?

Kind regards

Patrick · Posted 02-27-2018 08:01 AM

You don't really tell us how the desired result should look like - especially for cases where you have two visits in year 1 and two visits in year 2.

Below one way to go:

data have;
  infile cards expandtabs truncover;
  input Patient_ID  Visit_Date : ddmmyy12.;
  year=year(visit_date);
  format visit_date ddmmyy10.;
  cards;
1 13-08-2016
1 28-05-2013
1 24-12-2013
2 23-05-2012
2 11-01-2014
2 03-10-2017
2 16-02-2015
3 03-08-2013
3 28-10-2017
3 27-10-2013
3 18-04-2015
4 28-04-2016
4 07-08-2012
4 19-03-2013
4 25-05-2017
5 21-11-2014
5 20-03-2016
5 08-01-2015
5 21-09-2012
;

proc sql;
  create table want as
    select l.*, r.visit_date as next_years_date format=ddmmyy10.
    from have as l inner join have as r
      on 
        l.patient_id=r.patient_id 
        and intck('year',l.Visit_Date,r.Visit_Date)=1
  ;
quit;

View solution in original post

RW9 · Posted 02-27-2018 06:42 AM

Something like (assumes sorted):

data want;
  set have;
  retain lst_v;
  if _n_=1 then lst_v=visit_date;
  else do;
    if year(visit_date)=year(lst_v)+1 then output;
  end;
  lst_v=visit_date;
run;
proc sort data=want nodupkey;
  by patient_id;
run;

Patrick · Posted 02-27-2018 08:01 AM

You don't really tell us how the desired result should look like - especially for cases where you have two visits in year 1 and two visits in year 2.

Below one way to go:

data have;
  infile cards expandtabs truncover;
  input Patient_ID  Visit_Date : ddmmyy12.;
  year=year(visit_date);
  format visit_date ddmmyy10.;
  cards;
1 13-08-2016
1 28-05-2013
1 24-12-2013
2 23-05-2012
2 11-01-2014
2 03-10-2017
2 16-02-2015
3 03-08-2013
3 28-10-2017
3 27-10-2013
3 18-04-2015
4 28-04-2016
4 07-08-2012
4 19-03-2013
4 25-05-2017
5 21-11-2014
5 20-03-2016
5 08-01-2015
5 21-09-2012
;

proc sql;
  create table want as
    select l.*, r.visit_date as next_years_date format=ddmmyy10.
    from have as l inner join have as r
      on 
        l.patient_id=r.patient_id 
        and intck('year',l.Visit_Date,r.Visit_Date)=1
  ;
quit;

Ksharp · Posted 02-27-2018 07:49 AM

data have;
infile cards expandtabs truncover;
input Patient_ID	Visit_Date : ddmmyy12.;
year=year(visit_date);
format visit_date ddmmyy10.;
cards;
1	13-08-2016
1	28-05-2013
1	24-12-2013
2	23-05-2012
2	11-01-2014
2	03-10-2017
2	16-02-2015
3	03-08-2013
3	28-10-2017
3	27-10-2013
3	18-04-2015
4	28-04-2016
4	07-08-2012
4	19-03-2013
4	25-05-2017
5	21-11-2014
5	20-03-2016
5	08-01-2015
5	21-09-2012
;
proc sort data=have out=temp nodupkey;
by Patient_ID year;
run;
data temp;
 set temp;
 by Patient_ID;
 dif=dif(year);
 if first.Patient_ID then call missing(dif);
run;
proc sql;
select distinct Patient_ID
 from temp
  group by Patient_ID
   having sum(dif=1) ne 0;
quit;

mkeintz · Posted 02-27-2018 10:55 PM

I suggest a merge of HAVE with itself, in which one of the merged datasteams starts with firstobs=2.

data want (keep=patient year1 year2);
  set have (keep=patient);
  by patient;

  merge have
        have (firstobs=2 keep=date rename=(date=nxt_date));
  year1=year(date);
  year2=year(nxt_date);

  if last.patient=0 and year2=year1+1;
run;

This writes out one record for each instance of two consecutive records belonging to 2 consecutive years.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Peter_C · Posted 02-28-2018 03:57 AM

Patrick's approach is the way to go especially if the input could sit in memory.

s_lassen · Posted 02-28-2018 07:55 AM

data want;
  used=0; /* this ID has not been output yet */
  do until(last.id);
    set have;
    by id;
    if used then continue;
    if dif(year(visit_date)) ne 1 then continue;
    if first.id then continue;
    used=1;
    output;
    end;
  keep id;
run;

Extracting ID with consecutive dates

Re: Extracting ID with consecutive dates

Re: Extracting ID with consecutive dates

Re: Extracting ID with consecutive dates

Re: Extracting ID with consecutive dates

Re: Extracting ID with consecutive dates

Re: Extracting ID with consecutive dates

Re: Extracting ID with consecutive dates

Classroom Training Available!