Solved: Count continuous months allowing one gap

aaronh · Posted 03-15-2021 03:46 PM

Dear SAS community fellows,

Allow me to delve straight into topic: Let's say a group of patients were supposed to attend a once-per-month consultation for at least 6 months straight last year. They were allowed to skip one session at most during the 6-month period if they wanted to do so. So here is the data, where "month" is the month when the patient came in for the consultation, and "monthcnt" is the _N_ for each patient.

data have;
input patient $ 1 @3 month monthcnt;
datalines;
A 1 1
A 2 2
A 3 3
A 4 4
A 5 5
A 6 6
B 1 1
B 3 2
B 4 3
B 5 4
B 6 5
C 7 1
C 8 2
C 9 3
D 1 1
D 4 2
D 5 3
D 6 4
D 7 5
D 8 6
;

Thanks to the program coded by Zaizai Lu and David Shen in their poster about SAS array (https://support.sas.com/resources/papers/proceedings/proceedings/sugi31/156-31.pdf), I was able to obtain patient A, who had 6 consecutive monthly consultations, by running the code below. But I don't know how to obtain patient B and D, who skipped one consultation during a 6-month period. Any insight on how to achieve this will be welcome!

*monthly consultation by patient;
proc transpose data=have prefix=mo out=t;
by patient;
var month;
run;

data ind (keep=patient flag count i rename=(i=monthcnt));
set t;
array mth {*} mo: dummy;
retain flag count 1;
do i=1 to dim(mth)-1;
  if mth[i]^=. then do;
    if mth[i] = mth[i+1]-1 then do;
      output;
      count = count+1;
    end;
    else do;
      output;
      flag = flag+1;
      count = 1;
    end;
  end;
end;
run;

data temp1;
merge ind have;
by patient monthcnt;
run;

*6 consecutive monthly consultation;
data continue (where=(count > = 6));
set temp1;
by patient flag;
retain f_month;
if first.flag then f_month=month;
if last.flag then do;
  l_month=month;
  output;
end;
keep patient f_month l_month count;
run;

mkeintz · Posted 03-16-2021 09:57 AM

Arrays are useful, but after building the array, convert it to a string of 1's and .'s. Then search (using the FIND function) for 5 consecutive ones, or else 5 ones and a single . in 6 consecutive positions.

data have;
input patient $ 1 @3 month monthcnt;
datalines;
A 1 1
A 2 2
A 3 3
A 4 4
A 5 5
A 6 6
B 1 1
B 3 2
B 4 3
B 5 4
B 6 5
C 7 1
C 8 2
C 9 3
D 1 1
D 4 2
D 5 3
D 6 4
D 7 5
D 8 6
;
data want (drop=_:);
  array _mnths {20} ;
  do until (last.patient);
    set have ;
    by patient;
    _mnths{month}=1;
  end;
  _strng=cats(of _mnths{*});
  do _srch='11111','1.1111','11.111','111.11','1111.1' until (_f^=0);
   _f=find(_strng,_srch);
  end;

  do until (last.patient);  /* Reread and (optionally) output this patient*/
    set have;
    by patient;
    if _f^=0 then output;
  end;
run;

And if you are worried about indexing multiple months over consecutive years, then just declare a 2-way array, as in:

  array mnths {2014:2016,1:12} ;

Then you can populate the array via statements like:

  mnths{year,month}=1;

The CATS function will concatenate the 1's and .'s in chronological order (assuming the major index is year and the minor index is month), so the rest of the code works unmodified.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

maguiremq · Posted 03-15-2021 04:39 PM

What do you want your data to look like? I'm getting a little lost between all the output statements and arrays. That code seems a bit convoluted.

This flags whether a month was skipped, but I couldn't provide an answer without knowing what you want it to ultimately look like.

data have;
input patient $ 1 @3 month monthcnt;
datalines;
A 1 1
A 2 2
A 3 3
A 4 4
A 5 5
A 6 6
B 1 1
B 3 2
B 4 3
B 5 4
B 6 5
C 7 1
C 8 2
C 9 3
D 1 1
D 4 2
D 5 3
D 6 4
D 7 5
D 8 6
;

data have_2;
	set have;
	by patient month;
		lag_month = lag(month);
		if first.patient then call missing(lag_month);
		if not first.patient then do;
			if month - lag_month ^= 1 then skip_month = 1;
			else skip_month = 0;
		end;
run;

You could use PROC MEANS to get the max for each ID and then merge it back. That or PROC SQL. Nonetheless, please also provide a DATALINES statement with your 'want' data set.

aaronh · Posted 03-15-2021 05:54 PM

I only need the patient ID in this case. So for this particular data, I want patient A, B, and D. Patient A has 6 consecutive visits, B has 5 in 6 months (Jan - Jun), and patient D also has 5 in 6 months (Apr - Sep, or Mar - Aug, depending on how to look at it).

ballardw · Posted 03-15-2021 05:19 PM

Having been involved in followup medical appointments I really think that dates of visit should be involved so the folks that start in October will have something that makes since when January comes around. Unless you claim every patient involved in this program starts before June of the year. Every year.

You may also want to notice that the difference between visit number and the months-from-start interval is the number of skipped months

D 1 1   1-1 = 0 skipped months
D 4 2   4-2 = 2 skipped months (month numbers 2 and 3)

aaronh · Posted 03-15-2021 05:51 PM

Indeed... what I have given here really is a representation of what I actually have - enrollment data, but I was really trying to give an example. You are absolutely correct that a patient could start in July and finish in Dec.

maguiremq · Posted 03-16-2021 09:24 AM

This may need to be modified in light of the comments above, but this gives what you requested, but it looks a little ugly in my opinion.

data have;
input patient $ 1 @3 month monthcnt;
datalines;
A 1 1
A 2 2
A 3 3
A 4 4
A 5 5
A 6 6
B 1 1
B 3 2
B 4 3
B 5 4
B 6 5
C 7 1
C 8 2
C 9 3
D 1 1
D 4 2
D 5 3
D 6 4
D 7 5
D 8 6
;

data have_2;
	set have;
	by patient month;
		lag_month = lag(month);
		if first.patient then call missing(lag_month);
		if not first.patient then do;
			if month - lag_month ^= 1 then skip_month = 1;
			else skip_month = 0;
		end;
run;

proc summary data = have_2 noprint;
	class patient; /* create table with patient as the identifier */
	var skip_month; /* create column based on whether the individual has a skip month */
	ways 1; /* eliminate default summary row for all records */
	output out = want (drop = _type_
					   where = (total_months >= 6 or (total_months = 5 and skip_month = 1)) 
					   rename = (_freq_ = total_months)) /* (1) drop _type_, (2) subset based on months and skip_month, (3) rename default column _freq_ to total_months just so it looks nicer */
		max = ; /* output the max skip_month for an individual */
run;

Obs patient total_months skip_month 
1 A 6 0 
2 B 5 1 
3 D 6 1

aaronh · Posted 03-16-2021 10:16 AM

Thank you! This is very cool... though if I've understood the code correctly, it may require some more tweaking when a patient visited more than 6 months in a row.

maguiremq · Posted 03-16-2021 10:19 AM

Certainly. I'll update the code to include the >= operator.

mkeintz · Posted 03-16-2021 09:57 AM

Arrays are useful, but after building the array, convert it to a string of 1's and .'s. Then search (using the FIND function) for 5 consecutive ones, or else 5 ones and a single . in 6 consecutive positions.

data have;
input patient $ 1 @3 month monthcnt;
datalines;
A 1 1
A 2 2
A 3 3
A 4 4
A 5 5
A 6 6
B 1 1
B 3 2
B 4 3
B 5 4
B 6 5
C 7 1
C 8 2
C 9 3
D 1 1
D 4 2
D 5 3
D 6 4
D 7 5
D 8 6
;
data want (drop=_:);
  array _mnths {20} ;
  do until (last.patient);
    set have ;
    by patient;
    _mnths{month}=1;
  end;
  _strng=cats(of _mnths{*});
  do _srch='11111','1.1111','11.111','111.11','1111.1' until (_f^=0);
   _f=find(_strng,_srch);
  end;

  do until (last.patient);  /* Reread and (optionally) output this patient*/
    set have;
    by patient;
    if _f^=0 then output;
  end;
run;

And if you are worried about indexing multiple months over consecutive years, then just declare a 2-way array, as in:

  array mnths {2014:2016,1:12} ;

Then you can populate the array via statements like:

  mnths{year,month}=1;

The CATS function will concatenate the 1's and .'s in chronological order (assuming the major index is year and the minor index is month), so the rest of the code works unmodified.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

aaronh · Posted 03-16-2021 10:21 AM

Interestingly, one of my colleagues suggested me something that's essentially what you've put down here. It may be a bit challenging to search using an array directly, but if we concatenate all of the indicator variables into one variable, and then use the find() function, it becomes way less complicated. We just need to take note to exhaust all of the possible combinations of the '1111.1' etc etc. Thank you.

Count continuous months allowing one gap

Re: Count continuous months allowing one gap

Re: Count continuous months allowing one gap

Re: Count continuous months allowing one gap

Re: Count continuous months allowing one gap

Re: Count continuous months allowing one gap

Re: Count continuous months allowing one gap

Re: Count continuous months allowing one gap

Re: Count continuous months allowing one gap

Re: Count continuous months allowing one gap

Re: Count continuous months allowing one gap

Classroom Training Available!