Dear SAS community fellows,
Allow me to delve straight into topic: Let's say a group of patients were supposed to attend a once-per-month consultation for at least 6 months straight last year. They were allowed to skip one session at most during the 6-month period if they wanted to do so. So here is the data, where "month" is the month when the patient came in for the consultation, and "monthcnt" is the _N_ for each patient.
data have;
input patient $ 1 @3 month monthcnt;
datalines;
A 1 1
A 2 2
A 3 3
A 4 4
A 5 5
A 6 6
B 1 1
B 3 2
B 4 3
B 5 4
B 6 5
C 7 1
C 8 2
C 9 3
D 1 1
D 4 2
D 5 3
D 6 4
D 7 5
D 8 6
;
Thanks to the program coded by Zaizai Lu and David Shen in their poster about SAS array (https://support.sas.com/resources/papers/proceedings/proceedings/sugi31/156-31.pdf), I was able to obtain patient A, who had 6 consecutive monthly consultations, by running the code below. But I don't know how to obtain patient B and D, who skipped one consultation during a 6-month period. Any insight on how to achieve this will be welcome!
*monthly consultation by patient;
proc transpose data=have prefix=mo out=t;
by patient;
var month;
run;
data ind (keep=patient flag count i rename=(i=monthcnt));
set t;
array mth {*} mo: dummy;
retain flag count 1;
do i=1 to dim(mth)-1;
if mth[i]^=. then do;
if mth[i] = mth[i+1]-1 then do;
output;
count = count+1;
end;
else do;
output;
flag = flag+1;
count = 1;
end;
end;
end;
run;
data temp1;
merge ind have;
by patient monthcnt;
run;
*6 consecutive monthly consultation;
data continue (where=(count > = 6));
set temp1;
by patient flag;
retain f_month;
if first.flag then f_month=month;
if last.flag then do;
l_month=month;
output;
end;
keep patient f_month l_month count;
run;
Arrays are useful, but after building the array, convert it to a string of 1's and .'s. Then search (using the FIND function) for 5 consecutive ones, or else 5 ones and a single . in 6 consecutive positions.
data have;
input patient $ 1 @3 month monthcnt;
datalines;
A 1 1
A 2 2
A 3 3
A 4 4
A 5 5
A 6 6
B 1 1
B 3 2
B 4 3
B 5 4
B 6 5
C 7 1
C 8 2
C 9 3
D 1 1
D 4 2
D 5 3
D 6 4
D 7 5
D 8 6
;
data want (drop=_:);
array _mnths {20} ;
do until (last.patient);
set have ;
by patient;
_mnths{month}=1;
end;
_strng=cats(of _mnths{*});
do _srch='11111','1.1111','11.111','111.11','1111.1' until (_f^=0);
_f=find(_strng,_srch);
end;
do until (last.patient); /* Reread and (optionally) output this patient*/
set have;
by patient;
if _f^=0 then output;
end;
run;
And if you are worried about indexing multiple months over consecutive years, then just declare a 2-way array, as in:
array mnths {2014:2016,1:12} ;
Then you can populate the array via statements like:
mnths{year,month}=1;
The CATS function will concatenate the 1's and .'s in chronological order (assuming the major index is year and the minor index is month), so the rest of the code works unmodified.
What do you want your data to look like? I'm getting a little lost between all the output statements and arrays. That code seems a bit convoluted.
This flags whether a month was skipped, but I couldn't provide an answer without knowing what you want it to ultimately look like.
data have;
input patient $ 1 @3 month monthcnt;
datalines;
A 1 1
A 2 2
A 3 3
A 4 4
A 5 5
A 6 6
B 1 1
B 3 2
B 4 3
B 5 4
B 6 5
C 7 1
C 8 2
C 9 3
D 1 1
D 4 2
D 5 3
D 6 4
D 7 5
D 8 6
;
data have_2;
set have;
by patient month;
lag_month = lag(month);
if first.patient then call missing(lag_month);
if not first.patient then do;
if month - lag_month ^= 1 then skip_month = 1;
else skip_month = 0;
end;
run;
You could use PROC MEANS to get the max for each ID and then merge it back. That or PROC SQL. Nonetheless, please also provide a DATALINES statement with your 'want' data set.
Having been involved in followup medical appointments I really think that dates of visit should be involved so the folks that start in October will have something that makes since when January comes around. Unless you claim every patient involved in this program starts before June of the year. Every year.
You may also want to notice that the difference between visit number and the months-from-start interval is the number of skipped months
D 1 1 1-1 = 0 skipped months
D 4 2 4-2 = 2 skipped months (month numbers 2 and 3)
This may need to be modified in light of the comments above, but this gives what you requested, but it looks a little ugly in my opinion.
data have;
input patient $ 1 @3 month monthcnt;
datalines;
A 1 1
A 2 2
A 3 3
A 4 4
A 5 5
A 6 6
B 1 1
B 3 2
B 4 3
B 5 4
B 6 5
C 7 1
C 8 2
C 9 3
D 1 1
D 4 2
D 5 3
D 6 4
D 7 5
D 8 6
;
data have_2;
set have;
by patient month;
lag_month = lag(month);
if first.patient then call missing(lag_month);
if not first.patient then do;
if month - lag_month ^= 1 then skip_month = 1;
else skip_month = 0;
end;
run;
proc summary data = have_2 noprint;
class patient; /* create table with patient as the identifier */
var skip_month; /* create column based on whether the individual has a skip month */
ways 1; /* eliminate default summary row for all records */
output out = want (drop = _type_
where = (total_months >= 6 or (total_months = 5 and skip_month = 1))
rename = (_freq_ = total_months)) /* (1) drop _type_, (2) subset based on months and skip_month, (3) rename default column _freq_ to total_months just so it looks nicer */
max = ; /* output the max skip_month for an individual */
run;
Obs patient total_months skip_month 1 A 6 0 2 B 5 1 3 D 6 1
Arrays are useful, but after building the array, convert it to a string of 1's and .'s. Then search (using the FIND function) for 5 consecutive ones, or else 5 ones and a single . in 6 consecutive positions.
data have;
input patient $ 1 @3 month monthcnt;
datalines;
A 1 1
A 2 2
A 3 3
A 4 4
A 5 5
A 6 6
B 1 1
B 3 2
B 4 3
B 5 4
B 6 5
C 7 1
C 8 2
C 9 3
D 1 1
D 4 2
D 5 3
D 6 4
D 7 5
D 8 6
;
data want (drop=_:);
array _mnths {20} ;
do until (last.patient);
set have ;
by patient;
_mnths{month}=1;
end;
_strng=cats(of _mnths{*});
do _srch='11111','1.1111','11.111','111.11','1111.1' until (_f^=0);
_f=find(_strng,_srch);
end;
do until (last.patient); /* Reread and (optionally) output this patient*/
set have;
by patient;
if _f^=0 then output;
end;
run;
And if you are worried about indexing multiple months over consecutive years, then just declare a 2-way array, as in:
array mnths {2014:2016,1:12} ;
Then you can populate the array via statements like:
mnths{year,month}=1;
The CATS function will concatenate the 1's and .'s in chronological order (assuming the major index is year and the minor index is month), so the rest of the code works unmodified.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.