BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
aaronh
Quartz | Level 8

Dear SAS community fellows,

Allow me to delve straight into topic: Let's say a group of patients were supposed to attend a once-per-month consultation for at least 6 months straight last year. They were allowed to skip one session at most during the 6-month period if they wanted to do so. So here is the data, where "month" is the month when the patient came in for the consultation, and "monthcnt" is the _N_ for each patient.

 

data have;
input patient $ 1 @3 month monthcnt;
datalines;
A 1 1
A 2 2
A 3 3
A 4 4
A 5 5
A 6 6
B 1 1
B 3 2
B 4 3
B 5 4
B 6 5
C 7 1
C 8 2
C 9 3
D 1 1
D 4 2
D 5 3
D 6 4
D 7 5
D 8 6
;

 

Thanks to the program coded by Zaizai Lu and David Shen in their poster about SAS array (https://support.sas.com/resources/papers/proceedings/proceedings/sugi31/156-31.pdf), I was able to obtain patient A, who had 6 consecutive monthly consultations, by running the code below. But I don't know how to obtain patient B and D, who skipped one consultation during a 6-month period. Any insight on how to achieve this will be welcome!

 

*monthly consultation by patient;
proc transpose data=have prefix=mo out=t;
by patient;
var month;
run;

 

data ind (keep=patient flag count i rename=(i=monthcnt));
set t;
array mth {*} mo: dummy;
retain flag count 1;
do i=1 to dim(mth)-1;
  if mth[i]^=. then do;
    if mth[i] = mth[i+1]-1 then do;
      output;
      count = count+1;
    end;
    else do;
      output;
      flag = flag+1;
      count = 1;
    end;
  end;
end;
run;

 

 

data temp1;
merge ind have;
by patient monthcnt;
run;

 

*6 consecutive monthly consultation;
data continue (where=(count > = 6));
set temp1;
by patient flag;
retain f_month;
if first.flag then f_month=month;
if last.flag then do;
  l_month=month;
  output;
end;
keep patient f_month l_month count;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Arrays are useful, but after building the array, convert it to a string of 1's and .'s.  Then search (using the FIND function) for 5 consecutive ones, or else 5 ones and a single . in 6 consecutive positions.

 

data have;
input patient $ 1 @3 month monthcnt;
datalines;
A 1 1
A 2 2
A 3 3
A 4 4
A 5 5
A 6 6
B 1 1
B 3 2
B 4 3
B 5 4
B 6 5
C 7 1
C 8 2
C 9 3
D 1 1
D 4 2
D 5 3
D 6 4
D 7 5
D 8 6
;
data want (drop=_:);
  array _mnths {20} ;
  do until (last.patient);
    set have ;
    by patient;
    _mnths{month}=1;
  end;
  _strng=cats(of _mnths{*});
  do _srch='11111','1.1111','11.111','111.11','1111.1' until (_f^=0);
   _f=find(_strng,_srch);
  end;

  do until (last.patient);  /* Reread and (optionally) output this patient*/
    set have;
    by patient;
    if _f^=0 then output;
  end;
run;

And if you are worried about indexing multiple months over consecutive years, then just declare a 2-way array, as in:

 

  array mnths {2014:2016,1:12} ;

Then you can populate the array via statements like: 

  mnths{year,month}=1;

The CATS function will concatenate the 1's and .'s in chronological order (assuming the major index is year and the minor index is month), so the rest of the code works unmodified. 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

9 REPLIES 9
maguiremq
SAS Super FREQ

What do you want your data to look like? I'm getting a little lost between all the output statements and arrays. That code seems a bit convoluted.

 

This flags whether a month was skipped, but I couldn't provide an answer without knowing what you want it to ultimately look like.

 

data have;
input patient $ 1 @3 month monthcnt;
datalines;
A 1 1
A 2 2
A 3 3
A 4 4
A 5 5
A 6 6
B 1 1
B 3 2
B 4 3
B 5 4
B 6 5
C 7 1
C 8 2
C 9 3
D 1 1
D 4 2
D 5 3
D 6 4
D 7 5
D 8 6
;

data have_2;
	set have;
	by patient month;
		lag_month = lag(month);
		if first.patient then call missing(lag_month);
		if not first.patient then do;
			if month - lag_month ^= 1 then skip_month = 1;
			else skip_month = 0;
		end;
run;

You could use PROC MEANS to get the max for each ID and then merge it back. That or PROC SQL. Nonetheless, please also provide a DATALINES statement with your 'want' data set.

 

aaronh
Quartz | Level 8
I only need the patient ID in this case. So for this particular data, I want patient A, B, and D. Patient A has 6 consecutive visits, B has 5 in 6 months (Jan - Jun), and patient D also has 5 in 6 months (Apr - Sep, or Mar - Aug, depending on how to look at it).
ballardw
Super User

Having been involved in followup medical appointments I really think that dates of visit should be involved so the folks that start in October will have something that makes since when January comes around. Unless you claim every patient involved in this program starts before June of the year. Every year.

 

You may also want to notice that the difference between visit number and the months-from-start interval is the number of skipped months

 

D 1 1   1-1 = 0 skipped months
D 4 2   4-2 = 2 skipped months (month numbers 2 and 3)
aaronh
Quartz | Level 8
Indeed... what I have given here really is a representation of what I actually have - enrollment data, but I was really trying to give an example. You are absolutely correct that a patient could start in July and finish in Dec.
maguiremq
SAS Super FREQ

This may need to be modified in light of the comments above, but this gives what you requested, but it looks a little ugly in my opinion.

data have;
input patient $ 1 @3 month monthcnt;
datalines;
A 1 1
A 2 2
A 3 3
A 4 4
A 5 5
A 6 6
B 1 1
B 3 2
B 4 3
B 5 4
B 6 5
C 7 1
C 8 2
C 9 3
D 1 1
D 4 2
D 5 3
D 6 4
D 7 5
D 8 6
;

data have_2;
	set have;
	by patient month;
		lag_month = lag(month);
		if first.patient then call missing(lag_month);
		if not first.patient then do;
			if month - lag_month ^= 1 then skip_month = 1;
			else skip_month = 0;
		end;
run;

proc summary data = have_2 noprint;
	class patient; /* create table with patient as the identifier */
	var skip_month; /* create column based on whether the individual has a skip month */
	ways 1; /* eliminate default summary row for all records */
	output out = want (drop = _type_
					   where = (total_months >= 6 or (total_months = 5 and skip_month = 1)) 
					   rename = (_freq_ = total_months)) /* (1) drop _type_, (2) subset based on months and skip_month, (3) rename default column _freq_ to total_months just so it looks nicer */
		max = ; /* output the max skip_month for an individual */
run;

Obs patient total_months skip_month 
1 A 6 0 
2 B 5 1 
3 D 6 1 

 

 

aaronh
Quartz | Level 8
Thank you! This is very cool... though if I've understood the code correctly, it may require some more tweaking when a patient visited more than 6 months in a row.
maguiremq
SAS Super FREQ
Certainly. I'll update the code to include the >= operator.
mkeintz
PROC Star

Arrays are useful, but after building the array, convert it to a string of 1's and .'s.  Then search (using the FIND function) for 5 consecutive ones, or else 5 ones and a single . in 6 consecutive positions.

 

data have;
input patient $ 1 @3 month monthcnt;
datalines;
A 1 1
A 2 2
A 3 3
A 4 4
A 5 5
A 6 6
B 1 1
B 3 2
B 4 3
B 5 4
B 6 5
C 7 1
C 8 2
C 9 3
D 1 1
D 4 2
D 5 3
D 6 4
D 7 5
D 8 6
;
data want (drop=_:);
  array _mnths {20} ;
  do until (last.patient);
    set have ;
    by patient;
    _mnths{month}=1;
  end;
  _strng=cats(of _mnths{*});
  do _srch='11111','1.1111','11.111','111.11','1111.1' until (_f^=0);
   _f=find(_strng,_srch);
  end;

  do until (last.patient);  /* Reread and (optionally) output this patient*/
    set have;
    by patient;
    if _f^=0 then output;
  end;
run;

And if you are worried about indexing multiple months over consecutive years, then just declare a 2-way array, as in:

 

  array mnths {2014:2016,1:12} ;

Then you can populate the array via statements like: 

  mnths{year,month}=1;

The CATS function will concatenate the 1's and .'s in chronological order (assuming the major index is year and the minor index is month), so the rest of the code works unmodified. 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
aaronh
Quartz | Level 8
Interestingly, one of my colleagues suggested me something that's essentially what you've put down here. It may be a bit challenging to search using an array directly, but if we concatenate all of the indicator variables into one variable, and then use the find() function, it becomes way less complicated. We just need to take note to exhaust all of the possible combinations of the '1111.1' etc etc. Thank you.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2518 views
  • 5 likes
  • 4 in conversation