BookmarkSubscribeRSS Feed
Banke
Pyrite | Level 9

Hello all, I am moving to the new phase of my analysis. I want to identify patients continuously enrolled 6 months before (start date) and 6 months after (end date) their first prescription (var index_date)

Hi! I am working with claims based data and I'm having difficulty selecting the cases I want based on their enrollment period.

The insurance enrollment file (enrolled) is arranged as 36 columns representing one month from Jan/1/2013 to Dec/31/2016.  It has values of  A = medical, B = pharmacy, C = both, and D = none for enrollment during that month. I need patients continuously enrolled in both (C). The index date, start, and end date are in the format mmddyy10. 

I saw a similar question on this forum

%LET ENROLLED_1 = ENROLLED_01_2014 - ENROLLED_01_2016; %LET ENROLLED_7 = ENROLLED_07_2014 - ENROLLED_07_2016;
%LET ENROLLED_2 = ENROLLED_02_2014 - ENROLLED_02_2016; %LET ENROLLED_8 = ENROLLED_08_2014 - ENROLLED_08_2016;
%LET ENROLLED_3 = ENROLLED_03_2014 - ENROLLED_03_2016; %LET ENROLLED_9 = ENROLLED_09_2014 - ENROLLED_09_2016;
%LET ENROLLED_4 = ENROLLED_04_2014 - ENROLLED_04_2016; %LET ENROLLED_10 = ENROLLED_10_2014 - ENROLLED_10_2016;
%LET ENROLLED_5 = ENROLLED_05_2014 - ENROLLED_05_2016; %LET ENROLLED_11 = ENROLLED_11_2014 - ENROLLED_11_2016;
%LET ENROLLED_6 = ENROLLED_06_2014 - ENROLLED_06_2016; %LET ENROLLED_12 = ENROLLED_12_2014 - ENROLLED_12_2016;

DATA CONTINUOUS;
SET INDEX_DATE;
ARRAY STATUS (*) $ &ENROLLED_1. &ENROLLED_2. &ENROLLED_3. &ENROLLED_4. &ENROLLED_5. &ENROLLED_6.
                    &ENROLLED_7. &ENROLLED_8. &ENROLLED_9. &ENROLLED_10. &ENROLLED_11. &ENROLLED_12.;				
  /**_START_DATE=INTCK('DAY','31DEC2013'd,FILL_DATE) - 180;**/
 /** _END_DATE=INTCK('DAY','31DEC2013'd,FILL_DATE) + 180;**/

CONTINUOUS = 'NO';

DO _M = START TO END_1 UNTIL (CONTINUOUS = 'YES');
  IF STATUS{_M}^= 'B' THEN CONTINUOUS = 'YES';
  END;
 RUN;

and tried to adapt the code but  i get errors saying "array subscript out of range at line 166 column 6".

the column 6 is the code IF STATUS{_M}^= 'B' THEN CONTINUOUS = 'YES';.

I though it was my macro, but spelling them out gives the same errors.

 

Thank you for your assistance in advance

 

16 REPLIES 16
PaigeMiller
Diamond | Level 26

Since there are error messages, please show us the ENTIRE log for this data step.

--
Paige Miller
Banke
Pyrite | Level 9

its the same one i posted, i have posted again

**my code**/
%LET ENROLLED_1 = ENROLLED_01_2014 - ENROLLED_01_2016; %LET ENROLLED_7 = ENROLLED_07_2014 - ENROLLED_07_2016; %LET ENROLLED_2 = ENROLLED_02_2014 - ENROLLED_02_2016; %LET ENROLLED_8 = ENROLLED_08_2014 - ENROLLED_08_2016; %LET ENROLLED_3 = ENROLLED_03_2014 - ENROLLED_03_2016; %LET ENROLLED_9 = ENROLLED_09_2014 - ENROLLED_09_2016; %LET ENROLLED_4 = ENROLLED_04_2014 - ENROLLED_04_2016; %LET ENROLLED_10 = ENROLLED_10_2014 - ENROLLED_10_2016; %LET ENROLLED_5 = ENROLLED_05_2014 - ENROLLED_05_2016; %LET ENROLLED_11 = ENROLLED_11_2014 - ENROLLED_11_2016; %LET ENROLLED_6 = ENROLLED_06_2014 - ENROLLED_06_2016; %LET ENROLLED_12 = ENROLLED_12_2014 - ENROLLED_12_2016; DATA CONTINUOUS; SET INDEX_DATE; ARRAY STATUS (*) $ &ENROLLED_1. &ENROLLED_2. &ENROLLED_3. &ENROLLED_4. &ENROLLED_5. &ENROLLED_6. &ENROLLED_7. &ENROLLED_8. &ENROLLED_9. &ENROLLED_10. &ENROLLED_11. &ENROLLED_12.; /**_START_DATE=INTCK('DAY','31DEC2013'd,FILL_DATE) - 180;**/ /** _END_DATE=INTCK('DAY','31DEC2013'd,FILL_DATE) + 180;**/ CONTINUOUS = 'YES'; DO _M = START TO END_1 UNTIL (CONTINUOUS = 'NO'); IF STATUS{_M}^= 'B' THEN CONTINUOUS = 'NO'; END; RUN;



/**code i tried to adapt**/
data want;
  set have;
  array status {145:300} status145-status300  /* All months Jan 1991 through Dec 2015*/

  _beg_month=intck('month','31dec1990'd,diag_date) - 12;
  _end_month=intck('month','31dec1990'd,death_date);

  continuous='YES';   /* Default */
  
  /* Now search for break in string of 1's */
  do _m=_beg_month  to _end_month until (continuous='NO');
    if status{_m}^=1 then continuous='NO';
  end;
run;
 

 and the one i saw and tried to adapt 

PaigeMiller
Diamond | Level 26

Can we please see the ENTIRE log for the DATA step that contains the error?

--
Paige Miller
Banke
Pyrite | Level 9

Yes, please see attached the latest one where i listed all the enrollment variables. I cant copy the log so log 1.pnglog 2.pngi attached the screenshots. Thank you

PaigeMiller
Diamond | Level 26
do _m=start to end_1 until (continuous='YES');

 

The value of START is the date 07/03/2016, which is actually the integer 20669 . Why? Because SAS represents dates as days since Jan 1, 1960. The format is irrelevant.

 

The value of END_1 is the date 06/28/2017, which is actually the integer 20998. Why? Because SAS represents dates as days since Jan 1, 1960. The format is irrelevant.

 

So your DO loop variable _M increments from 20669 to 20998 by 1. That's a much bigger number than the size of the array. SAS doesn't know you want to increment by MONTHS unless you specifically tell it to do so, which you have not done; and so as it increments by 1, the array index is out of range, bigger than the size of the array.

 

At this point, I think an possible alternative would be something like this:

 

do _m=1 to dim(status);

 

I almost always recommend against the format of data where you have calendar information in the variable name. This creates difficulties (for most people, including obviously you) in handling the variables. A much better layout of the data is a long data set, where the month is an actual date value of a variable (which could have the name MONTH) but anyway, what you are trying to do is so much easier in that case. Even if you are stuck with the data in the present format, there's no reason that you can't re-arrange the data into the long format to make your coding much much much much much much much much much simpler. That's my take on this. For someone else‘s opinion, see Maxim 19.

 

--
Paige Miller
Banke
Pyrite | Level 9
Thank you so much, i dont know how i missed your response. Been waiting..lol. So i used your code and there was no error. I think what SAS did was not look at patients with value "B" in my array. What i really want is to identify patients that had value "B" from their start to end dates i.e. are continouslly enrolled in that period. Do you have an idea of how i can go about it please?
PaigeMiller
Diamond | Level 26

@Banke wrote:
Thank you so much, i dont know how i missed your response. Been waiting..lol. So i used your code and there was no error. I think what SAS did was not look at patients with value "B" in my array. What i really want is to identify patients that had value "B" from their start to end dates i.e. are continouslly enrolled in that period. Do you have an idea of how i can go about it please?

I said convert your wide data set to long. Here's what I said:

 

"I almost always recommend against the format of data where you have calendar information in the variable name. This creates difficulties (for most people, including obviously you) in handling the variables. A much better layout of the data is a long data set, where the month is an actual date value of a variable (which could have the name MONTH) but anyway, what you are trying to do is so much easier in that case. Even if you are stuck with the data in the present format, there's no reason that you can't re-arrange the data into the long format to make your coding much much much much much much much much much simpler. That's my take on this. For someone else‘s opinion, see Maxim 19."

 

That's how I would go. As stated, this is sooooo much simpler than what you are trying to do. Then all of this difficult manipulating the months embedded inside variable names goes away. I have fought with other users so many times on this issue, I'm not going to fight with you. You can choose to go this way (in which case I will be happy to help further), or not. Entirely up to you.

--
Paige Miller
Banke
Pyrite | Level 9
PROC TRANSPOSE DATA = ENROLLMENT1 OUT = ENROLLMENT2; 
BY PATIENT_ID;
VAR ENROLLED_01_2014--ENROLLED_12_2016;
RUN;

DATA ENROLLMENT2(rename=(col1=type)); SET ENROLLMENT2; MONTH = INPUT(SUBSTR(_NAME_, 10,2), 8.); YEAR = INPUT(SUBSTR(_NAME_, 13,4), 8.); DATE = MDY( MONTH,1,YEAR); FORMAT DATE YYMMS7.; DROP _NAME_ MONTH YEAR; RUN;

Thank you, I agree with your thought process. I was able to convert the variable names to date format in the long format using the attached code. 

Tom
Super User Tom
Super User

Assuming that a variable with a name like  ENROLLED_02_2014 means the enrollment status for Feb of 2014, then your ARRAY definition makes absolutely no sense.

 

You jump from FEB2014 to FEB2015 to FEB2016 then back to MAR2014 etc.

 

So assuming you have the variables listed in chronological order you can probably just convert the test for continuous enrollment to simple string comparison.

string = cats(of ENROLLED_01_2014 ENROLLED_02_2014 .... ENROLLED_12_2016);
start = 1+intck('month','01JAN2014'd,fill_date);
if substrn(string,start,6)='BBBBBB' then continuous='YES';
else continuous='NO';

 

 

Banke
Pyrite | Level 9

TYou are absolutely right about the arrangement in my array not making sense. I just observed that. Thank you for the code you shared, i am not familiar with the functions and dont really understand it. Will i use it in a data step?

 

 

ballardw
Super User

What are the values of the variable START and End_1?

Your array has 36 elements. If either of those variables has a value less than 1 or greater than 36 that will cause that specific error message and they are what controls the value of _M.

 

Personally I think that array definition is pretty suspect for many sorts of processing. It appears to be Year and Month oriented but the order of the elements in the array are not in a date order.

Banke
Pyrite | Level 9

START and END_1 are date  formats MMDDYY10. The array elements are character variables with values of either  A,B,C and D. I am interested in "C" because it is the value that shows that the patient have twe two types of insurance i want. I want to select patients that are continously enrolled with that kind of insurance from the START to END_1 date. I had previously calculated STAT_1 and END_1 by subtracting and adding 160 days repsectively from the variable INDEX_date which is the first day of using the insurance.

 

Thank you

Banke
Pyrite | Level 9

My data is like this:

 

data have;

infile datalines ;

informat patient_id $. startdate indexdate enddate mmddyy10.;;

format patient_id $. startdate indexdate enddate mmddyy10.;

input patient_id $. startdate indexdate enddate mon1 mon2 mon3 mon4 mon5 mon6 mon7 mon8 mon9 mon10 mon11 mon12;

datalines;

AA 12/30/2016  07/25/2014 01/21/2015  A B C B C C C C ......

BB 11/29/2016  01/07/2014 07/06/2014  A B C B C B B C ......

CC 08/08/2015  05/10/2015 11/06/2015  A B C B C C C C ......

DD 12/06/2016  10/23/2015 04/20/2016  A C C C C C C C ......

EE 11/27/2016  05/07/2014 11/03/2014  A B C B  C C D  ......

Tom
Super User Tom
Super User

If your data looks like that then how do you end up with the variables you showed in the other post?

 

Also that data does not look right. 

Some of the start dates are after the end dates.

The longest run of B's is only 2 so none of those cases will have 180 days (6 months) of coverage.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 16 replies
  • 872 views
  • 1 like
  • 4 in conversation