BookmarkSubscribeRSS Feed
rox26
Calcite | Level 5

I want to create an indicator for the row of data for each person that is the highest count of consecutive months of insurance BUT that also includes the month of diagnosis in that span. 

 

In an example of one case, I have information about the enrollment status in insurance for the month of diagnosis and the 6 months after. The month of diagnosis is 12/2020 and they are enrolled that month and the next month (1/2021). However, they are not enrolled in insurance the following month (2/2021). They are enrolled again in insurance 3/2021 through 6/2021. I have created the counts of consecutive months of insurance with the 2 consecutive spans. I now need to create an indicator that will show which row (per person) has the highest count of consecutive months of insurance while in the same consecutive span as the diagnosis month.  

 

This one case has a situation where the highest count of consecutive insurance is in a different date span than the diagnosis date. I've tried different combinations of first last but it's not doing the trick. 

 

Here's the code and the screenshots of the tables:

 

*data of one person with 2 different consecutive spans;
data have;
input id dx_date :ddmmyy10. enroll_date :ddmmyy10. insurance consec_months;
format
dx_date ddmmyy10.
enroll_date ddmmyy10.;
datalines;
99 12/01/2020 12/01/2020 1 1
99 12/01/2020 01/01/2021 1 2
99 12/01/2020 03/01/2021 1 1
99 12/01/2020 04/01/2021 1 2
99 12/01/2020 05/01/2021 1 3
99 12/01/2020 06/01/2021 1 4
;
run;
proc print data=have;
run;

 

rox26_1-1722975755186.png

 

*manually showing the output I want for the indicator;
data want;
set have;
if enroll_date='01JAN2021'd and consec_months=2 then want_indicator=1;
else want_indicator=0;
run;
proc print data=want;
run;

 

rox26_2-1722975772332.png

Thank you!

4 REPLIES 4
whymath
Lapis Lazuli | Level 10
So, why you flag out the 2nd row rather than the 6th row?
rox26
Calcite | Level 5

There are two different spans of consecutive enrollment in this one case. One is 12/2020 to 1/2021 and the other is 3/2021 to 6/2021. I need the count of the first span (2nd row) because it contains the month of diagnosis (12/2020). Most of my other cases are continuously consecutive, so I was just taking the last row of each but here is an instance where that doesn't work. 

SASKiwi
PROC Star

Here is an attempt that works for your data. I've corrected your INFORMATs as they didn't seem to match your data:

data have;
input id dx_date :mmddyy10. enroll_date :mmddyy10. insurance consec_months;
format
dx_date ddmmyy10.
enroll_date ddmmyy10.;
datalines;
99 12/01/2020 12/01/2020 1 1
99 12/01/2020 01/01/2021 1 2
99 12/01/2020 03/01/2021 1 1
99 12/01/2020 04/01/2021 1 2
99 12/01/2020 05/01/2021 1 3
99 12/01/2020 06/01/2021 1 4
;
run;

data Want;
  drop last_:;
  set Have;
  by id dx_date enroll_date;
  retain last_enroll_date DX_Flag last_DX_Flag;
  if first.id then do;
    consec_months_derived = 1;
    last_enroll_date = .;
	DX_Flag = '';
  end;
  else do;
    if enroll_date = intnx('MONTH', last_enroll_date, 1) then do;
      consec_months_derived + 1;
	  if enroll_date >= dx_date and last_DX_Flag in ('','Y') then DX_Flag = 'Y';
	end;
	else do;
      consec_months_derived = 1;
	  DX_Flag = 'N';
    end;
  end;
  output;
  last_enroll_date = enroll_date;
  last_DX_Flag = DX_Flag; 
run;

I've also created a Y/N flag rather than your 0/1 flag.

mkeintz
PROC Star

Given your data are

  1. sorted by ID/ENROLL_DATE
  2. each time there is a gap in ENROLL_DATE, consec_months is set to 1, otherwise consec_months is incremented by 1

 

then you can restate the identification of the desired observation.  You want the observation which satisfies

  1. the most recent consec_months=1 record has enroll_date<=dx_date.
  2. the upcoming record
    1. Begins a new ID,  
    2. or
      1. follows a GAP in enroll_date, indicated by a CONSEC_MONTH=1 in the following record
      2. ... and has an enroll_date > dx_date.

 

 

 

data have;
  input id dx_date :ddmmyy10. enroll_date :ddmmyy10. insurance consec_months;
  format dx_date ddmmyy10.  enroll_date ddmmyy10.;
datalines;
99 12/01/2020 12/01/2020 1 1
99 12/01/2020 01/01/2021 1 2
99 12/01/2020 03/01/2021 1 1
99 12/01/2020 04/01/2021 1 2
99 12/01/2020 05/01/2021 1 3
99 12/01/2020 06/01/2021 1 4
;


data want (drop=_:);
  set have (keep=id);
  by id  ;

  merge have
        have (firstobs=2 keep=enroll_date consec_months
              rename=(enroll_date=_nxt_edate consec_months=_nxt_cm));

  if consec_months=1 then _current_start=enroll_date;  
  retain _current_start;  /*Start of current consecutive ENROLL_DATEs */

  if _current_start<=dx_date and (last.id=1  or (_nxt_cm=1 and _nxt_edate>dx_date));
run;

The "trick" here is the use of the self-merge with the "firstobs=2" parameter to retrieve the upcoming enroll_date and consec_months.

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 480 views
  • 0 likes
  • 4 in conversation