I have a dataset containing Medicaid claims data, and I want to subset the data to include only individuals who have at least 11 months of continuous enrollment (with at least 15 days of enrollment per each month). The problem is, I have three year's worth of data (2005-2008), so an individual may have only a few months of enrollment in each year they were seen, but it may or may not total the number of months of continuous enrollment that I desire. For example, patient #1001 has claims for 2005 and 2006. Her enrollment began in September 2005 and continued through December 2005, giving her 4 months of continuous enrollment in 2005. Her enrollment then continued from January 2006 to July 2006, giving her 7 months of continuous enrollment in 2006, for a total of 11 months of continuous enrollment. Therefore, I would want to include this person in my dataset. Another example: patient #1003 has claims for 2007 and 2008. He had enrollment in 2007 starting in the month of May through December, giving him a total of 8 months of enrollment in 2007. However, in 2008 he only had one month of enrollment in January, giving him a grand total of 9 months of enrollment. Therefore, I do not want to include patient #1003 in my dataset. What I need is help with is determining the number of months of continuous enrollment, having one number per ID. The variables I have are ID, days_enrollment1-days_enrollment12 (these are number of days of enrollment for January[1]-December[12]), eligible_months (this is # of months eligible for Medicaid per year), and year (year of enrollment, 2005-2008). Again, an individual may have claims filed for multiple years, but I need to find a way to sum up the number of months of continuous enrollment. I feel like the solution to this problem is either really difficult or really easy. Please advise. Thank you in advance.
You may need to clarify you problem a bit more. You say you have 3 years of data and are looking at 11 continuous months as a target. Suppose id XXXX has a first record point of 1/1/2005 and is continuously enrolled through 11/30/2005 for an obvious 11 months. Then has a gap and is enrolled from 3/1/2006 to 1/31/2007 for another 11 months continuous. Do both periods count? Only one period? If so which one?
And for a stickier bit for your definition of continuous: Suppose we have 9 months in a row of 30 or 31 days of enrollment (or a full month just to make it easy) with a following month of 20 days enrollment and the month after that of 18 days enrollment. Is the month with 18 days counted as continuous with the previous month as it could well be a break of up to 26/27/28 days.
A first step would be to create an appropriate SAS date variable representing, most likely the first day of enrollment per month and one record per period (which makes the cross year boundaries a lot easier).
Something like:
ID FirstDayOfMonth EnrolledDays
1001 3/1/2005 22
1001 4/1/2005 30
1001 5/1/2005 20
1001 6/1/2005 0
With this structure you have a number of tools available such as First. and Last processing that allow some of what you're looking for. This is sometimes referred to as wide-to-long transposition.
Data trans (keep id FirstDayOfMonth EnrolledDays eligible_months);
set have;
array e days_enrollment1 - days_enrollment12;
do j=1 to 12;
EnrolledDays=e
FirstDayOfMonth= mdy(j,1,year);
output;
end;
format FirstDayOfMonth mmddyy10.;
run;
The next step requires data in ID and date order so it may be necessary to sort the data.
proc sort data=trans; by id FirstDayOfMonth;run;
Data want;
set trans;
by ID;
Retain months 0;
if first.id then months=0; /* reset counter for first record of each id*/
if EnrolledDays ge 15 then months +1; /* increment the counter to include current month*/
/* NOTE: This counts as continuous the case I asked about with the possible 20 day gap*/
Else If EnrolledDays < 15 then months=0; /* current month doesn't count for incrementing*/
run;
Depending on what you are doing you can get the max or any number of months associated with any ID.
Thank you for your response. I actually want to include patients who had continuous enrollment (11 or more months, with at least 15 days within each month) within the first year of life. So below are examples of patients whom I do and do not want in the final cohort (all values, including DOB, are made-up):
Do want:
ID Year DOB Eligible_months
1001 2007 01/01/2007 12
This person had 12 months of continuous enrollment in the first year of their life.
Another version of the patient I do want:
ID Year DOB Eligible_months
1002 2006 11/1/2006 2
1002 2007 11/1/2006 10
This person had continuous enrollment (starting in Nov and Dec 2006 and continuing on for the first 10 months in 2007) for the entire first year of their life.
Don't want:
ID Year DOB Eligible_months
1003 2005 02/2/2005 3
1003 2007 02/2/2005 1
1003 2009 02/2/2005 8
In this case, the person may have had a total of 12 months of enrollment, but only had 3 months of eligibility in the first year of life, 1 month in 2007, and 8 months in 2009, with having no more than 7 months of continuous enrollment.
I did what you outlined above, and followed that by proc summary to get the maximum number of months of continuous enrollment. But now I need the maximum number of months of continuous enrollment only within the first year of life.
Proc summary nway;
where year(dob) = year;
Class id;
will get the summary by id and each calendar year if you keep the YEAR and DOB variable in the example code.
Okay so this didn't work for me, so I passed off this part of the project to an actual biostatistician. However, I do have another question that is probably easier to answer (and for me to do). In the same dataset, I have something like the following:
ID Date_of_Service Procedure_Code
1001 4/17/05 5661
1001 2/16/05 5661
1001 5/03/07 5661
1001 2/17/05 7521
1001 3/15/06 7521
1001 2/18/05 4556
1001 9/19/08 4556
I want to keep all three of the codes (5661, 7521, and 4556), but I only want to keep the one with the earliest date of service (i.e. for code 5661, I want the one from 2/16/05, from 7521 the one from 2/17/05, and for 4556, the one from 2/18/05). All the other dates can be ignored (for now) as I want the earliest date someone had a particular procedure done. How do I do this? Thank you.
proc summary data=have nway;
class ID Procedure_code;
var Date_of_Service;
output out=want (drop=_type_ _freq_) min=;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.