Hello! I am working with a claims dataset across multiple years, which includes an enrollment file. There is one binary variable indicating enrollment in a certain plan (ENROLLFLAG), with multiple rows per person per record each year. In an ideal scenario, these multiple records correspond to all 12 months. However, some individuals may only have some of the months available, so the data is not set up from 1-12 rows for each person. There is a start and end date (STARTDATE, ENDDATE) associated with each record. How can I generate a dummy variable corresponding to the month associated with each record (FLAGMONTHLY) so that I am able to create a long dataset in this way:
ENROLID ENROLLFLAG STARTDATE ENDDATE FLAGMONTHLY
1234 1 March 1 2011 March 31 2011 3
1234 1 April 1 2011 April 30 2011 4
1234 1 May 1 2011 May 31 2011 5
1234 1 June 1 2011 June 30 2011 6
1234 1 July 1 2011 July 31 2011 7
1234 1 August 1 2011 August 31 2011 8
1234 1 Sep 1 2011 Sep 30 2011 9
1234 1 Oct 1 2011 Oct 31 2011 10
1234 1 Nov 1 2011 Nov 30 2011 11
1234 1 Dec 1 2011 Dec 31 2011 12
1235 1 Jan 1 2011 Jan 31 2011 1
1235 1 Feb 1 2011 Feb 30 2011 2
1235 1 March 1 2011 March 31 2011 3
1235 1 April 1 2011 April 30 2011 4
1235 1 May 1 2011 May 31 2011 5
1235 1 June 1 2011 June 30 2011 6
1235 1 July 1 2011 July 31 2011 7
1235 1 August 1 2011 August 31 2011 8
1235 1 Sep 1 2011 Sep 30 2011 9
1235 1 Oct 1 2011 Oct 31 2011 10
1235 1 Nov 1 2011 Nov 30 2011 11
1235 1 Dec 1 2011 Dec 31 2011 12
Thank you very much!!
If StartDate and EndDate are always in the same month, choose one of the variables and use pass it to the month-function - assuming that those dates are dates and not strings.
Example:
FLAGMONTHLY = month(StartDate);
Hi user22,
Although you have shown the sort of dataset you want to build, it is not terribly clear where you are starting from.
Does your existing dataset look something like this ?
In my example data (WORK.HAVE) above, an EnrollID exists in a single row and shows a StartDate and EndDate. In my data, the StartDate/EndDate are not nicely arranged to be the first and last dates of their corresponding months. The first row is a nice example, but by chance, other rows are just simply start and end dates that a customer may choose for enrollment in a plan, they are not nicely aligned to a month start/end date
.
I have assumed from your example that your FlagMonthly column simply an integer representing the month number for a given row.
If the above dataset is what you have and you would like something like the following, expanded version, please read on.
As you can see, the original four rows in WORK.HAVE have been expanded into WORK.WANT. Each block of rows represents an original row in WORK.HAVE and the number of rows displayed in WORK.WANT corresponds to a month in the range of dates given by StartDate and End Date.
In my example, I have made a rule that if an original record impinges at all on a given month, then there will be a row for that whole month. That is why EnrollID=1461 ends up with two rows covering the whole period 01Mar2011 to 30Apr2011, even though its original EndDate was 15Apr2011.This may not be suitable for your needs, so you will need to amend the code below.....or ask some more questions......😺
If you always wanted a record to be expanded to run through to 31Dec<yyyy> that is pretty easy to achieve, but more questions are going to be asked of you about your data, be prepared !
The whole code that I used is as follows:
********************************************************************************************************************************;
*Create some dummy data that hopefully represents the posters data that they have to start with. ;
********************************************************************************************************************************;
Data HAVE;
Infile DATALINES
;
Input EnrollID
EnrollFlag
StartDate Date9.
+1 EndDate Date9.
;
Format StartDate Date9.
EndDate Date9.
;
DATALINES;
1460 1 01Jan2011 31Dec2011
1461 1 01Mar2011 15Apr2011
1462 1 08Jan2011 28Feb2011
1463 1 23Apr2011 28Nov2011
;
Run;
********************************************************************************************************************************;
*Expand the original dataset. ;
*Create one row for each month contained within the range bound by StartDate and EndDate. ;
********************************************************************************************************************************;
Data WANT;
Set HAVE;
**************************************************************;
*How many month-end boundaries between the Start/End Date for ;
*this EnrollID ? ;
**************************************************************;
NumberOfMonths=IntCk("Month", StartDate, EndDate)+1;
**************************************************************;
*Loop over that number of months and generate a MonthStart and;
*a MonthEnd date for each month in the range. ;
*Output one new row into the WANT dataset for each month in ;
*the range for the EnrollID. Also determine the month-of-year ;
*number (Jan=1, Feb=2, Mar=3) for each generated row. ;
**************************************************************;
Do Count=1 To NumberOfMonths;
MonthStart=IntNx("Month", StartDate, Count-1, "B");
MonthEnd =IntNx("Month", StartDate, Count-1, "E");
FlagMonthly=Month(MonthStart);
Output;
End;
**************************************************************;
*Make our date variables easy to read. ;
**************************************************************;
Format MonthStart Date9.
MonthEnd Date9.
;
**************************************************************;
*Drop the columns we no longer require to keep things tidy. ;
**************************************************************;
Drop Count
NumberOfMonths
StartDate
EndDate
;
Run;
I hope that this helps, and if not, come back to this post with a bit more detail as to what you existing data looks like.
Cheers,
Downunder Dave.
Hi Dave,
Thank you VERY much for your response. I really appreciate the code you have used to generate the output. The original data is set up like so:
I realized after seeing your response, I would like to tweak the output slightly, to also add FLAGMONTHLY for months that we do not have a start and end date available. For example:
I did need a flag per month and folks overall only have 1 record per month (maybeee some exceptions may occur). I hope this offers more detail, please let me know if there are any specific questions I can answer. Ultimately, my goal is to generate a string of 0s and 1s so that I can look pre- and post-30 days coverage from a specific date.
Thank you very much!!!
Hi User22.
I think this next version may be closer to where you need to be, I hope so anyway.
Please have a read through the code and the associated comments and see whether the results you get from this version work for you or not.
As is often the way with these "programming by remote control" posts, I have slowed this down and probably used more steps and statements than someone else might do. Others may criticise in terms of finesse, but I use this way to, hopefully, make the method that I am using, clearer. If it works, and you want to spend some time making it shorter, knock yourself out.
I have added an optional statement in the final DATA step to set those rows with a NULL/MISSING value for EnrollFlag to appear instead as -99999. I have done this to make is very obvious that this data may need attention. You can comment out this IF/THEN statement if you wish or choose a different value in place of -99999.
I have also added a single original record for EnrollID=1236 to simulate what happens when an EnrollID only appears in your original data in one row. I may have got the idea about your original data wrong, so beware.
There is a limitation in this version of the code that may not worry you at all or at this point. That limitation is that it assumes all of your data will be for the calendar year 2011. Should/when your needs expand beyond data for a single year, all is not lost, just a little more complexity is added to the code which I do not think it worth muddying the waters at this point.
********************************************************************************************************************************;
*Create some dummy data. ;
********************************************************************************************************************************;
Data WORK.HAVE;
Infile DATALINES
;
Input EnrollID
EnrollFlag
StartDate Date9.
+1 EndDate Date9.
;
Format StartDate Date9.
EndDate Date9.
;
DATALINES;
1234 1 01Mar2011 31Mar2011
1234 1 01Apr2011 30Apr2011
1234 1 01May2011 31May2011
1234 1 01Jun2011 30Jun2011
1234 1 01Jul2011 31Jul2011
1234 1 01Aug2011 31Aug2011
1234 1 01Sep2011 30Sep2011
1234 1 01Oct2011 31Oct2011
1234 1 01Nov2011 30Nov2011
1234 0 01Dec2011 31Dec2011
1235 1 01Jan2011 31Jan2011
1235 1 01Feb2011 28Feb2011
1235 0 01Mar2011 31Mar2011
1235 1 01Apr2011 30Apr2011
1235 1 01May2011 31May2011
1235 1 01Jun2011 30Jun2011
1235 1 01Jul2011 31Jul2011
1235 . . .
1235 1 01Sep2011 30Sep2011
1235 1 01Oct2011 31Oct2011
1235 1 01Nov2011 30Nov2011
1235 1 01Dec2011 31Dec2011
1236 1 01Apr2011 30Apr2011
;
Run;
********************************************************************************************************************************;
*Create a simple look-up list of distinct EnrollIDs in the current data. ;
********************************************************************************************************************************;
Proc SQL;
Create Table EnrollIDs As
Select EnrollID,
Count(*) As NumberOfRows Format=Comma12. Label="Original Number of Rows fir EnrollID"
From WORK.HAVE
Group By EnrollID
;
Quit;
********************************************************************************************************************************;
*Create an empty frame of 12 months, for each known EnrollID, assuming we are only interested in the year 2011. ;
********************************************************************************************************************************;
Data WORK.Empty_Frame;
Set EnrollIDs;
**************************************************************;
*ASSUMPTION: This data only encompasses the year 2011. ;
**************************************************************;
InitialDate="01Jan2011"d;
Do Count=1 To 12;
StartDate=IntNx("Month", InitialDate, Count-1, "B");
EndDate =IntNx("Month", InitialDate, Count-1, "E");
FlagMonthly=Month(StartDate);
Output;
End;
Format StartDate Date9.
EndDate Date9.
;
Drop Count
InitialDate
;
Run;
********************************************************************************************************************************;
*Sort our two datasets ready for merging/joining. ;
********************************************************************************************************************************;
Proc Sort Data=WORK.HAVE;
By EnrollID
StartDate
;
Run;
Proc Sort Data=WORK.Empty_Frame;
By EnrollID
StartDate
;
Run;
********************************************************************************************************************************;
*Use the WORK.Empty_Frame dataset as our base and merge/join on any corresponding rows from WORK.HAVE. ;
********************************************************************************************************************************;
Data All_Together;
Informat EnrollID EnrollFlag;
Merge WORK.Empty_Frame(In=Empty_Frame)
WORK.HAVE(In=HAVE)
;
By EnrollID
StartDate
;
**********************************************************************************;
*Continue the pass of the DATA step if a row at least appears in WORK.EMPTY_FRAME ;
**********************************************************************************;
If Empty_Frame;
**************************************************************;
*OPTIONAL BIT: ;
*If the current row is one of our filler-rows or it started as;
*as an empty row in the source dataset, make it obvious. ;
**************************************************************;
If EnrollFlag=. Then EnrollFlag=-99999;
Keep EnrollID
EnrollFlag
StartDate
EndDate
FlagMonthly
;
Run;
********************************************************************************************************************************;
*Make sure that the rows are in a readable order again....they ought to be, but........ ;
********************************************************************************************************************************;
Proc Sort Data=All_Together;
By EnrollID
StartDate
;
Run;
Anyhow, good luck with this latest version, I hope that it moves you on from the previous version.
If you post data or code in future, you should try to use the little SAS Code icon on the tool bar on the edit window, highlighted in red below. Using that tool makes the code you paste in use a better and more readable type of formatting:
Keep at it, you will get there !
Cheers,
Downunder Dave
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.