BookmarkSubscribeRSS Feed
user22
Calcite | Level 5

Hello! I am working with a claims dataset across multiple years, which includes an enrollment file. There is one binary variable indicating enrollment in a certain plan (ENROLLFLAG), with multiple rows per person per record each year. In an ideal scenario, these multiple records correspond to all 12 months. However, some individuals may only have some of the months available, so the data is not set up from 1-12 rows for each person. There is a start and end date (STARTDATE, ENDDATE) associated with each record. How can I generate a dummy variable corresponding to the month associated with each record (FLAGMONTHLY) so that I am able to create a long dataset in this way:

 

ENROLID  ENROLLFLAG      STARTDATE       ENDDATE       FLAGMONTHLY

1234          1                           March 1 2011     March 31 2011     3

1234          1                           April 1 2011        April 30 2011        4

1234          1                           May 1 2011        May 31 2011        5

1234          1                           June 1 2011       June 30 2011       6

1234          1                           July 1 2011        July 31 2011         7

1234          1                           August 1 2011   August 31 2011     8

1234          1                           Sep 1 2011      Sep 30 2011           9

1234          1                           Oct 1 2011      Oct 31 2011             10

1234          1                           Nov 1 2011     Nov 30 2011             11

1234          1                           Dec 1 2011     Dec 31 2011             12

1235          1                           Jan 1 2011      Jan 31 2011             1

1235          1                           Feb 1 2011        Feb 30 2011          2

1235          1                           March 1 2011     March 31 2011     3

1235          1                           April 1 2011        April 30 2011        4

1235          1                           May 1 2011        May 31 2011        5

1235         1                           June 1 2011       June 30 2011       6

1235          1                           July 1 2011        July 31 2011         7

1235          1                           August 1 2011   August 31 2011     8

1235          1                           Sep 1 2011      Sep 30 2011           9

1235          1                           Oct 1 2011    Oct 31 2011               10

1235          1                           Nov 1 2011   Nov 30 2011               11

1235          1                           Dec 1 2011   Dec 31 2011               12

 

Thank you very much!!

 

4 REPLIES 4
andreas_lds
Jade | Level 19

If StartDate and EndDate are always in the same month, choose one of the variables and use pass it to the month-function - assuming that those dates are dates and not strings.

Example:

FLAGMONTHLY = month(StartDate);

DaveShea
Lapis Lazuli | Level 10

Hi user22,

Although you have shown the sort of dataset you want to build, it is not terribly clear where you are starting from.

Does your existing dataset look something like this ?

2022-02-21_22h11_25.png

In my example data (WORK.HAVE) above, an EnrollID exists in a single row and shows a StartDate and EndDate. In my data, the StartDate/EndDate are not nicely arranged to be the first and last dates of their corresponding months. The first row is a nice example, but by chance, other rows are just simply start and end dates that a customer may choose for enrollment in a plan, they are not nicely aligned to a month start/end date

.

I have assumed from your example that your FlagMonthly column simply an integer representing the month number for a given row.

 

If the above dataset is what you have and you would like something like the following, expanded version, please read on.

2022-02-21_22h20_20.png

As you can see, the original four rows in WORK.HAVE have been expanded into WORK.WANT. Each block of rows represents an original row in WORK.HAVE and the number of rows displayed in WORK.WANT corresponds to a month in the range of dates given by StartDate and End Date.

 

In my example, I have made a rule that if an original record impinges at all on a given month, then there will be a row for that whole month. That is why EnrollID=1461 ends up with two rows covering the whole period 01Mar2011 to 30Apr2011, even though its original EndDate was 15Apr2011.This may not be suitable for your needs, so you will need to amend the code below.....or ask some more questions......😺

If you always wanted a record to be expanded to run through to 31Dec<yyyy> that is pretty easy to achieve, but more questions are going to be asked of you about your data, be prepared !

 

The whole code that I used is as follows:

********************************************************************************************************************************;
*Create some dummy data that hopefully represents the posters data that they have to start with.                                ;
********************************************************************************************************************************;
Data HAVE;
 Infile DATALINES
        ;
Input   EnrollID
        EnrollFlag
        StartDate   Date9.
    +1  EndDate     Date9.
        ;

Format  StartDate   Date9.
        EndDate     Date9.
        ;

DATALINES;
1460 1 01Jan2011 31Dec2011
1461 1 01Mar2011 15Apr2011
1462 1 08Jan2011 28Feb2011
1463 1 23Apr2011 28Nov2011
;
Run;

********************************************************************************************************************************;
*Expand the original dataset.                                                                                                   ;
*Create one row for each month contained within the range bound by StartDate and EndDate.                                       ;
********************************************************************************************************************************;
Data WANT;
 Set HAVE;

 **************************************************************;
 *How many month-end boundaries between the Start/End Date for ;
 *this EnrollID ?                                              ;
 **************************************************************;
 NumberOfMonths=IntCk("Month", StartDate, EndDate)+1;

 **************************************************************;
 *Loop over that number of months and generate a MonthStart and;
 *a MonthEnd date for each month in the range.                 ;
 *Output one new row into the WANT dataset for each month in   ;
 *the range for the EnrollID. Also determine the month-of-year ;
 *number (Jan=1, Feb=2, Mar=3) for each generated row.         ;
 **************************************************************;
 Do Count=1 To NumberOfMonths;
    MonthStart=IntNx("Month", StartDate, Count-1, "B");
    MonthEnd  =IntNx("Month", StartDate, Count-1, "E");
    FlagMonthly=Month(MonthStart);
    Output;
 End;

**************************************************************;
*Make our date variables easy to read.                        ;
**************************************************************;
Format  MonthStart  Date9.
        MonthEnd    Date9.
        ;
**************************************************************;
*Drop the columns we no longer require to keep things tidy.   ;
**************************************************************;
Drop    Count
        NumberOfMonths
        StartDate
        EndDate
        ;

Run;

I hope that this helps, and if not, come back to this post with a bit more detail as to what you existing data looks like.

 

Cheers,

 

Downunder Dave.

 

 

user22
Calcite | Level 5

Hi Dave,

Thank you VERY much for your response. I really appreciate the code you have used to generate the output. The original data is set up like so:

ENROLID  ENROLLFLAG      STARTDATE       ENDDATE       
1234          1                           March 1 2011     March 31 2011     
1234          1                           April 1 2011        April 30 2011        
1234          1                           May 1 2011        May 31 2011        
1234          1                           June 1 2011       June 30 2011       
1234          1                           July 1 2011        July 31 2011         
1234          1                           August 1 2011   August 31 2011     
1234          1                           Sep 1 2011      Sep 30 2011           
1234          1                           Oct 1 2011      Oct 31 2011             
1234          1                           Nov 1 2011     Nov 30 2011            
1234          0                           Dec 1 2011     Dec 31 2011            
1235          1                           Jan 1 2011      Jan 31 2011            
1235          1                           Feb 1 2011        Feb 28 2011         
1235          0                           March 1 2011     March 31 2011     
1235          1                           April 1 2011        April 30 2011        
1235          1                           May 1 2011        May 31 2011        
1235          1                           June 1 2011       June 30 2011       
1235          1                           July 1 2011        July 31 2011         
1235          -                               -                            -
1235          1                           Sep 1 2011      Sep 30 2011           
1235          1                           Oct 1 2011    Oct 31 2011               
1235          1                           Nov 1 2011   Nov 30 2011               
1235          1                           Dec 1 2011   Dec 31 2011               

 

I realized after seeing your response, I would like to tweak the output slightly, to also add FLAGMONTHLY for months that we do not have a start and end date available. For example:

 

ENROLID  ENROLLFLAG      STARTDATE       ENDDATE       FLAGMONTHLY
1234          0                                    -                                 -           1
1234          0                                    -                                 -           2
1234          1                           March 1 2011     March 31 2011     3
1234          1                           April 1 2011        April 30 2011        4
1234          1                           May 1 2011        May 31 2011        5
1234          1                           June 1 2011       June 30 2011       6
1234          1                           July 1 2011        July 31 2011         7
1234          1                           August 1 2011   August 31 2011     8
1234          1                           Sep 1 2011      Sep 30 2011           9
1234          1                           Oct 1 2011      Oct 31 2011             10
1234          1                           Nov 1 2011     Nov 30 2011             11
1234          0                           Dec 1 2011     Dec 31 2011             12
1235          1                           Jan 1 2011      Jan 31 2011             1
1235          1                           Feb 1 2011        Feb 28 2011          2
1235          0                           March 1 2011     March 31 2011     3
1235          1                           April 1 2011        April 30 2011        4
1235          1                           May 1 2011        May 31 2011        5
1235         1                           June 1 2011       June 30 2011        6
1235          1                           July 1 2011        July 31 2011         7
1235          0                                  -                           -                    8
1235          1                           Sep 1 2011      Sep 30 2011           9
1235          1                           Oct 1 2011    Oct 31 2011               10
1235          1                           Nov 1 2011   Nov 30 2011               11
1235          1                           Dec 1 2011   Dec 31 2011               12

 

I did need a flag per month and folks overall only have 1 record per month (maybeee some exceptions may occur). I hope this offers more detail, please let me know if there are any specific questions I can answer. Ultimately, my goal is to generate a string of 0s and 1s so that I can look pre- and post-30 days coverage from a specific date.

 

Thank you very much!!!

DaveShea
Lapis Lazuli | Level 10

Hi User22.

I think this next version may be closer to where you need to be, I hope so anyway.

 

Please have a read through the code and the associated comments and see whether the results you get from this version work for you or not.

 

As is often the way with these "programming by remote control" posts, I have slowed this down and probably used more steps and statements than someone else might do. Others may criticise in terms of finesse, but I use this way to, hopefully, make the method that I am using, clearer. If it works, and you want to spend some time making it shorter, knock yourself out.

 

I have added an optional statement in the final DATA step to set those rows with a NULL/MISSING value for EnrollFlag to appear instead as -99999. I have done this to make is very obvious that this data may need attention. You can comment out this IF/THEN statement if you wish or choose a different value in place of -99999.

 

I have also added a single original record for EnrollID=1236 to simulate what happens when an EnrollID only appears in your original data in one row. I may have got the idea about your original data wrong, so beware.

 

There is a limitation in this version of the code that may not worry you at all or at this point. That limitation is that it assumes all of your data will be for the calendar year 2011. Should/when your needs expand beyond data for a single year, all is not lost, just a little more complexity is added to the code which I do not think it worth muddying the waters at this point.

 

********************************************************************************************************************************;
*Create some dummy data.                                                                                                        ;
********************************************************************************************************************************;
Data WORK.HAVE;
 Infile DATALINES
        ;
Input   EnrollID
        EnrollFlag
        StartDate   Date9.
    +1  EndDate     Date9.
        ;

Format  StartDate   Date9.
        EndDate     Date9.
        ;

DATALINES;
1234 1 01Mar2011 31Mar2011
1234 1 01Apr2011 30Apr2011
1234 1 01May2011 31May2011
1234 1 01Jun2011 30Jun2011
1234 1 01Jul2011 31Jul2011
1234 1 01Aug2011 31Aug2011
1234 1 01Sep2011 30Sep2011
1234 1 01Oct2011 31Oct2011
1234 1 01Nov2011 30Nov2011
1234 0 01Dec2011 31Dec2011
1235 1 01Jan2011 31Jan2011
1235 1 01Feb2011 28Feb2011
1235 0 01Mar2011 31Mar2011
1235 1 01Apr2011 30Apr2011
1235 1 01May2011 31May2011
1235 1 01Jun2011 30Jun2011
1235 1 01Jul2011 31Jul2011
1235 . .         .        
1235 1 01Sep2011 30Sep2011
1235 1 01Oct2011 31Oct2011
1235 1 01Nov2011 30Nov2011
1235 1 01Dec2011 31Dec2011
1236 1 01Apr2011 30Apr2011
;
Run;

********************************************************************************************************************************;
*Create a simple look-up list of distinct EnrollIDs in the current data.                                                        ;
********************************************************************************************************************************;
Proc SQL;
 Create Table EnrollIDs As
 Select EnrollID,
        Count(*)	As	NumberOfRows	Format=Comma12. Label="Original Number of Rows fir EnrollID"
 From   WORK.HAVE
 Group  By EnrollID
;

Quit;

********************************************************************************************************************************;
*Create an empty frame of 12 months, for each known EnrollID, assuming we are only interested in the year 2011.                 ;
********************************************************************************************************************************;
Data WORK.Empty_Frame;
 Set EnrollIDs;

 **************************************************************;
 *ASSUMPTION: This data only encompasses the year 2011.        ;
 **************************************************************;
 InitialDate="01Jan2011"d;

  Do Count=1 To 12;
    StartDate=IntNx("Month", InitialDate, Count-1, "B");
    EndDate  =IntNx("Month", InitialDate, Count-1, "E");
    FlagMonthly=Month(StartDate);
    Output;
 End;

 Format StartDate   Date9.
        EndDate     Date9.
        ;

 Drop   Count
        InitialDate
        ;
    
Run;

********************************************************************************************************************************;
*Sort our two datasets ready for merging/joining.                                                                               ;
********************************************************************************************************************************;
Proc Sort Data=WORK.HAVE;
 By EnrollID
    StartDate
    ;
Run;

Proc Sort Data=WORK.Empty_Frame;
 By EnrollID
    StartDate
    ;
Run;

********************************************************************************************************************************;
*Use the WORK.Empty_Frame dataset as our base and merge/join on any corresponding rows from WORK.HAVE.                          ;
********************************************************************************************************************************;
Data All_Together;
 Informat EnrollID EnrollFlag;
 Merge  WORK.Empty_Frame(In=Empty_Frame)
        WORK.HAVE(In=HAVE)
        ;
 By EnrollID
    StartDate
    ;
 **********************************************************************************;
 *Continue the pass of the DATA step if a row at least appears in WORK.EMPTY_FRAME ;
 **********************************************************************************;
 If Empty_Frame;

 **************************************************************;
 *OPTIONAL BIT:                                                ;
 *If the current row is one of our filler-rows or it started as;
 *as an empty row in the source dataset, make it obvious.      ;
 **************************************************************;
 If EnrollFlag=. Then EnrollFlag=-99999;

 Keep   EnrollID
        EnrollFlag
        StartDate
        EndDate
        FlagMonthly
        ; 

Run;

********************************************************************************************************************************;
*Make sure that the rows are in a readable order again....they ought to be, but........                                                                               ;
********************************************************************************************************************************;
Proc Sort Data=All_Together;
 By EnrollID
    StartDate
    ;
Run;

Anyhow, good luck with this latest version, I hope that it moves you on from the previous version.

 

If you post data or code in future, you should try to use the little SAS Code icon on the tool bar on the edit window, highlighted in red below. Using that tool makes the code you paste in use a better and more readable type of formatting:

DaveShea_0-1645517271902.png

 

Keep at it, you will get there !

 

Cheers,

 

Downunder Dave

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 428 views
  • 0 likes
  • 3 in conversation