BookmarkSubscribeRSS Feed
A_SAS_Man
Pyrite | Level 9

I have somewhat of a complex problem (at least for me). I have monthly data that contains a person id, coverage date, start date and end date in the format below:

 

 data have;
 input person $1. coverage_date:mmddyy10. start_date:mmddyy10. end_date:mmddyy10.;
 format coverage_date date9. start_date date9. end_date date9.;
 datalines;
1 01/01/2019 01/01/2019 01/19/2019
1 02/01/2019 02/02/2019 02/28/2019
1 03/01/2019 01/01/2019 02/28/2019
 ;
 run;

What I'm attempting to do is convert this to a weekly data set, and for each week checking if it is an "active" week. My definition of active is that any day of the specific week fall between the start_date and end_date fields. I've created a partial sample of what I'm looking for from the first few lines for person 1 below.

 

 data partial_want;
 input person $1. week_number 2.0 week_start_date:mmddyy10. Active 1.0;
 format week_start_date date9.;
 datalines;
1 0 01/01/2019 1
1 1 01/06/2019 1
1 2 01/13/2019 1
1 3 01/20/2019 0
1 4 01/27/2019 1
1 5 02/03/2019 1
;
run;

I would like my week start dates to be consistent with how the SAS week function would calculate whether a given day is in week 0,1, etc. I believe the dates I have in my partial_want set would correspond to that.Please let me know if there are clarifications needed on the goal.

 

EDIT: As was pointed out below I had some inconsistency in my problem statement. I mis-stated what my definition of active was, and I have edited it above and am also posting again below here with a clarification of my overall goal. I have also simplified my original data have to increase the clarity here.

 

I am trying to take a monthly data set that contains a coverage_date, start_date and end_date and convert it to a weekly data set with an additional field indicating whether that week has a day in it that falls within the "active" period (between start_date and end_date). The steps to this as I see it are below.

 

1. Convert data set to a weekly one that turns over every year. So there would be weeks 0-52 in a given year (+/- leap year stuff).

2. Check whether for that particular week and person, at least one day of the week falls between their start and end dates. If it does, put a "1" in the active field.

 

If this is still not clear, please feel free to ask more clarifying questions.

12 REPLIES 12
ballardw
Super User

When discussing weeks you need to carefully consider what you define as a week in terms of which day it starts, how intervals crossing year boundaries might be considered and the "week value" you want.

 

So, assuming your Active in the want data a 1 means in an active  week you need to describe why the first record is considered "active". Very explicitly as in days of week considered and other rules. And exactly how the 0, 1, 2 etc values are derived for week_number. Also where does the 01/06/2019 date come from???

 

For example in typical terms the start of the week containing coverage_date 01/01/2019 starts on Sunday 12/30/2018. So I would not expect the week to be active according to your statement of between start and end.

You may find these snippets helpful:

 covstart =intnx('week',coverage_date,0,'B');
 covend   =intnx('week',coverage_date,0,'E');

Covstart and Covend would be the typical start of the calendar week containing coverage_date as date values. If you are adjusting by some offset you could add/ subtract.

 

It is not exactly clear what happens with your coverage_date given the output.

I might guess that you are attempting to do some sort of loop over the values of start_date and end_date for comparison but that isn't clear. And as I said about what exactly how you use coverage_date isn't clear. There is no way that coverage date is in the week starting 01/06/2019. (which could be derived by using Intnx('week',start_date,1,'B') ).

A_SAS_Man
Pyrite | Level 9

Let me provide some clarification as I believe I did make an error in my original example, so I apologize.

 

The week numbers are derived using the SAS week function, so if that function would call  a particular day as being within week 0, 1, 2, etc then that's what I was going off of. I don't necessarily need to have that definition of weeks in the solution, but that was the one I was initially using. Below is the quick check I was doing to see week start and end dates.

 

data test;
input date:mmddyy10.;
format date date9.;
datalines;
01/01/2019
01/02/2019
01/03/2019
01/04/2019
01/05/2019
01/06/2019
01/07/2019
01/08/2019
01/09/2019
01/10/2019
01/11/2019
01/12/2019
01/13/2019
01/14/2019
01/15/2019
01/16/2019
;
run;

proc sql;
create table test2 as
	select *,
		   week(date,'U') as Week
	from test
;
quit;

 

The next part is where I made a mistake which is in my definition of active. You are correct that according to the criteria I specified the first line should not have been active. This is because we were initially considering two different sets of criteria for our project, and I combined the two when creating my example. I am going to edit my original question to reflect this.

 

 

novinosrin
Tourmaline | Level 20

Hi @A_SAS_Man  I am not sure of the Active definition of yours as I am finding some discrepancy with mine, and so I have commented that part. 

With some assumptions, here is an approach. Note, I have computed Week_start and Week_end. In your partial output sample I see there is a week_start, however I am including the week_end for easy debugging or even to know for sure whether or not the approach was seemingly in the right track. Please review and let me know.

 


data have;
 input person $1. coverage_date:mmddyy10. start_date:mmddyy10. end_date:mmddyy10.;
 format coverage_date date9. start_date date9. end_date date9.;
 datalines;
1 01/01/2019 01/01/2019 01/25/2019
1 02/01/2019 02/02/2019 02/28/2019
1 03/01/2019 01/01/2019 02/28/2019
2 04/01/2019 01/01/2019 04/20/2019
2 05/01/2019 01/01/2019 05/28/2019
2 06/01/2019 06/02/2019 06/30/2019
3 01/01/2020 01/01/2019 03/31/2020
3 02/01/2020 01/01/2019 03/31/2020
3 03/01/2020 01/01/2019 03/31/2020
 ;
run;

data want;
 set have;
 by person;
 retain week_start week_end ;
 if first.person then do;
  call missing(week_start,week_end);
  week_start=start_date;
  weeknumber=0;
 end;
 do _n_=0 by 1  while(week_end<end_date);
  week_end=intnx('week',start_date,_n_,'e');
/*  Active=start_date<=week_start<=end_date and start_date<=week_end<=end_date;*/
  output;
  weeknumber+1;
  week_start=week_end+1;
 end;
 format week_end week_start  date9.;
run;
A_SAS_Man
Pyrite | Level 9

Hey @novinosrin@ballardw pointed out some things I needed to clarify/correct in my problem statement. I am going to update my post to more accurately reflect what I'm trying to do. Sorry for the confusion here.

novinosrin
Tourmaline | Level 20

Hi @A_SAS_Man  Did you check out the solution I posted? I believe I am somewhere close. Please uncomment the active= statement in that and code and let me know if and where the discrepancy in a small test plz

A_SAS_Man
Pyrite | Level 9

I believe you are close, but did you see my updated criteria? Specifically this:

 

2. Check whether for that particular week and person, at least one day of the week falls between their start and end dates. If it does, put a "1" in the active field.

 

Whereas before I think your code was marking active only if all days fell in the start_date end_date range, I want it to mark active if any of the days fall into it. Once again I apologize for the confusion.

novinosrin
Tourmaline | Level 20

HI @A_SAS_Man  First off, I should be the one to apologize as I noticed a couple of  gross mistakes in my earlier post upon my current review.

Okay, here I made a slight change, 

For-->Check whether for that particular week and person, at least one day of the week falls between their start and end dates. If it does, put a "1" in the active field.

1. Condition changes to OR rather than AND 

  Active=start_date<=week_start<=end_date or start_date<=week_end<=end_date;

2.  I am basically do a weekly increment using the first value of start_date copying it to _s or in other words the first value of start_date becomes starting point for week_start computation.

3. Once we create two new vars i.e Week_start and Week_end date, we apply the condition mentioned in point 1 to compute ACTIVE.

 

I hope I have comprehended the question correctly this time or if not little more closer than previous



data have;
 input person $1. coverage_date:mmddyy10. start_date:mmddyy10. end_date:mmddyy10.;
 format coverage_date date9. start_date date9. end_date date9.;
 datalines;
1 01/01/2019 01/01/2019 01/19/2019
1 02/01/2019 02/02/2019 02/28/2019
1 03/01/2019 01/01/2019 02/28/2019
 ;
 run;


data want;
 set have;
 by person;
 retain week_start week_end _s ;
 if first.person then do;
  call missing(week_start,week_end);
  week_start=start_date;
  weeknumber=0;
  _s=start_date;
 end;
 do while(week_end<end_date);
  week_end=intnx('week',_s,weeknumber,'e');
  Active=start_date<=week_start<=end_date or start_date<=week_end<=end_date;
  output;
  weeknumber+1;
  week_start=week_end+1;
 end;
 format week_end week_start  date9.;
 drop _s;
run;

RESULT for this one:

person coverage_date start_date end_date week_start week_end weeknumber Active
1 01JAN2019 01JAN2019 19JAN2019 01JAN2019 05JAN2019 0 1
1 01JAN2019 01JAN2019 19JAN2019 06JAN2019 12JAN2019 1 1
1 01JAN2019 01JAN2019 19JAN2019 13JAN2019 19JAN2019 2 1
1 01FEB2019 02FEB2019 28FEB2019 20JAN2019 26JAN2019 3 0
1 01FEB2019 02FEB2019 28FEB2019 27JAN2019 02FEB2019 4 1
1 01FEB2019 02FEB2019 28FEB2019 03FEB2019 09FEB2019 5 1
1 01FEB2019 02FEB2019 28FEB2019 10FEB2019 16FEB2019 6 1
1 01FEB2019 02FEB2019 28FEB2019 17FEB2019 23FEB2019 7 1
1 01FEB2019 02FEB2019 28FEB2019 24FEB2019 02MAR2019 8 1
A_SAS_Man
Pyrite | Level 9

No apology necessary, I appreciate you working through this with me. I think this is extremely close to what I'm wanting. There is only one thing I've noticed that is not coming out as I would expect. I would like the week numbers to be consistent in terms of the same week_start date should receive the same week number across persons. I believe this is an easy fix on the back end for me though, I just need to run this code after yours to produce consistent week numbers.

 

proc sql;
create table want2 as
	select *,
		   week(week_start) as Week_Number_Corrected
	from want
;
quit;

This may be easier than building a fix into your solution, if so that is fine with me. If you have an easy way to build something like that into yours I would take it though, as I will be running this for a fairly large data set and any time savings I can manage are appreciated. I'm going to do some more testing on my larger data set tonight and I will report back/accept an answer when I have verified what is happening. Thanks so much again!

 

Edit: This might actually be a larger problem than I first realized, but we are still close. Consider the following data set.

 

data have;
 input person $1. coverage_date:mmddyy10. start_date:mmddyy10. end_date:mmddyy10.;
 format coverage_date date9. start_date date9. end_date date9.;
 datalines;
1 01/01/2019 01/01/2019 01/19/2019
1 02/01/2019 02/02/2019 02/28/2019
1 03/01/2019 01/01/2019 02/28/2019
2 02/01/2019 02/01/2019 02/28/2019
 ;
 run;

This marks the first weeks that person 1 and 2 were active as week 0, even though they are different times of the year. The below doesn't work syntax wise, but maybe demonstrates what I'm trying to get at.

data want_not_working_code;
 set have;
 by person;
 if first.person then do;
  week_start=intnx('week',start_date,'b');
  weeknumber=week(week_start);
 end;
 do while(week_end<end_date);
  week_end=intnx('week',_s,weeknumber,'e');
  Active=start_date<=week_start<=end_date or start_date<=week_end<=end_date;
  output;
 end;
 format week_end week_start  date9.;
run;

Hope this is clear but am happy to answer more questions if needed.

 

novinosrin
Tourmaline | Level 20

Let me revisit the thread thoroghly this evening and reattempt

A_SAS_Man
Pyrite | Level 9
I may make a new question just on this date piece, as that seems to be the tricky part and I should have focused more on that.
novinosrin
Tourmaline | Level 20

I was about to work on some details of how I comprehend. Well I haven't looked at your new thread yet. Do you not want to pursue this?

 

PS Sorry I slept off last night

A_SAS_Man
Pyrite | Level 9

No by all means if you solve this it makes the other thread redundant, I just thought maybe I could simplify the problem and make it easier to solve, in my conversations with you and ballard some things came up that I hadn't thought about so I was trying to make those clearer in the new thread. The other thread is here, maybe that can provide you some better clearer context.

https://communities.sas.com/t5/SAS-Programming/Break-a-Date-Range-Into-Multiple-Lines/m-p/635324

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 992 views
  • 1 like
  • 3 in conversation