Help using Base SAS procedures

data manipulation question

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 5
Accepted Solution

data manipulation question

Hello

I'm trying to create a week variable that assigns the week number based on when Jan 1 falls for that year.

If there are more December days than January days in the week that contains January 1st, it becomes week 53.  If there are more January days, it becomes week 1. For example:

Sunday

Monday

Tuesday

Wednesday

Thursday

Friday

Saturday

Week

Dec 28

Dec 29

Dec 30

Dec 31

Jan 1

Jan 2

Jan 3

Week 53

Dec 29

Dec 30

Dec 31

Jan 1

Jan 2

Jan 3

Jan 4

Week 1

I've been manually coding this, but now that I have to do this for a longer time frame, I'm hoping there is an easier/faster way to do this.

if case_reported_date >= '21dec2014'd and case_reported_date <= '27dec2014'd then week = '52';

if case_reported_date >= '28dec2014'd and case_reported_date <= '03jan2015'd then week = '53';

if case_reported_date >= '04jan2015'd and case_reported_date <= '10jan2015'd then week = '01';

if case_reported_date >= '11jan2015'd and case_reported_date <= '17jan2015'd then week = '02';


Accepted Solutions
Solution
‎11-14-2014 09:38 PM
Respected Advisor
Posts: 3,902

Re: data manipulation question

If I interpret the source data right then a week spans from Sunday to Saturday. The week count per year is based on the middle of the week - Wednesday. So week 1 is the one with the first Wednesday of the year.

If I've got that right then below code should do what you're after.

data test;

  infile datalines4 truncover dlm=';';

  attrib

    week_expected informat=best32. start_dt end_dt informat=anydtdte. format=date9.

    date date_shift date_fwy format=date9. week_calculated length=8

    ;

  input week_expected start_dt end_dt;

  do date=start_dt to end_dt;

    weekday=put(date,downame.);

    /* Date of Wednesday within a Sunday to Saturday week span */

    date_shift=intnx('week.4',date+3,0,'b');

    /* Date of first Wednesday of the year */

    date_fwy=intnx('year',date_shift,0,'b');

    date_fwy=intnx('week.5',date_fwy,0,'e');

    /* Week number for week starting on Sunday  */

    /* First week of year based on date of Wednesday */

    week_calculated=intck('week',date_fwy,date_shift)+1;

    /* compare expected with calculated week count */

    diff=week_expected-week_calculated;

    output;

  end;

datalines4;

51;December 18, 2011;December 24, 2011

52;December 25, 2011;December 31, 2011

1;January 1, 2012;January 7, 2012

2;January 8, 2012;January 14, 2012

51;December 16, 2012;December 22, 2012

52;December 23, 2012;December 29, 2012

1;December 30, 2012;January 5, 2013

2;January 6, 2013;January 12, 2013

51;December 15, 2013;December 21, 2013

52;December 22, 2013;December 28, 2013

1;December 29, 2013;January 4, 2014

2;January 5, 2014;January 11, 2014

52;December 21, 2014;December 27, 2014

53;December 28, 2014;January 3, 2015

1;January 4, 2015;January 10, 2015

2;January 11, 2015;January 17, 2015

;;;;

run;

Message was edited by: Patrick Matter

Based on comment fixed the narrative to correctly describe what the posted code does (week starting on Sunday and not on Saturday).

View solution in original post


All Replies
Super User
Posts: 17,959

Re: data manipulation question

How are you calculating week in the first place other than in this scenario?

Occasional Contributor
Posts: 5

Re: data manipulation question

I've tried using the WEEK function (e.g. cweek = week(date)),  but for this particular senario it doesn't work once you get to the end/start of the year.  Sometimes the assigned week will have less than 7 days.

Also, becuase I have multiple years of data in my dataset, the weeks will need to 'reset' once it hits 52 or 53 weeks.


PROC Star
Posts: 7,366

Re: data manipulation question

Brenda,

If you can use a Monday to Sunday criterion, rather than Sunday to Saturday, you could use the week function 'V' modifier. e.g.:

data want;

  do date='28DEC2015'd to '03jan2016'd;

    week=week(date,'V');

    output;

  end;

    do date='29DEC1997'd to '04jan1998'd;

    week=week(date,'V');

    output;

  end;

run;

Occasional Contributor
Posts: 5

Re: data manipulation question

This may do what you want, although I am not sure it follows exactly the same rule that you described.

/*

compute year/week according to this rule:

If Jan 1 falls on a Sunday through Wednesday, then the week that includes Jan 1

is week 1. Otherwise the following week is week 1. Then weeks are numbered consecutively

up to the last week of the year, which may be a 52 or a 53.

*/

data date1(keep=cdcweek dth_date);

   length cdcweek $ 6;

   do dth_date = '01jan2009'd to '31dec2012'd;

      year = year(dth_date);

      yyyy = put(year,4.);

      pyyy = put(year-1,4.);

      first_sat = mdy(1,1,year);

      jan01     = mdy(1,1,year);

      prevjan01 = mdy(1,1,year-1);

      do while(weekday(first_sat) ne 7);

         first_sat = first_sat + 1;

      end;

      if dth_date = . then cdcweek = '';

      else do;

         if weekday(jan01) le 4 then do;

            if dth_date le first_sat then cdcweek = yyyy||"01";

            else cdcweek = put(((year*100) + int((dth_date-first_sat-1)/7) + 2),6.);

            end;

         else do;

            if dth_date le first_sat then do;

               if (weekday(prevjan01) = 4) or ((weekday(prevjan01) = 3) and

                  (jan01-prevjan01 = 366)) then  cdcweek = pyyy||"53";

               else cdcweek = pyyy||"52";

               end;

            else cdcweek = put(((year*100)+int((dth_date-first_sat-1)/7) + 1),6.);

            end;    /* weekday(jan01) le 4 */

         end;  /* date missing */

         if cdcweek ge 200901 then output;

      end; /* dth_date from 1/1/2009 to 12/31/2012 */

run;

Occasional Contributor
Posts: 5

Re: data manipulation question

This seems to be doing part of the trick, but it looks like week 1 is getting lost in some instances, and week 53 is being generated when it should end at week 52.

cdcweekdth_dateepiweekyear
20115225-Dec-11522011
20115226-Dec-11522011
20115227-Dec-11522011
20115228-Dec-11522011
20115229-Dec-11522011
20115230-Dec-11522011
20115231-Dec-11522011
2012028-Jan-1222012
2012029-Jan-1222012
20120210-Jan-1222012
20120211-Jan-1222012
20120212-Jan-1222012
20120213-Jan-1222012
20120214-Jan-1222012
20125223-Dec-12522012
20125224-Dec-12522012
20125225-Dec-12522012
20125226-Dec-12522012
20125227-Dec-12522012
20125228-Dec-12522012
20125229-Dec-12522012
20125330-Dec-12532012
20125331-Dec-12532012
2013026-Jan-1322013
2013027-Jan-1322013
2013028-Jan-1322013
2013029-Jan-1322013
20130210-Jan-1322013
20130211-Jan-1322013
20130212-Jan-1322013
Occasional Contributor
Posts: 5

Re: data manipulation question

This is what I get when I run code that I cut-and-pasted from my first post. The missing week 1 suggests that something went missing when you edited the code. I am not sure of the week 52/53 discrepancy, since the rule I used may be different from yours.

                          Obs    cdcweek      dth_date

...

                    1084201151 12/23/2011
                     1085201151 12/24/2011
                     1086201152 12/25/2011
                     1087201152 12/26/2011
                     1088201152 12/27/2011
                     1089201152 12/28/2011
                     1090201152 12/29/2011
                     1091201152 12/30/2011
                     1092201152 12/31/2011
                     1093201201 01/01/2012
                     1094201201 01/02/2012
                     1095201201 01/03/2012
                     1096201201 01/04/2012
                     1097201201 01/05/2012
                     1098201201 01/06/2012
                    1099201201 01/07/2012
                     1100201202 01/08/2012
                     1101201202 01/09/2012
                     1102201202 01/10/2012
                     1103201202 01/11/2012
                     1104201202 01/12/2012
                     1105201202 01/13/2012
                     1106201202

01/14/2012

...

PROC Star
Posts: 7,366

Re: data manipulation question

Me thinks that you have to define your rules more clearly. Since 7*52 only equals 364, there will always be 1 or 2 days (depending upon leap years) that still have to be accounted for. From your example, I presume that a week 53 is desired.

I think that the following comes close, but probably needs some tweeking:

data test;

  format date weekdate29.;

/*  test cases for Sunday January 1st*/

  do date='28DEC2005'd to '31dec2006'd;

    start="Sunday";

    output;

  end;

/*  test cases for Monday January 1st*/

  do date='1jan2007'd to '31dec2007'd;

    start="Monday";

    output;

  end;

/*  test cases for Tuesday January 1st*/

  do date='1jan2008'd to '31dec2008'd;

    start="Tuesday";

    output;

  end;

/*  test cases for Thursday January 1st*/

  do date='1jan2009'd to '31dec2009'd;

    start="Thursday";

    output;

  end;

/*  test cases for Friday January 1st*/

  do date='1jan2010'd to '31dec2010'd;

    start="Friday";

    output;

  end;

/*  test cases for Saturday January 1st*/

  do date='1jan2011'd to '31dec2011'd;

    start="Saturday";

    output;

  end;

/*  test cases for Sunday January 1st*/

  do date='1jan2012'd to '31dec2012'd;

    start="Sunday";

    output;

  end;

/*  test cases for Tuesday January 1st*/

  do date='1jan2013'd to '31dec2013'd;

    start="Tuesday";

    output;

  end;

/*  test cases for Wednesday January 1st*/

  do date='1jan2014'd to '31dec2014'd;

    start="Wednesday";

    output;

  end;

/*  test cases for Thursday January 1st*/

  do date='1jan2015'd to '31dec2015'd;

    start="Thursday";

    output;

  end;

run;

data want;

  set test;

  if mdy(12,28,year(date)) <= date <= mdy(12,31,year(date)) then do;

    if weekday(mdy(1,1,year(date)+1)) eq 1 then week=ceil((date-mdy(1,1,year(date))+1)/7);

    else if weekday(mdy(1,1,year(date)+1)) eq 2 and day(date) eq 31 then week=53;

    else if weekday(mdy(1,1,year(date)+1)) eq 3 and day(date) gt 30 then week=53;

    else if weekday(mdy(1,1,year(date)+1)) eq 4 and day(date) gt 29 then week=53;

    else week=ceil((date-mdy(1,1,year(date))+1)/7);

  end;

  else if mdy(1,1,year(date)) <= date <= mdy(1,3,year(date)) then do;

    if weekday(mdy(1,1,year(date))) le 4 then week=1;

    else week=53;

  end;

  else week=ceil((date-mdy(1,1,year(date))+1)/7);

run;

Occasional Contributor
Posts: 5

Re: data manipulation question

From Brenda's example, there does not have to be a week 53 in every year, since up to 3 days at the beginning of the year may be counted in week 53 of the previous year.

PROC Star
Posts: 7,366

Re: data manipulation question

Now I see where I'm confused. In your example, which I've repeated below, for the third and fourth lines of code I would have guessed that the rules would have been:

if case_reported_date >= '29dec2013'd and case_reported_date <= '04jan2014'd then week = 1;

if case_reported_date >= '05jan2014'd and case_reported_date <= '11jan2014'd then week = 2;

if case_reported_date >= '11jan2014'd and case_reported_date <= '17jan2014'd then week = 3;


Please let us know if your example was wrong, or if I really am simply confused.

Sunday

Monday

Tuesday

Wednesday

Thursday

Friday

Saturday

Week

Dec 28

Dec 29

Dec 30

Dec 31

Jan 1

Jan 2

Jan 3

Week 53

Dec 29

Dec 30

Dec 31

Jan 1

Jan 2

Jan 3

Jan 4

Week 1

I've been manually coding this, but now that I have to do this for a longer time frame, I'm hoping there is an easier/faster way to do this.

if case_reported_date >= '21dec2014'd and case_reported_date <= '27dec2014'd then week = '52';

if case_reported_date >= '28dec2014'd and case_reported_date <= '03jan2015'd then week = '53';

if case_reported_date >= '04jan2015'd and case_reported_date <= '10jan2015'd then week = '01';

if case_reported_date >= '11jan2015'd and case_reported_date <= '17jan2015'd then week = '02';

Occasional Contributor
Posts: 5

Re: data manipulation question

Here is an example of how I'm trying to group my weeks.. I've provided some links that show how some historical years are calculated

http://www.phac-aspc.gc.ca/fluwatch/11-12/11-12cal-eng.php

http://www.phac-aspc.gc.ca/fluwatch/12-13/12-13cal-eng.php

http://www.phac-aspc.gc.ca/fluwatch/14-15/14-15cal-eng.php

35August 24, 2014August 30, 2014
36August 31, 2014September 6, 2014
37September 7, 2014September 13, 2014
38September 14, 2014September 20, 2014
39September 21, 2014September 27, 2014
40September 28, 2014October 4, 2014
41October 5, 2014October 11, 2014
42October 12, 2014October 18, 2014
43October 19, 2014October 25, 2014
44October 26, 2014November 1, 2014
45November 2, 2014November 8, 2014
46November 9, 2014November 15, 2014
47November 16, 2014November 22, 2014
48November 23, 2014November 29, 2014
49November 30, 2014December 6, 2014
50December 7, 2014December 13, 2014
51December 14, 2014December 20, 2014
52December 21, 2014December 27, 2014
53December 28, 2014January 3, 2015
1January 4, 2015January 10, 2015
2January 11, 2015January 17, 2015
3January 18, 2015January 24, 2015
4January 25, 2015January 31, 2015
5February 1, 2015February 7, 2015
6February 8, 2015February 14, 2015
7February 15, 2015February 21, 2015
8February 22, 2015February 28, 2015
9March 1, 2015March 7, 2015
10March 8, 2015March 14, 2015
11March 15, 2015March 21, 2015
12March 22, 2015March 28, 2015
13March 29, 2015April 4, 2015
14April 5, 2015April 11, 2015
15April 12, 2015April 18, 2015
16April 19, 2015April 25, 2015
17April 26, 2015May 2, 2015
18May 3, 2015May 9, 2015
19May 10, 2015May 16, 2015
20May 17, 2015May 23, 2015
21May 24, 2015May 30, 2015
22May 31, 2015June 6, 2015
23June 7, 2015June 13, 2015
24June 14, 2015June 20, 2015
25June 21, 2015June 27, 2015
26June 28, 2015July 4, 2015
27July 5, 2015July 11, 2015
28July 12, 2015July 18, 2015
29July 19, 2015July 25, 2015
30July 26, 2015August 1, 2015
31August 2, 2015August 8, 2015
32August 9, 2015August 15, 2015
33August 16, 2015August 22, 2015
34August 23, 2015August 29, 2015
PROC Star
Posts: 7,366

Re: data manipulation question

Brenda,

Too bad that they don't publish more examples, but I think the links you provided answered most of my questions anyway.

Try the following code and see if the results match the desired calendar: They appear to match the three published tables.

data test;

  format date weekdate29.;

  do date='01JAN2005'd to '31dec2015'd;

    output;

  end;

run;

data want (drop=year_: last_: first_Smiley Happy;

  format first_sunday weekdate29.;

  set test;

  first_sunday=mdy(1,1,year(date))-weekday(mdy(1,1,year(date)))+1;

  if year(first_sunday) lt year(date) and first_sunday lt mdy(12,29,year(date)-1) then

   first_sunday+7;

  year_start=weekday(mdy(1,1,year(date)));

  year_end=weekday(mdy(12,31,year(date)));

  last_year_end=weekday(mdy(12,31,year(date)-1));

  if      year_end le 3 and mdy(12,32-year_end,year(date))<=date<=mdy(12,31,year(date)) then week=1;

  else if year_start eq 4 and (mdy(12,31,year(date))-mdy(1,1,year(date))) eq 365

   and mdy(12,32-year_end,year(date))<=date<=mdy(12,31,year(date)) then week=53;

  else if 5<=year_end<=6 and mdy(12,32-year_end,year(date))<=date<=mdy(12,31,year(date)) then week=52;

  else if year_end eq 4 and mdy(12,32-year_end,year(date))<=date<=mdy(12,31,year(date)) then week=53;

  else if last_year_end eq 5 and (mdy(12,31,year(date)-1)-mdy(1,1,year(date)-1)) eq 365

   and mdy(1,1,year(date))<=date<=mdy(1,2,year(date)) then week=53;

  else if last_year_end le 3 and mdy(1,1,year(date))<=date<=mdy(1,4-last_year_end,year(date)) then week=1;

  else if 5<=last_year_end<=6 and mdy(1,1,year(date))<=date<=mdy(1,7-last_year_end,year(date)) then week=52;

  else if last_year_end eq 4 and mdy(1,1,year(date))<=date<=mdy(1,7-last_year_end,year(date)) then week=53;

  else week=intck('week',first_sunday,date)+1;

run;

Solution
‎11-14-2014 09:38 PM
Respected Advisor
Posts: 3,902

Re: data manipulation question

If I interpret the source data right then a week spans from Sunday to Saturday. The week count per year is based on the middle of the week - Wednesday. So week 1 is the one with the first Wednesday of the year.

If I've got that right then below code should do what you're after.

data test;

  infile datalines4 truncover dlm=';';

  attrib

    week_expected informat=best32. start_dt end_dt informat=anydtdte. format=date9.

    date date_shift date_fwy format=date9. week_calculated length=8

    ;

  input week_expected start_dt end_dt;

  do date=start_dt to end_dt;

    weekday=put(date,downame.);

    /* Date of Wednesday within a Sunday to Saturday week span */

    date_shift=intnx('week.4',date+3,0,'b');

    /* Date of first Wednesday of the year */

    date_fwy=intnx('year',date_shift,0,'b');

    date_fwy=intnx('week.5',date_fwy,0,'e');

    /* Week number for week starting on Sunday  */

    /* First week of year based on date of Wednesday */

    week_calculated=intck('week',date_fwy,date_shift)+1;

    /* compare expected with calculated week count */

    diff=week_expected-week_calculated;

    output;

  end;

datalines4;

51;December 18, 2011;December 24, 2011

52;December 25, 2011;December 31, 2011

1;January 1, 2012;January 7, 2012

2;January 8, 2012;January 14, 2012

51;December 16, 2012;December 22, 2012

52;December 23, 2012;December 29, 2012

1;December 30, 2012;January 5, 2013

2;January 6, 2013;January 12, 2013

51;December 15, 2013;December 21, 2013

52;December 22, 2013;December 28, 2013

1;December 29, 2013;January 4, 2014

2;January 5, 2014;January 11, 2014

52;December 21, 2014;December 27, 2014

53;December 28, 2014;January 3, 2015

1;January 4, 2015;January 10, 2015

2;January 11, 2015;January 17, 2015

;;;;

run;

Message was edited by: Patrick Matter

Based on comment fixed the narrative to correctly describe what the posted code does (week starting on Sunday and not on Saturday).

PROC Star
Posts: 7,366

Re: data manipulation question

: I compared your code with mine on a larger date range and realized that I had to revise the code I posted in order to correctly calculate weeks for some leap years which weren't present in my original test.

However, I like your suggested code a lot better! I was concerned by your description, but you are in fact calculating weeks based on Brenda's required Sunday to Saturday spans.

As such, I modified your code to work with my test data, but Brenda should definitely mark your post as being the correct answer.

data test;

  format date weekdate29.;

  do date='01JAN1900'd to '31dec2200'd;

    output;

  end;

run;

data want (drop=date_Smiley Happy;

  set test;

  date_shift=intnx('week.4',date+3,0,'b'); /*1st Wed in week*/

  date_fwy=intnx('week.5',intnx('year',date_shift,0,'b'),0,'e'); /*1st Wed in year*/

  week=intck('week',date_fwy,date_shift)+1; /*week number*/

run;

Super User
Posts: 9,691

Re: data manipulation question

My code is longer than Arthu.T's and Patrick's code.

data test;
  format date weekdate29.;
  do date='01JAN1900'd to '31dec2200'd;
    output;
  end;
run;
data test;
 set test;
 week=week(date);
 if 1 lt weekday(intnx('year',date,0,'b')) lt 5 then week=week+1;
 if weekday(date)=1 then n+1;
run;
data want ;
 do until(last.n);
  set test;
  by n ;
  if day(date)=1 and month(date)=1 then do;
   need_change=1; weekday=weekday(date); prev_week=_week;
  end;
  _week=week;
 end;
 do until(last.n);
  set test ;
  by n ;
  if need_change then week=ifn((1 lt weekday lt 5),1,prev_week);
  output;
 end;
 drop n  need_change _week      prev_week weekday;
run;


Xia Keshan

Message was edited by: xia keshan

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 21 replies
  • 1253 views
  • 3 likes
  • 7 in conversation