## 3 ways to consider movable holidays in SAS

Started ‎04-26-2019 by
Modified ‎04-26-2019 by
Views 3,817

Easter has gone by and if you celebrate it, I hope you had fun celebrating with your family. In my last blog I discussed that Easter is not always a treat for data scientist because it's a moving holiday. In this article, I would like to introduce you to:

• my friend George, who is happy if he doesn't have to see his mother-in-law.
• Joanna, who enjoys skiing in the Austrian alps.
• busy globetrotter friends of my parents.

All these stories relate to date interval length calculations that involve movable holidays. You will see how easy it is to perform complex calculations and data preparation with movable dates like Easter or point-in-time definitions like "the first full weekend in February."

# For my friend George, his mother-in-law is a nightmare

I met my friend George after Easter. He got married two years ago and has a very good marriage. After a few drinks however he admits that spending time with the family of his wife, especially his mother-in-law, is not the most ideal. Fortunately this only happens twice a year, at Easter and Christmas.

• At Easter they show up on Good Friday and leave on Easter Sunday.
• At Christmas the length of their stay is often a little bit longer, depending on the weekday of Dec 24th. Their family tradition is as follows: They arrive on the Saturday before Dec 24th and leave on Dec 26th.
• Consequently the shortest interval when he has his relative in his house is 4 days (from Saturday Dec. 23rd until Tuesday, Dec. 26th, when the 24th falls on a Sunday).
• If however the 24th falls on a Saturday, they will have arrived on Saturday, Dec 17th - creating the longest possible stay.

George says that his most enjoyable years are those with a long time span between Easter and Christmas and with a short visit from his parents-in-law at Christmas. However as both time intervals depend on movable dates, it's hard to calculate this for future years.

SAS won't be able to change the attitude of his mother-in-law. But we can easily calculate the length of the time interval when George doesn't have to see her.

The following SAS program calculates the "relaxing time interval" for George between Easter and Christmas. It also calculates the number of days his parents-in-law stay with them for Christmas.

``````data mother_in_law_free_interval;
format Year          8.
EasterSunday  date9.
SatBefore2412 weekdatx.;
do Year = 2019 to 2030;
SatBefore2412 = intnx('week.7',mdy(12,23,year),0);
EasterSunday  = holiday('EASTER',year);
RelaxingDays  = SatBefore2412-EasterSunday;
RelaxingWeeks = intck('WEEK',EasterSunday,SatBefore2412)+1;
DaysXmas      = mdy(12,26,year) - SatBefore2412 + 1;
output;
end;
run;``````

1. You use the INTNX function to find the the next Saturday before Dec 24th.
• Here you start at Dec 23rd and consider weeks where Saturday is the first day (--> WEEK.7)
• You move for 0 weeks (you want to stay in the same week) to the begin of the "Saturday-week."
• If Dec 23rd is already a Saturday, your date does not change, otherwise you move your date to the preceding Saturday.
2. You use the HOLIDAY function to find Easter Sunday.
3. You simply subtract the dates to retrieve the length of the time interval between Easter and the Saturday before Christmas.
4. You use the INTCK function to count the number of weeks between Easter Sunday and the Saturday before Christmas.
5. You calculate the number of days they stay for Christmas by comparing the Dec 26th and the Saturday before Dec 24th.

The following table shows that in 2019 there are only 35 weeks between Easter and Christmas. The Saturday before Christmas is Dec 21st which means that Georges family-in-law will stay for 6 days from Dec 21st to Dec 26th.

2022 looks like a hard year for George where the number of relaxing weeks is low and he has them in his house for the maximum of 10 days around Christmas. 2023 looks much better, George!

# Joanna, my colleague from university, only has time to ski between the end of the winter term and Easter

Joanna is a lecturer at a business school in northern Germany. She only teaches in the winter term which ends on the last Saturday in January. The first full weekend in February is therefore the earliest date when she can be home in Austria and start the skiing season. Many ski lifts in Austria operate until Monday after Easter. So depending on the effective Easter date, Joanna has a varying number of weekends available for skiing every year.

The following simple SAS program calculates the number of "skiing weekends for Joanna" (= number of weekends starting with the first full weekend in February until the Easter weekend).

``````data skiing_weekends;
format  Year               8.
FirstFullSatInFeb  weekdatx.
EasterMonday 	    date9.;
do Year = 2015 to 2025;
FirstFullSatInFeb = intnx('week.7',mdy(1,31,year),1);
EasterMonday      = holiday('EASTER',year)+1;
NumSkiWeekends    = intck('week.1',FirstFullSatInFeb,EasterMonday);
output;
end;
run;``````

1. As a first step you use the INTNX function to determine the first full weekend in February. Starting from January 31st, you move one week forward to the next Saturday. Note that a WEEK in SAS starts with Sunday and WEEK.7 shifts the start date of the week to Saturday.
2. You use the HOLIDAY function to retrieve the Easter date of the respective year and add one day to move to Easter Monday.
3. You use the INTCK function to count the number of weeks (starting with Sunday) that lie between the first Saturday in February and Easter Monday. This equals the number of weekends where Joanna can ski.

The following table shows the results for the years 2015-2025. You see that the number varies due to the moving Easter date and the fact that the first full weekend in February is moving as well.

# Friends of my parents are busy globetrotters - their only time window for a visit to Lake Neusiedl lies between Easter and Pentecost

Good old friends of my parents are traveling throughout the year across the world. They are real globetrotters: New Zealand, Peru, Canada, Sweden, India, Hawaii, Russia, China are just a selection of countries they visited over the last years. For them the 50 days between Easter and Pentecost are the only time in the year where they can free some time to visit my parents at Lake Neusiedl.

Of course they would like to have good weather when they come to Burgenland in Austria.

• For him, "good weather" can be easily defined, it is just a day with no or only little rain (rain <= 1 mm).
• For his wife, a good day is a day where the temperature climbs above 25 °C.

## Calculating basic probabilities

The Hydrological office of Burgenland maintains a great web portal with measures in 15-minute intervals of the water level, temperature and precipitation. I also use this data for my data science lectures at the University, where we build a small data warehouse and run some data science analyses. For my lecture the office provided me data on that granularity starting from 1999.

Selected rows of this data are shown below for the precipitation (N) and the temperature (T).

The following query joins the two tables together, and aggregates the measures on a daily level.

``````proc sql;
*** Daily Aggregation;
create table Weather_Illmitz
as
select datepart(t.datumzeit) as Date format=date9.,
mean(t.wert) as TempMean format=8.1,
max(t.wert)  as TempMax format=8.1,
sum(n.wert)  as RainSum format=8.1
from h.n_i_oldhist as n,
h.t_i_oldhist as t
where n.datumzeit = t.datumzeit
group by Date;
quit;``````

As the timestamp in this data is in the format DATETIME, you need to use the DATEPART function to extract the date.

The resulting table WEATHER_ILLMITZ with data on a daily basis looks as follows:

Based on this table you now want to calculate the desired statistics on a yearly basis for the interval between Easter and Pentecost.

``````proc sql;
*** Yearly Statistics;
create table Weather_Yearly
as select year(Date) as Year,
holiday('easter',year(Date)) as EasterSunday format=date9.,
mdy(month(calculated EasterSunday),day(calculated EasterSunday),2018) as Easter format=date9.,
sum(TempMax > 25) as DaysTemp25,
sum(RainSum >  1) as RainDays
from Weather_Illmitz
where holiday('easter',year(Date)) <= date <= holiday('easter',year(Date))+49
group by Year, EasterSunday;
quit;``````
• In the above query you use the HOLIDAY function in the WHERE clause to restrict the analysis only to days in the 50-day interval from Easter to Pentecost.
• You use the the SUM function with an expression to count the days where a certain condition is true. Here you use the two statistics that have been defined above, temperature > 25 °C and RainSum > 1 mm.
• Beside the date of Easter Sunday for every year, it's advisable to also add a date variable that holds the day and month of Easter for the respective year in a date value for a constant display year. This facilitates plotting different (early and late) Easter dates on one time axis.

The following table reports the number of days with a temperature > 25 °C and the number of RainDays per year between Easter and Pentecost. You see how easy it is to use the HOLIDAY function to select the appropriate records per year.

Based on this data, the following query calculates the overall probability for a
"Temperature > 25 °C" day with 17.8 % and for a "Rain" day with 20.8 %.
Based on that, the friends of my parents can get a clearer picture of the weather conditions

at that time of the year here in Burgenland.

## Analyzing the relationship between "early" and "late" Easter

The variable EASTER contains the Easter date for every year shifted to the year 2018. You can use this variable to plot the relationship of the number of "Temp > 25 °C" days with the occurrence of Easter in the year.

``````proc sgplot data=Weather_Yearly;
reg x=Easter y=DaysTemp25;
run;``````

You can use the REG statement in the SGPLOT procedure to not only receive a scatter plot but also a regression line.

You see the expected relationship that a year with later Easter date consequently has more "Temp > 25°C" days until Pentecost. Unfortunately, the same is true for the number of Rain Days. The probability for rain also increases with a later Easter date.

You can apply the REG procedure to a simple linear model that quantifies this relationship.

``````data Weather_Yearly;
set Weather_Yearly;
DaysPast21MAR = Easter - '21MAR2018'd;
run;

proc reg data=Weather_Yearly;
model DaysTemp25 = DaysPast21MAR;
run;``````

Note that it is advisable to center the dependent variable for the analysis for a numerically more stable estimate. The date variable EASTER contains values around 20,000. If you simply subtract the 21st of March you receive smaller, better computable and better interpretable values as "Days past the earliest possible Easter date."

• In the "Temp > 25 °C" model, the coefficient for variable DAYSPAST21MAR is 0.528. This shows that the average number of  "Temp > 25 °C days" increases by 5.3 if Easter takes place 10 days later.
• The coefficient for DAYSPAST21MAR in the "RainDays" model is 0.14, indicating that a Easter that takes place 10 days later brings on average 1.4 more rain days until Pentecost.

For my parent's friends, these results mean that he prefers a year with an earlier Easter date as this reduces the risk for a rainy day, while his wife prefers a year with a later Easter date as she can expect more days with a temperature of more the 25 °C. Lets see how they decide 🙂

In one of his DO-LOOP blog contributions, Rick Wicklin explains how you can easily add the regression equation to a regression line plot.

# Feature generation for data science

These examples can be easily generalized to business applications. When modeling the demand in the retail, hospitality or leisure industry, it's important to consider whether Easter falls into March or April or whether a particular month has 4 or 5 Saturdays or Sundays. These facts can heavily influence demand pattern, traffic, number of visits.

For predictive modeling and time series forecasting, it's therefore important to be able to calculate the length of simple and complex time intervals and to count the number of special days and events.

Date and Time functions in SAS like the INTNX and the INTCK function as well as the family of the HOLIDAY functions provide great support for these tasks.

More content on data preparation for data science can be found in my SAS Press books. Especially in "Data Preparation for Analytics Using SAS".

Great article using some rarely used SAS date functions.  I would add NWKDOM() as a useful function for relative calendar dates.  For example, President's Day holiday in the U.S. is the third Monday in February.  PresidentDay = NWKDOM(3, 2, 2, &year.).  Where: 3 = the third week of the month; 2 = the second day of the week (Monday); 2 = the second month in the year; &year. is a macro variable of the referenced year.

Very good comment! thanks for bringing this up, Jim. Yes, the NWKDOM function is also here to help you with dates. Applying this function is also useful in my examples 1 and 2.

Thanks for sharing this. I will try to use it shortly.

Version history
Last update:
‎04-26-2019 09:08 AM
Updated by:
Contributors
Article Labels
Article Tags