As the developer of the SAS HOLIDAY function, the DATEKEYS procedure, the HPFEVENTS procedure, and the EVENT object of the ATSM package, "When is the Super Bowl?" is a question that I'm often asked. It's logical to assume that if SAS can calculate the date of Easter and Christmas, then surely the date of the most important event of the year could be calculated, correct?
The problem is that the dates for the Super Bowl can't be calculated because they are determined by a committee. The Super Bowl dates and locations are selected a few years in advance. However, the Super Bowl date has been changed in the past; in 2002 the Super Bowl date was changed "from the original date of January 27 to February 3" . This year, the NFL is already considering moving the date of the Super Bowl due to the COVID-19 pandemic. At SAS, if our software answers a question, we make our very best effort for the answer to be correct! It's easy to know the dates of Super Bowls that have already occurred, and analyzing historical data is how forecasting methods estimate the future effect. To forecast, it's important to know the future dates. In order to accurately estimate the past effects and forecast the future effects, the dates have to be maintained on a regular basis.
Sometimes even dates that can be calculated are changed. Mardi Gras is a large celebration in New Orleans that is based on the date of Easter. Easter can be calculated. However, in 2002 the dates for Mardi Gras were changed to accomodate Super Bowl XXXVI. This occurred again in 2013 for Super Bowl XLVII. The next conflict between Mardi Gras and Super Bowl is in 2024 for Super Bowl LVIII. The outcome for 2024 has yet to be determined. However, so far, the score is Super Bowl 2, Mardi Gras 0!
If the dates can't be calculated or are subject to change, the next best solution is to develop a system for maintaining the dates. It would also be useful to have a method for estimating future dates. This can be accomplished with SAS software.
In SAS Foundation, the HPFEVENTS procedure can generate a data set that contains an event repository. The event repository contains definitions to create dummy variables for events. Here is a simple example that uses the known and expected Super Bowl dates that are current as of June 30, 2021.
/* Super Bowl and dates Information obtained from: MOST RECENT URLS (since 2018): http://www.nfl.com/superbowl/history https://en.wikipedia.org/wiki/Super_Bowl https://en.wikipedia.org/wiki/Super_Bowl_XLVI https://en.wikipedia.org/wiki/Super_Bowl_XLVII https://en.wikipedia.org/wiki/Super_Bowl_XLVIII https://en.wikipedia.org/wiki/Super_Bowl_XLIX https://en.wikipedia.org/wiki/Super_Bowl_50 https://en.wikipedia.org/wiki/Super_Bowl_LI https://en.wikipedia.org/wiki/Super_Bowl_LII https://en.wikipedia.org/wiki/Super_Bowl_LIII https://en.wikipedia.org/wiki/Super_Bowl_LIV https://en.wikipedia.org/wiki/Super_Bowl_LV https://en.wikipedia.org/wiki/Super_Bowl_LVI https://en.wikipedia.org/wiki/Super_Bowl_LVII https://en.wikipedia.org/wiki/Super_Bowl_LVIII https://en.wikipedia.org/wiki/Draft:Super_Bowl_LIX PREVIOUS URLS: http://www.superbowl.com/history/mvps http://www.superbowl.com/features/futuresites (The asterisks represent the dates of Superbowls that have not yet been played - dates subject to change.) This information is supplied without any warranty of correctness. Number Date LIX Feb. 2, 2025* LVIII Feb. 4, 2024* LVII Feb. 5, 2023* LVI Feb. 6, 2022* LV Feb. 7, 2021* LIV Feb. 2, 2020 LIII Feb. 3, 2019 LII Feb. 4, 2018 LI Feb. 5, 2017 50 Feb. 7, 2016 XLIX Feb, 1, 2015 XLVIII Feb. 2, 2014 XLVII Feb. 3, 2013 XLVI Feb. 5, 2012 XLV Feb. 6, 2011 XLIV Feb. 7, 2010 XLIII Feb. 1, 2009 XLII Feb. 3, 2008 XLI Feb. 4, 2007 XL Feb. 5, 2006 XXXIX Feb. 6, 2005 XXXVIII Feb. 1, 2004 XXXVII Jan. 26, 2003 XXXVI Feb. 3, 2002 XXXV Jan. 28, 2001 XXXIV Jan. 30, 2000 XXXIII Jan. 31, 1999 XXXII Jan. 25, 1998 XXXI Jan. 26, 1997 XXX Jan. 28, 1996 XXIX Jan. 29, 1995 XXVIII Jan. 30, 1994 XXVII Jan. 31, 1993 XXVI Jan. 26, 1992 XXV Jan. 27, 1991 XXIV Jan. 28, 1990 XXIII Jan. 22, 1989 XXII Jan. 31, 1988 XXI Jan. 25, 1987 XX Jan. 26, 1986 XIX Jan. 20, 1985 XVIII Jan. 22, 1984 XVII Jan. 30, 1983 XVI Jan. 24, 1982 XV Jan. 25, 1981 XIV Jan. 20, 1980 XIII Jan. 21, 1979 XII Jan. 15, 1978 XI Jan. 9, 1977 X Jan. 18, 1976 IX Jan. 12, 1975 VIII Jan. 13, 1974 VII Jan. 14, 1973 VI Jan. 16, 1972 V Jan. 17, 1971 IV Jan. 11, 1970 III Jan. 12, 1969 II Jan. 14, 1968 I Jan. 15, 1967 */ /* this creates a data set named superbowl that defines the superbowl event */ proc hpfevents; eventdef SuperBowl = '15JAN1967'D '14JAN1968'D '12JAN1969'D '11JAN1970'D '17JAN1971'D '16JAN1972'D '14JAN1973'D '13JAN1974'D '12JAN1975'D '18JAN1976'D '09JAN1977'D '15JAN1978'D '21JAN1979'D '20JAN1980'D '25JAN1981'D '24JAN1982'D '30JAN1983'D '22JAN1984'D '20JAN1985'D '26JAN1986'D '25JAN1987'D '31JAN1988'D '22JAN1989'D '28JAN1990'D '27JAN1991'D '26JAN1992'D '31JAN1993'D '30JAN1994'D '29JAN1995'D '28JAN1996'D '26JAN1997'D '25JAN1998'D '31JAN1999'D '30JAN2000'D '28JAN2001'D '03FEB2002'D '26JAN2003'D '01FEB2004'D '06FEB2005'D '05FEB2006'D '04FEB2007'D '03FEB2008'D '01FEB2009'D '07FEB2010'D '06FEB2011'D '05FEB2012'D '03FEB2013'D '02FEB2014'D '01FEB2015'D '07FEB2016'D '05FEB2017'D '04FEB2018'D '03FEB2019'D '02FEB2020'D '07FEB2021'D '06FEB2022'D '05FEB2023'D '04FEB2024'D '02FEB2025'D / PULSE=DAY ; eventdata out=superbowl condense; eventdata out=superbowl_full; run; title 'Results of HPFEVENTS - Condensed'; proc print data=superbowl; run;
Here are the first 15 observations of the condensed superbowl data set:
In SAS Viya, the EVENT object of the ATSM package also stores event definitions and creates dummy variables. If you have a data set output from the HPFEVENTS procedure containing event definitions, that data set can be imported as a table into an INEVENT object of the INOBJ= option of the TSMODEL procedure. However, the full, non-condensed, version of the data set must be used as input to TSMODEL. The following example imports the full data set from the previous example:
/* the inEVENT table can import from HPFEVENTS, but it must be a full data set */ data sascas1.SuperBowl; set superbowl_full; run; /* to verify dates, create a time id that covers the range */
do date = start to end;
obs = date - start + 1;
format date DATE.;
run; proc tsmodel data = sascas1.SBDateRange LOGCONTROL= (ERROR = KEEP WARNING = KEEP NOTE=KEEP) outlog = sascas1.OUTLOG_ind (replace = YES) inobj = ( inEVENT = sascas1.SuperBowl ) outobj = ( outEVDUM = sascas1.SuperDummy (replace = YES) outEVENT = sascas1.TSMSuperBowl (replace = YES)
) errorstop = YES ; id date interval=day ; var obs; require atsm; submit; /* the default version of the input table is HPFEVENTS */ declare object inEVENT(inevent); declare object dataFrame(tsdf); declare object eventDB(event); rc = eventDB.Initialize(); /* Replay the previous outEVENT table, now inEVENT, to transfer the event definitions to eventDB */ rc = eventDB.Replay(inEVENT);
/* collect the definitions */
declare object outEVENT(outevent);
rc = outEVENT.collect(eventDB);
rc = dataFrame.Initialize(); if rc < 0 then do; stop; end; /* add all currently defined events (SuperBowl) to the dataFrame */ rc = dataFrame.AddEvent(eventDB, '_all_'); /* collect event dummy variable SuperBowl */ declare object outEvDum(outEventDummy); rc = outEvDum.collect(dataFrame); endsubmit; run; quit;
/* show that the dummy variable is non-zero on Super Bowl dates */
title "Super Bowl Dummy Variable Non-Zero Observations";
proc print data=sascas1.SuperDummy(where=(X NE 0));
Examining the first 15 observations of the non-zero values of the SuperDummy data set, we see that the non-zero observations coincide with the dates of the Super Bowl:
Now, an astute observer might take note that since 2004, the Super Bowl dates have all been on the first Sunday in February. (As noted previously, that could change this year.) The announced Super Bowl dates only extend to 2025. What if you are doing a 10 year forecast? The first Sunday in February can be calculated by SAS using the same methodology as the NWKDOM function. It is possible to calculate the first Sunday in February from 2026 forward and store that as an event definition. Then the known event definition for the Super Bowl can be combined with the estimated dates. This would allow you to forecast into the future using your best estimated dates.
In the next example, N1W1FEBYR is the SAS predefined event for the 1st (N1) Sunday (W1) in February (FEB) every year (YR). LS01FEB2026D is a level shift beginning in February 1, 2026. Combination events are defined using the EventComb method of the EVENT object. When the level shift event and the N1W1FEBYR events are combined using element-wsie multiplication, the zeros before the level shift date eliminate the N1W1FEBYR dates before the level shift date, and the ones after the level shift date select the N1W1FEBYR dates from 2026 into the future. Thus, the level shift acts as a filter, and only the future dates are used in the definition. When the estimated dates for the Super Bowl, SuperExt, are combined using element-wise addition with the known Super Bowl dates, SuperBowl, this results in a union of the dates. So, the event definition before February of 2026 is given using the known data, and the event definition in February 2026 and beyond is given using the approximation.
/* to verify dates, create a time id that covers the extended range */ data SBDateRangeExt; start=MDY(1,1,1967); end=MDY(12,31,2030); do date = start to end; obs = date - start + 1; output; end; format date DATE.; run; data sascas1.SBDateRangeExt; set SBDateRangeExt; run; proc tsmodel data = sascas1.SBDateRangeExt LOGCONTROL= (ERROR = KEEP WARNING = KEEP NOTE=KEEP) outlog = sascas1.OUTLOG_ind (replace = YES) inobj = ( inEVENT = sascas1.TSMSuperBowl ) outobj = ( outEVDUM = sascas1.SuperDummy (replace = YES) outEVENT = sascas1.TSMSuperBowlExt (replace = YES) ) errorstop = YES ; id date interval=day ; var obs; require atsm; submit; /* the input table was produced by TSMODEL */ declare object inEVENT(inevent('VERSION','TSMODEL')); declare object dataFrame(tsdf); declare object eventDB(event); rc = eventDB.Initialize(); /* Replay the previous outEVENT table, now inEVENT, to transfer the event definitions to eventDB */ rc = eventDB.Replay(inEVENT); rc = eventDB.eventcomb("SuperExt", "event","N1W1FEBYR", "event","LS01FEB2026D", 'RULE','MULT'); rc = eventDB.eventcomb('SuperBowlExt', 'event','SuperBowl', 'event','SuperExt'); /* collect the definitions */ declare object outEVENT(outevent); rc = outEVENT.collect(eventDB); rc = dataFrame.Initialize(); if rc < 0 then do; stop; end; /* add all currently defined events (SuperBowl) to the dataFrame. Add 'class','all' to include combination events */ rc = dataFrame.AddEvent(eventDB, '_all_','class','all'); /* collect event dummy variable SuperBowl */ declare object outEvDum(outEventDummy); rc = outEvDum.collect(dataFrame); endsubmit; run; quit;
Notice in the above code that:
The OutEventDummy object stores output in a linear format because the events can differ by BY group and so are unknown at the beginning of processing. Extra work is required to format the output into block format. The code below shows how to transpose the output from a linear format into a block format. The resulting data set stores each dummy variable in a separate variable.
proc sort data=sascas1.SuperDummy out=SDSort; by date; run; /* transpose the output to the usual block format for time series variables */ proc transpose data=SDSort out=SuperDummyBlock(drop= _label_ _NAME_); var X; id _XVAR_; by date; run; /* this is the usual (block) form of the event dummy variable output */ title "Extended Super Bowl Dates"; proc print data=SuperDummyBlock(where=((SuperBowlExt NE 0) AND (date GE '01JAN2026'D))); run;
Examining the non-zero values of the extended dates, notice that:
When dates of events are subject to change, some maintenance will always be required to maintain accuracy in your forecasts. However, the ability to store information and change it easily can result in forecasts that are adaptive to change.
SAS is headed back to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team.
Interested in speaking? Content from our attendees is one of the reasons that makes SAS Innovate such a special event!
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.