BY Group Processing and EVENTs in SAS Viya

Started 05-01-2020, modified 05-04-2020

Events in time series are sometimes referred to as interventions. An event occurs at a particular time or times and is used to model any incident that disrupts the normal flow of the process that generated the time series. Examples of commonly used event definitions include natural disasters, retail promotions, strikes, advertising campaigns, policy changes, and data recording errors. 

 

When time series are processed, each individual series to be modeled is identified by its BY group variable values and its series variable. The events in a model can vary by individual time series. In SAS Foundation, users can specify the EVENTBY= data set of the HPFDIAGNOSE procedure to apply different events to individual time series. In SAS Viya, the functionality of the EVENTBY= data set is provided by the INEVENTBY object and the ReplayEventby method of the TSDF object.

 

In SAS Foundation, however, each event name had a single fixed definition. If the event definition varied by individual time series, then multiple event names and definitions were required. A recent example is COVID-19 data, where the first confirmed positive case occurs on different dates in different countries. In SAS Foundation, if each country was a BY group, then a separate event had to be created for each country, for example, FirstCaseUS, FirstCaseChina, and so on. The EVENTBY= data set then had to assign the appropriate event to each BY group, for example, FirstCaseUS to the US BY group. New features in SAS Viya support using a single event name whose definition differs by BY group. In addition, the INSCALAR= table and programming statements can be used to vary the event definitions by BY group. Just-in-time definitions are even possible, because the event can be defined immediately before the time series is processed.

 

The event repository table in SAS Viya can now include BY group processing. The event repository table containing event definitions is input as an INEVENT object of the INOBJ= option of the TSMODEL procedure. All tables input using the INOBJ= option of TSMODEL are handled in the same way as the AUXDATA= data set of the SAS Foundation forecasting procedures. That is, if the input table contains BY group variables, then only the observations that match the current BY group are input for that BY group's processing. If the input table does not contain BY group variables, then all the observations are input and apply to every BY group. In addition, event repositories that are output using the OUTEVENT object of the OUTOBJ= option contain BY group variables if BY groups are specified using the BY statement. In the complete example below, look for the following options and programming statements related to INEVENT object and OUTEVENT object processing:

 

...
             outobj     = (
                              outEVENT     = mylib.outevent (replace = YES)
                              outEVDUM     = mylib.evdum    (replace = YES)
                          ) 
...
         declare object outEVENT(outevent);  
...
         rc = outEVENT.collect(eventDB); 
...
             inobj     = (
                           inEVENT     = mylib.outevent 
                          ) 
...
         declare object inEVENT(inevent('VERSION','TSMODEL'));  
...
         rc = eventDB.replay(inEVENT);  
...

 

SAS programming statements in the TSMODEL procedure allow the user to define and apply events conditionally. For example, if COUNTRY is a BY variable, then the following statements create an event definition for Boxing for BY groups where COUNTRY equals "UK":

 

        if  Country EQ "UK" then do;
             rc = eventDB.EventKey("Boxing");
        end;

 

Conditional logic can be used to define events differently for specified BY groups. For example, if StoreType is a BY variable, then the following statement alters the definition of Christmas (to Christmas Eve) for BY groups where StoreType equals "Super":

 

         if  StoreType EQ "Super" then do;
             rc = eventDB.EventKey("Christmas","SHIFT",-1);
         end;

 

The variables input using the INSCALAR= option of the TSMODEL procedure can be used in the conditional logic of SAS programming statements. They can also be passed in place of numeric arguments to the EventDef and EventKey methods of the EVENT object. The following statement defines the SchoolClosing event using the variable SchoolClosure, so that the timing values of the event vary by BY group:

 

         rc = eventDB.EventDef("SchoolClosing",
                               "startdate",SchoolClosure,
                               "enddate",SchoolClosure+30,
                               'pulse','day');

 

The complete example illustrates how these new features and methods can be used to create event definitions that vary by BY group:

 

/* Create a test data set with 
   BY group variables Country and StoreNum 
   ID variable StoreType
   Time ID DATE
   Time Series sales
*/
data sales(keep=Country StoreNum StoreType DATE sales);
     length Country StoreType $32;
     format DATE DATE.;
     Country = "UK";
     set sashelp.air(obs=31);
     DATE = INTNX('DAY','01DEC2018'D,_n_-1);
     sales = air;
     do StoreNum = 1 to 3;
        StoreType = "Express";
        if ( MOD(StoreNum,3) EQ 0 ) then StoreType = "Super";
        output;
     end;
     Country = "US";
     do StoreNum = StoreNum to 10;
        StoreType = "Express";
        if ( MOD(StoreNum,3) EQ 0 ) then StoreType = "Super";
        output;
     end;
run;

/* Sort by BY variables and DATE */
proc sort data=sales out=sales;
by Country StoreNum DATE;
run;
proc print; run;

/* Create a data set with the same BY variables as sales and
   variable schoolclosure.
*/
data schooldata(keep=Country StoreNum StoreType schoolclosure);
     length Country StoreType $32;
     format schoolclosure DATE.;
     Country = "UK";
     schoolclosure = '10DEC2018'D;
     do StoreNum = 1 to 3;
        StoreType = "Express";
        if ( MOD(StoreNum,3) EQ 0 ) then StoreType = "Super";
        schoolclosure = schoolclosure + StoreNum;
        output;
     end;
     Country = "US";
     schoolclosure = schoolclosure + 1;
     do StoreNum = StoreNum to 10;
        StoreType = "Express";
        if ( MOD(StoreNum,3) EQ 0 ) then StoreType = "Super";
        schoolclosure = schoolclosure + StoreNum;
        output;
     end;
run;

data mylib.schooldata;
     set schooldata;
run;

data mylib.sales;
     set sales;
run;

proc tsmodel data      = mylib.sales  
             inscalar=mylib.schooldata 
             LOGCONTROL= (ERROR = KEEP WARNING = KEEP NOTE=KEEP)
             outlog    = mylib.OUTLOG_ind (replace = YES)
             outobj     = (
                              outEVENT     = mylib.outevent (replace = YES)
                              outEVDUM     = mylib.evdum    (replace = YES)
                          ) 
                  errorstop = YES
               ;

     /* The BY groups are uniquely defined by Country and StoreNum.
        StoreType functions as an ID variable - it has a single value for
        each (Country,StoreNum) combination. However, by specifying
        StoreType as a BY variable, it is available for processing
        in conditional logic in the programming statements. */
    
     by Country StoreNum StoreType;
     id date interval=day trimid=left;
     var sales/ accumulate=total;
     inscalar schoolclosure;

     require atsm;

     submit;

         declare object outEVENT(outevent);  

         declare object dataFrame(tsdf);

         declare object eventDB(event);
         rc = eventDB.Initialize();

         /* the SchoolClosure variable from the inscalar= table
            is used to vary the timing values for the SchoolClosing
            event definition by BY group */
         rc = eventDB.EventDef("SchoolClosing",
                               "startdate",SchoolClosure,
                               "enddate",SchoolClosure+30,
                               'pulse','day');

         /* add the predefined event definition for Boxing
            to BY groups where Country="UK" */
         if  Country EQ "UK" then do;
             rc = eventDB.EventKey("Boxing");
         end;
         /* add the predefined event definition for Christmas
             to all BY groups */
         rc = eventDB.EventKey("Christmas");

         /* alter the event definition for Christmas 
            (to Christmas Eve) for StoreType="Super" */
         if  StoreType EQ "Super" then do;
             rc = eventDB.EventKey("Christmas","SHIFT",-1);
         end;
         /* collect the event definitions; BY group processing applies */
         rc = outEVENT.collect(eventDB); 

         rc = dataFrame.Initialize(); if rc < 0 then do; stop; end;
         /* add all currently defined events for this BY group to
            the dataFrame */
         rc = dataFrame.AddEvent(eventDB, '_all_'); 
         
         /* collect event dummy variables for each BY group */
         declare object outEvDum(outEventDummy);
         rc = outEvDum.collect(dataFrame);

     endsubmit;
run;
quit;

/* This events repository table supports 
   BY group differentiation of event definitions */
title "TSMODEL - definitions";
proc print data=mylib.outevent; 
format _STARTDATE_ _ENDDATE_ DATE.;
run;
       
proc sort data=mylib.evdum out=EventsInTSDF; 
by Country StoreNum StoreType date; 
run; 
proc transpose data=EventsInTSDF 
out=evdumtsm(drop= _label_ _NAME_); 
var X; 
id _XVAR_; 
by Country StoreNum StoreType date; 
run; 

/* this is the linear form of the dummy variable output
proc print data=EventsInTSDF; run;
*/
/* this is the usual (block) form of the event dummy variable output */
proc print data=evdumtsm; run;

proc tsmodel data      = mylib.sales  
             LOGCONTROL= (ERROR = KEEP WARNING = KEEP NOTE=KEEP)
             outlog    = mylib.OUTLOG_ind (replace = YES)
             inobj     = (
                           inEVENT     = mylib.outevent 
                          ) 
             outobj     = (
                              outEVENT     = mylib.outevent2 (replace = YES)
                              outEVDUM     = mylib.evdum2    (replace = YES)
                          ) 
                  errorstop = YES
               ;
     by Country StoreNum;
     id date interval=day trimid=left;
     var sales/ accumulate=total;

     require atsm;

     submit;

         /* VERSION=TSMODEL because the table was created in TSMODEL */
         declare object inEVENT(inevent('VERSION','TSMODEL'));  

         declare object dataFrame(tsdf);

         declare object eventDB(event);
         rc = eventDB.Initialize();

         /* add event definitions from the inEVENT table;
            BY group processing applies */
         rc = eventDB.replay(inEVENT);  

         rc = dataFrame.Initialize(); if rc < 0 then do; stop; end;
         /* add all currently defined events for this BY group to
            the dataFrame */
         rc = dataFrame.AddEvent(eventDB, '_all_'); 
         

         declare object outEvDum(outEventDummy);
         rc = outEvDum.collect(dataFrame);

         declare object outEVENT(outevent);  
         rc = outEVENT.collect(eventDB); 

     endsubmit;
run;
quit;

 

It is useful to examine the events repository. The event definition repository has many variables. For a simple example such as this one, many of the variables are set to the default values. To focus on the variables of interest, the following statements will display a subset of the event repository:

 

proc print data=mylib.outevent; 
     var Country StoreNum StoreType _NAME_ _CLASS_ _KEYNAME_ _STARTDATE_ _ENDDATE_
                 _DATEINTRVL_ _TYPE_ _VALUE_ _PULSE_ _SHIFT_ ;
     format _STARTDATE_ _ENDDATE_ DATE.;
run; 

Examine the results:

evflex2_repository.PNG

Note that the event repository contains the BY variables. The _NAME_ variable contains the name of the event. The Boxing event is defined only for BY groups where Country equals "UK". The _STARTDATE_ and _ENDDATE_ variables for the SchoolClosing event are different for each BY group. Examine the _SHIFT_ variable; the Christmas event is shifted one day earlier only for BY groups where StoreType equals "Super". Note that the second call to the TSMODEL procedure accepts this event repository as input.
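
One quick way to confirm this pattern without reading the full listing is to cross-tabulate event names by Country. The following sketch is an addition to the original example. It assumes that the first TSMODEL run above has completed and that mylib.outevent can be read by PROC FREQ through the mylib libref; the title text is arbitrary.

```sas
/* Sketch: count event definitions by Country and event name.
   Assumes mylib.outevent was created by the first TSMODEL run. */
title "Event definitions by Country";
proc freq data=mylib.outevent;
     tables Country*_NAME_ / norow nocol nopercent;
run;
title;
```

If the repository is as described above, the cell for the Boxing event and Country equal to "US" should show a count of zero.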

 

The following code in the example transposes the event dummy variable output into a format familiar to users for time series:

 

proc sort data=mylib.evdum out=EventsInTSDF; 
     by Country StoreNum StoreType date; 
run;
proc transpose data=EventsInTSDF 
     out=evdumtsm(drop= _label_ _NAME_); 
     var X; 
     id _XVAR_;
     by Country StoreNum StoreType date; 
run; 

Display the resulting table:

 

/* this is the usual (block) form of the event dummy variable output */
proc print data=evdumtsm; run;

It is useful to examine the event dummy variable output. Notice that observations 1 to 114 contain BY groups where Country equals "UK". For those BY groups, the Boxing variable contains 0 and 1 values. Observations 115 to 380 contain BY groups where Country equals "US". Since Boxing is not defined for those BY groups, the values for Boxing are missing values:

 

Boxing.PNG
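
The same observation can be checked numerically rather than by scanning the printed table. The following sketch is an addition to the original example. It assumes that the transposed table evdumtsm exists in the WORK library and that PROC TRANSPOSE named the dummy variable Boxing, as the text above describes.

```sas
/* Sketch: summarize the Boxing dummy variable by Country.
   N counts nonmissing values; NMISS counts missing values. */
proc means data=evdumtsm n nmiss min max;
     class Country;
     var Boxing;
run;
```

For Country equal to "US", N should be 0 and NMISS should equal the number of observations in those BY groups.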

 

Note that for BY groups where StoreType equals "Super", the Christmas event is shifted to Christmas Eve:

 

XmasShift.PNG

 

Examine the dates for the SchoolClosing event. For the BY group where Country equals "UK" and StoreNum equals 1, the event starts on 11DEC2018. For the BY group where Country equals "UK" and StoreNum equals 3, the event starts on 16DEC2018.

 

 

 

SchoolClose1.PNG

 

SchoolClose2.PNG
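
These start dates follow from the way the schooldata DATA step adds each StoreNum value cumulatively to 10DEC2018. The following stand-alone sketch is an addition to the original example. It repeats only that arithmetic for the UK stores and writes the results to the SAS log.

```sas
/* Sketch: reproduce the cumulative date arithmetic used to
   build schoolclosure for the UK stores (StoreNum 1 to 3). */
data _null_;
     schoolclosure = '10DEC2018'D;
     do StoreNum = 1 to 3;
        /* each store adds its own number to the running date */
        schoolclosure = schoolclosure + StoreNum;
        put StoreNum= schoolclosure= date9.;
     end;
run;
```

The log shows 11DEC2018 for StoreNum 1, 13DEC2018 for StoreNum 2, and 16DEC2018 for StoreNum 3, which matches the start dates noted above.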

 

Display the contents of the schooldata data set:

 

title "School Closing Data";
proc print data=schooldata;
run;

 

Notice the correspondence of the dates in the schooldata data set to the dummy variable values for SchoolClosing for the various BY groups:

 

SchoolCloseData.PNG
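
The correspondence can also be listed side by side by joining schooldata to the event repository. The following sketch is an addition to the original example. It assumes that mylib.outevent from the first TSMODEL run is readable by PROC SQL and that _STARTDATE_ and _ENDDATE_ hold SAS date values, as the DATE. format used earlier suggests. The UPCASE call is a precaution because the stored case of the event name was not verified.

```sas
/* Sketch: compare the input schoolclosure dates with the
   SchoolClosing event dates stored in the event repository. */
proc sql;
     select e.Country,
            e.StoreNum,
            s.schoolclosure format=date9.,
            e._STARTDATE_   format=date9.,
            e._ENDDATE_     format=date9.
     from mylib.outevent as e
          inner join schooldata as s
          on  e.Country  = s.Country
          and e.StoreNum = s.StoreNum
     where upcase(e._NAME_) = "SCHOOLCLOSING"
     order by e.Country, e.StoreNum;
quit;
```

Given the EventDef call in the example, each row should show _STARTDATE_ equal to schoolclosure and _ENDDATE_ equal to schoolclosure plus 30 days.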

 

For further information on the new features, please refer to:

SAS® Visual Forecasting
Time Series Packages
Automatic Time Series Modeling Package
