Temporal Reconciliation in Large-Scale Forecasting

This post explains what temporal reconciliation is and how it might make your statistical forecasts more useful and accurate, and it provides an example of implementing it in a large-scale forecasting context. The implementation uses the TSMODEL procedure in SAS Visual Forecasting. TSMODEL’s ability to process large numbers of sequences simultaneously as distributed arrays has been described in previous posts (see the links at the bottom). Here, it provides a fast, efficient and elegant solution to the problem of implementing temporal reconciliation on large-scale data.

Forecast reconciliation is widely used in applied forecasting. It’s the process of making groups of forecasts add up so that there are no discrepancies. As an example of traditional reconciliation, consider a product demand forecast at the distribution center level and the individual product demand forecasts that flow through it in a given time interval. If the sum of the product demand forecasts doesn’t equal the distribution center forecast, then they can be reconciled by adjusting the forecasts at one of the two levels so that they match. Reconciliation provides a consistent objective for stocking, planning and other business activities up and down the data hierarchy.

A simple example for a three-level data hierarchy and one time interval is shown below. The middle or Warehouse level is chosen as the level to reconcile to. Statistical forecasts at the top and bottom (store) level will be adjusted to accomplish reconciliation.

The diagram below shows the reconciled forecasts for this time interval. Note that the top-down adjustment to the two Stores on the left is done in a way that preserves the proportion between the two original statistical forecasts, here 2 to 1. Proportional allocation is the standard for top-down reconciliation.
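
As a minimal sketch of the proportional allocation arithmetic, the DATA step below uses made-up numbers (they are not taken from the diagram): a warehouse forecast of 120 and two store statistical forecasts of 60 and 30. The names STATFCST, WAREHOUSE, SHARE and RECFCST are hypothetical.

```sas
/* Illustrative values only, not from the example data */
data _null_;
    array statfcst[2] _temporary_ (60 30);  /* store statistical forecasts, 2 to 1 */
    warehouse = 120;                         /* forecast at the level being reconciled to */

    /* first pass: total of the store forecasts */
    total = 0;
    do i = 1 to dim(statfcst);
        total = total + statfcst[i];
    end;

    /* second pass: allocate the warehouse forecast in proportion */
    do i = 1 to dim(statfcst);
        share   = statfcst[i] / total;
        recfcst = warehouse * share;
        put i= share= recfcst=;
    end;
run;
```

With these values the stores are adjusted to 80 and 40, which sum to the warehouse forecast of 120 and keep the 2 to 1 ratio.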

The process of reconciliation occurs for forecasts at each time interval in the history and lead horizon, where feasible.

Reconciliation can be used to improve forecasts on problematic data. For example, sequences at lower levels of a data hierarchy, like a SKU or store level, may be sparse and noisy. Rolling the data up to a higher level, say warehouse or region, smooths the data and makes it less sparse. Signal components like trends and input variable effects may be more easily identified and quantified in models in the higher-level data. Reconciliation adjustments can improve SKU or store level forecast accuracy by ‘pushing down’ signal from warehouse level forecasts.

Temporal reconciliation is the process of making forecasts indexed with two different time intervals match. It’s closely related to traditional reconciliation, with the following difference: in hierarchically arranged data, forecasts are temporally reconciled at the same level. That is, they are forecasts of the same thing, for example warehouse level sales, but the forecasts are denominated differently in terms of their time intervals. Temporal reconciliation can be used to reconcile or benchmark daily interval warehouse level forecasts to be consistent with monthly interval warehouse forecasts.

The idea of signal component variation being ‘pushed down’ to improve lower-level forecasts is also relevant in the temporal context. It’s usually straightforward to identify and model a weekly or 7-day cycle on daily interval data, when it exists. Any day-of-the-week pattern is obliterated when the data is accumulated or rolled up to a monthly interval. However, an annual seasonal pattern is usually easier to detect and accommodate in models identified on monthly interval data relative to daily interval data. Benchmarking your daily interval forecasts to your monthly interval forecasts provides a way to capture both cycles in daily interval forecasts when they both exist.

Now that we’ve described what temporal reconciliation is and why you might want to implement it, we’ll proceed with the example. The example data is in a two-level hierarchy with REGION at the top and distribution center (DISTCTR) on the bottom. There are 7 regions and 25 distribution centers. The DISTCTR forecasts start around JAN2017 and run through AUG2024. This time range includes both historical and lead forecasts.

We have two forecast tables, BLOGINDAY and BLOGINWEEK. The daily and week interval forecasts are for sales that flow through the 25 distribution centers (REGION, DISTCTR pairs). The forecasts in both tables were generated using the automatic time series modeling (ATSM, SAS Visual Forecasting) package in the TSMODEL procedure, and the tables are loaded into memory. A subset of each table is shown below.

BLOGINDAY

BLOGINWEEK

In this example, the goal is to benchmark or reconcile the daily interval forecasts within each week to the corresponding week interval forecast. The first step in implementing temporal reconciliation is to create a variable that identifies each week in the daily interval forecast data. In the syntax below, we create another BY or sub-setting variable that flags each week in the daily interval data.

WK uses the WEEK function to create a variable that contains the week number that each PREDICT (forecast) value falls in, 0 to 53.
YR uses the YEAR function to create a variable that contains the year that each forecast falls in.
WKDYR uses the concatenation (CATS) function to create a variable that is a combination of YR and WK. WKDYR provides a unique identifier for each week within each DISTCTR in the daily interval forecast data.
T is a counter that increments by 1 for each day of a given week, 0 – 6. It will be used as the time ID in subsequent syntax.

data casuser.dayin;
    set casuser.bloginday;
    retain dpredict;
    wk = week(date);
    yr = year(date);
    wkdyr = cats(yr, wk);
    *t is counter, 0 to 6 that will be used as a time-id for the temporal reconciliation in the TSMODEL procedure;
    t = weekday(date)-1;
    *fill in missing predictions with actuals, this is optional;
    if predict ne . then do;
        dpredict = predict;
    end;
    else do;
        dpredict = actual;
    end;
    keep region date distctr dpredict actual wk wkdyr t ;
run;
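
To see what these assignments produce, the sketch below applies the same functions to one Sunday-to-Saturday week. The dates and the WORK.WEEKKEY_DEMO table are hypothetical and are not part of the example data. The WEEK function is used with its default settings, as in the step above.

```sas
/* Hypothetical demo table: one row per day for a single week */
data work.weekkey_demo;
    format date date9.;
    do date = '05MAR2023'd to '11MAR2023'd;  /* Sunday through Saturday */
        wk    = week(date);          /* week number within the year */
        yr    = year(date);
        wkdyr = cats(yr, wk);        /* year and week number combined */
        t     = weekday(date) - 1;   /* WEEKDAY returns 1 for Sunday, so t is 0 to 6 */
        output;
    end;
run;

proc print data=work.weekkey_demo noobs;
run;
```

All seven rows share one WKDYR value, and T runs from 0 on Sunday to 6 on Saturday, so each week becomes one BY group with seven time-indexed observations.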

A subset of the CASUSER.DAYIN table is shown below.

The syntax below adds the same WKDYR variable to the week interval forecast data.

data casuser.weekin;
    set casuser.bloginweek; 
    retain wpredict;
    wkd = week(date);
    yr = year(date);
    wkdyr = cats(yr, wkd);
    if predict ne . then do;
         wpredict = predict;
    end;
    else do;
        wpredict = actual;
    end;
     keep region distctr wkdyr wpredict;
run;

Since the CASUSER.WEEKIN data has a week interval, there is one unique value of WKDYR for each row in each distribution center.

Now, we’re ready to implement temporal reconciliation in the TSMODEL procedure. Since there is one unique week interval forecast for each sequence of day interval forecasts, we will use the CASUSER.WEEKIN table as an in-scalar data set.

In TSMODEL, a system scalar variable is a number that is associated with a sequence or array in the input data set. System scalars are widely useful beyond this application. Some further examples of using scalar variables in TSMODEL can be found in the posts linked at the end.

proc tsmodel data=casuser.dayin inscalar=casuser.weekin outarray=casuser.rec_distctrfcst outscalar=casuser.sum;
    by region distctr wkdyr ;
    id t interval=obs;
    var date dpredict actual wk;
    outarrays propdfor recfor;
    inscalar wpredict;
    outscalar wpredict sumdfcst;
    submit;
        * The weekly sum of statistical forecasts needs to be calculated in a separate loop, because the accumulation increments 
            in each step of the loop. We want the value on the last step.;
        do h = 1 to dim(dpredict);
            sumdfcst += dpredict[h];
        end;
        do i = 1 to dim(dpredict);
             * create a PROPDFOR array that represents the daily proportion of the weekly sum of statistical forecasts;
            propdfor[i] = dpredict[i]/sumdfcst;
            * RECFOR splits up the week level forecast according to the daily forecast proportions.;
            recfor[i] = wpredict*propdfor[i];
        end;
     endsubmit;
run;
quit;

In the syntax above, the DAYIN input table contains the sequences to be processed. WEEKIN is read in as an INSCALAR table. The OUTARRAY table will contain the arrays in the input table as well as any new arrays that are created in processing. The OUTSCALAR table will contain any system scalar variables listed on the OUTSCALAR statement.

The BY, ID and VAR statements combine to define the timeseries arrays that will be processed in the SUBMIT block.

The BY statement lists the hierarchical arrangement of the data; REGION, DISTCTR and WKDYR.
The ID statement lists the time ID variable for the analysis and the desired interval of the timeseries arrays. Here, we’re using the counter, T, as the time ID variable and listing OBS as the interval. OBS implies that the data is already equally spaced.
The VAR statement lists the sequences in the input data set to be processed in the SUBMIT block.
The OUTARRAYS statement lists or declares the arrays that will be created in subsequent processing.
The OUTSCALARS statement lists the system scalars to be output to the OUTSCALAR table.

The SUBMIT block does the array processing. Proportional allocation of a week interval forecast to each day in that week has two basic steps. First, we need to calculate the proportion of each daily forecast using the sum of daily forecasts for a given week. Second, we’ll allocate the week interval forecast by multiplying it by the array of proportions created in the first step.

The first loop, indexed by h, creates a new scalar named SUMDFCST. This is the sum of the daily forecasts in each week. Note that because the accumulating sum increments by one element of the array in each step of the loop, SUMDFCST is calculated in a separate loop so that it represents the sum of all 7 days in a given week before any proportion is computed.
The second loop, indexed by i, calculates the temporally reconciled forecasts.
- The new array, PROPDFOR contains the proportion of each daily forecast to that week’s forecast total.
- The new array, RECFOR is the product of WPREDICT, the week interval forecast, and the array of daily forecast proportions.
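
The two loops can be checked on a single week outside of TSMODEL. The DATA step below is a sketch with made-up daily forecasts and a made-up week interval forecast. The guard for a non-positive weekly sum is an assumption added here and is not part of the TSMODEL code above.

```sas
/* Illustrative values only: one week of daily forecasts and one weekly forecast */
data _null_;
    array dpredict[7] _temporary_ (10 20 20 20 20 30 30);
    array propdfor[7] _temporary_;
    array recfor[7]   _temporary_;
    wpredict = 180;

    /* loop 1: weekly sum of the daily statistical forecasts */
    sumdfcst = 0;
    do h = 1 to dim(dpredict);
        sumdfcst = sumdfcst + dpredict[h];
    end;

    /* loop 2: daily proportions and reconciled daily forecasts */
    recsum = 0;
    do i = 1 to dim(dpredict);
        if sumdfcst > 0 then do;
            propdfor[i] = dpredict[i] / sumdfcst;
            recfor[i]   = wpredict * propdfor[i];
        end;
        /* assumed fallback: keep the daily forecast when the weekly sum is not positive */
        else recfor[i] = dpredict[i];
        recsum = recsum + recfor[i];
    end;

    put sumdfcst= wpredict= recsum=;
run;
```

Here the daily forecasts sum to 150, the week interval forecast is 180, and the reconciled daily forecasts (12, 24, 24, 24, 24, 36, 36) sum to 180.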

A subset of the CASUSER.REC_DISTCTRFCST table that contains the daily interval reconciled forecasts is shown below.

A subset of the OUTSCALAR table is shown below. Note that there is one value of WPREDICT and SUMDFCST associated with each BY group or REGION, DISTCTR and WKDYR combination.

Reconciliation has been promoted in this post as a way to improve the fit or accuracy of the forecasts being adjusted by reconciliation. For this to be true, the majority of the sparse and noisy lower level timeseries must possess common signal components. If some lower-level series are trending up, some are trending down and some are flat, then the process of accumulating or rolling the data up, forecasting and then reconciling to the lower level will likely make the fit of the lower-level forecasts worse. Spending time grouping series with like signal components and then performing accumulation, modeling and reconciliation on each identified group is time well spent in applied forecasting. Series at the lower level need to be fairly homogeneous for reconciliation to improve their associated forecast accuracy.

A couple more notes may be useful. First, temporal reconciliation, as implemented here, requires creating new BY groups for each interval (week) in the lower frequency data. For the data shown in the demonstration, this increased the number of arrays to process from 26 distribution centers to the number of weeks within the 26 distribution centers, 82,400. The number of rows in the data stayed the same, but the BY groups or arrays are what get distributed and processed in TSMODEL. This may increase the computational load and slow things down, depending on your resources.

Finally, the example presented here has not been extensively tested or refined, and there are problems with the solution that you would not find in production SAS software. One issue is that while days nest in weeks, weeks don’t nest in years. Boundary weeks (the first or last week of the year) caused anomalous reconciled forecast values for those weeks. The syntax at the end of this post provides a workaround by not performing temporal reconciliation on forecasts in boundary weeks. The temporal reconciliation approach outlined in this post should work well to benchmark forecasts denominated in time intervals that nest, e.g. hours to days, days to months, months to quarters and so on. Week interval forecasts will cause issues because weeks do not nest in months or years.
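
The boundary week problem can be seen by building the week key for the week that spans the end of 2017. This is an illustrative sketch: the WORK.BOUNDARY_DEMO table and the WEEKSTART variable are hypothetical, and it assumes a week that runs Sunday to Saturday, which is what the WEEK and WEEKDAY defaults used earlier imply.

```sas
/* Hypothetical demo: the week from Sunday 31DEC2017 to Saturday 06JAN2018 */
data work.boundary_demo;
    format date weekstart date9.;
    do date = '31DEC2017'd to '06JAN2018'd;
        wk        = week(date);
        yr        = year(date);
        wkdyr     = cats(yr, wk);             /* key used in this post */
        weekstart = intnx('week', date, 0);   /* Sunday that begins the week */
        output;
    end;
run;

proc print data=work.boundary_demo noobs;
run;
```

The Sunday, 31DEC2017, gets a different WKDYR value from the six days that follow it in 2018, so one calendar week is split into two BY groups and neither holds a full seven days. WEEKSTART is the same for all seven rows. Keying both tables on a value like WEEKSTART instead of WKDYR might avoid the split, but that is an untested assumption and it depends on how the week interval forecasts are dated.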

I hope that you found this post useful and wish you good luck in your forecasting efforts. My thanks to Lorne Rothman, Michele Trovero and Ari Zitin for their contributions to this post. Any misstatements and errors are solely my own.

Temporal reconciliation syntax with a workaround for boundary week issues.

proc tsmodel data=casuser.dayin inscalar=casuser.weekin outarray=casuser.rec_zipford outscalar=casuser.sum;
    by region distctr wkdyr ;
    id t interval=obs;
    var date dpredict actual wk;
    outarrays propdfor recfor;
    inscalar wpredict;
    outscalar wpredict sumdfcst;
    submit;
         * The weekly sum of statistical forecasts needs to be calculated in a separate loop, because the accumulation increments 
                in each step of the loop, and we want the value on the last step.;
          do h = 1 to dim(dpredict);
              sumdfcst += dpredict[h];
          end;
        
          do i = 1 to dim(dpredict);
                if wk[i] not in (52, 53, 0, 1) then do;
                 * create a PROPDFOR array that represents the daily proportion of the weekly sum of statistical forecasts;
                    propdfor[i] = dpredict[i]/sumdfcst;
                  * RECFOR splits up the week level forecast according to the daily forecast proportions.;
                     recfor[i] = wpredict*propdfor[i];
                 end;
                if wk[i] in (52, 53, 0, 1) then do;
                    recfor[i] = dpredict[i];
                end;
           end;
   endsubmit;
run;
quit;

Links to some TSMODEL related posts

https://communities.sas.com/t5/SAS-Communities-Library/Data-Step-for-Timeseries-part-1-Overview/ta-p...

https://communities.sas.com/t5/SAS-Communities-Library/Data-Step-for-Timeseries-part-2-BY-Group-Proc...

https://communities.sas.com/t5/SAS-Communities-Library/Data-Step-for-Timeseries-Part-3-the-FCMP-Proc...
