Fluorite | Level 6

## Calculate predictions and compare with actual values

Dear all,

I have sales data that is grouped by company and year. I would like to "forecast" sales data within the sample so that I could compare the predicted sales figures with the actual sales data. For example, the data spans from 1984 to 2015 for many companies and I would like to calculate the predicted sales with a exponential smoothing procedure in 1988 from actual sales of the four previous years (1984-1987) and continue this throughout all available years so I would have a sales value for 1988 (from 1984-1987), 1989 (from 1985-1988) and so forth. So far I have been able to forecast data for the "next" year (in this case 2016) but I do not know how to set the forecast window so that it would move in 4-year increments.

Thank you,

Pieter

1 ACCEPTED SOLUTION

Accepted Solutions
PROC Star

## Re: Calculate predictions and compare with actual values

You could use proc expand or (I think) proc timedata to make rolling 4-year windows, but I don't recall the syntax off hand.  Also, you need to have a license for the SAS/ETS (econometrics/time series) package.

Here's a data step that modifies your data, so that instead of "by gvkey" you can use "by gvkey window", where window=1 for the first 4 years, window=2 for the 2nd through 5th years, etc.   That is they are rolling windows.  Here's the code:

``````DATA firm;
INPUT fyear gvkey sale;
DATALINES;
1993 1013 366.1180
1994 1013 448.7350
1995 1013 586.2220
1996 1013 828.0090
1997 1013 1164.4500
1998 1013 1379.6780
1999 1013 1926.9470
2000 1013 3287.9000
2001 1013 2402.8000
2002 1013 1047.7000
2003 1013 773.2000
2004 1013 784.3000
2005 1013 1169.2000
2006 1013 1281.9000
2007 1013 1322.2000
2008 1013 1456.4000
2009 1013 996.7000
2010 1013 1156.6000
1993 1034 338.2300
1994 1034 469.2630
1995 1034 520.8820
1996 1034 486.1840
1997 1034 500.2880
1998 1034 604.5840
1999 1034 732.4430
2000 1034 900.7940
2001 1034 974.9900
2002 1034 1230.7620
2003 1034 1297.2850
2004 1034 1339.4800
2005 1034 553.6170
2006 1034 653.8280
2007 1034 722.4250
run;

data vneed / view=vneed;
/* Get gvkey of current rec and current+3 record*/
merge firm (keep=gvkey rename=(gvkey=gvkey1))
firm (firstobs=4 keep=gvkey rename=(gvkey=gvkey4));

/* If starting a new gvkey, reset window index */
if lag(gvkey1)^=gvkey1 then window=0;
window+1;
if gvkey1^=gvkey4 then window=.;

/* Read and write 4 records for valid windows*/
set firm;              if window^=. then output;
set firm (firstobs=2); if window^=. then output;
set firm (firstobs=3); if window^=. then output;
set firm (firstobs=4); if window^=. then output;
run;

proc forecast data=vneed out=new2 lead=1 method=expo weight=.75 nstart=3;
by gvkey window;
id fyear;
var sale;
run;``````

Notes:

1. The merge statement merges observation J with observation J+3, keeping only the gvkeys (renamed to gvkey1 and gvkey4).  If gvkey1=gvkey4 then you know a valid 4 record window can be made.  That's the sole purpose of this merge statement.
2. Calculate WINDOW.  The first window of 4 records for a gvkey will have window=1, then 2, ...   Once there are insufficient records for a window (i.e. gvkey4^=gvkey1), it's set to a missing value.
3. Then there are 4 SET statements, with each SET being offset from the prior SET by one observation and having its own output statement, assuming window number is valid.  This populates each window with 4 sequential records.
4. The data statement make a data set VIEW instead of a data set FILE.  This means the data set won't be constructed until it is called for later.  And instead of being written to disk, it will be read directly by the calling proc (i.e. by the proc forecast).  Saves lots of disk input/output.
5. Note that this does not do anything about gaps in your annual data.  If you don't want windows with gaps then change the merge statement to:
merge firm (keep=gvkey fyear rename=(gvkey=gvkey1 fyear=fyear1))
firm (firstobs=4 keep=gvkey fyear rename=(gvkey=gvkey4 fyear=fyear4));
and change the "if window^=." statements to
if window^=. and fyear4-fyear1=3 then output;
6. The proc forecast is now "by gvkey window" instead of "by gvkey".
7. But the proc forecast issue this warning.
1. WARNING: There are too few nonmissing values available for the variable sale to initialize the
smoothed trend estimates as specified by the NSTART=8 option. Trend parameters initialized
using all 4 values available.

So I just inserted the "NSTART=MAX" option to keep the log neat.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
5 REPLIES 5
PROC Star

## Re: Calculate predictions and compare with actual values

You want to do smoothing based on 4-year rolling windows within company.  Show us the smoothing code, and we can probably show you how to make the windows for submission to that smoothing code.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Fluorite | Level 6

## Re: Calculate predictions and compare with actual values

Okay,

Thank you. I added a raw version of the code. At this junction, it will calculate the forecast for the "next" year from all the observations in the sample. I would like it to forecast a observation from 4 previous years and do this continually throughout the sample.

Best,

Pieter

PROC Star

## Re: Calculate predictions and compare with actual values

You could use proc expand or (I think) proc timedata to make rolling 4-year windows, but I don't recall the syntax off hand.  Also, you need to have a license for the SAS/ETS (econometrics/time series) package.

Here's a data step that modifies your data, so that instead of "by gvkey" you can use "by gvkey window", where window=1 for the first 4 years, window=2 for the 2nd through 5th years, etc.   That is they are rolling windows.  Here's the code:

``````DATA firm;
INPUT fyear gvkey sale;
DATALINES;
1993 1013 366.1180
1994 1013 448.7350
1995 1013 586.2220
1996 1013 828.0090
1997 1013 1164.4500
1998 1013 1379.6780
1999 1013 1926.9470
2000 1013 3287.9000
2001 1013 2402.8000
2002 1013 1047.7000
2003 1013 773.2000
2004 1013 784.3000
2005 1013 1169.2000
2006 1013 1281.9000
2007 1013 1322.2000
2008 1013 1456.4000
2009 1013 996.7000
2010 1013 1156.6000
1993 1034 338.2300
1994 1034 469.2630
1995 1034 520.8820
1996 1034 486.1840
1997 1034 500.2880
1998 1034 604.5840
1999 1034 732.4430
2000 1034 900.7940
2001 1034 974.9900
2002 1034 1230.7620
2003 1034 1297.2850
2004 1034 1339.4800
2005 1034 553.6170
2006 1034 653.8280
2007 1034 722.4250
run;

data vneed / view=vneed;
/* Get gvkey of current rec and current+3 record*/
merge firm (keep=gvkey rename=(gvkey=gvkey1))
firm (firstobs=4 keep=gvkey rename=(gvkey=gvkey4));

/* If starting a new gvkey, reset window index */
if lag(gvkey1)^=gvkey1 then window=0;
window+1;
if gvkey1^=gvkey4 then window=.;

/* Read and write 4 records for valid windows*/
set firm;              if window^=. then output;
set firm (firstobs=2); if window^=. then output;
set firm (firstobs=3); if window^=. then output;
set firm (firstobs=4); if window^=. then output;
run;

proc forecast data=vneed out=new2 lead=1 method=expo weight=.75 nstart=3;
by gvkey window;
id fyear;
var sale;
run;``````

Notes:

1. The merge statement merges observation J with observation J+3, keeping only the gvkeys (renamed to gvkey1 and gvkey4).  If gvkey1=gvkey4 then you know a valid 4 record window can be made.  That's the sole purpose of this merge statement.
2. Calculate WINDOW.  The first window of 4 records for a gvkey will have window=1, then 2, ...   Once there are insufficient records for a window (i.e. gvkey4^=gvkey1), it's set to a missing value.
3. Then there are 4 SET statements, with each SET being offset from the prior SET by one observation and having its own output statement, assuming window number is valid.  This populates each window with 4 sequential records.
4. The data statement make a data set VIEW instead of a data set FILE.  This means the data set won't be constructed until it is called for later.  And instead of being written to disk, it will be read directly by the calling proc (i.e. by the proc forecast).  Saves lots of disk input/output.
5. Note that this does not do anything about gaps in your annual data.  If you don't want windows with gaps then change the merge statement to:
merge firm (keep=gvkey fyear rename=(gvkey=gvkey1 fyear=fyear1))
firm (firstobs=4 keep=gvkey fyear rename=(gvkey=gvkey4 fyear=fyear4));
and change the "if window^=." statements to
if window^=. and fyear4-fyear1=3 then output;
6. The proc forecast is now "by gvkey window" instead of "by gvkey".
7. But the proc forecast issue this warning.
1. WARNING: There are too few nonmissing values available for the variable sale to initialize the
smoothed trend estimates as specified by the NSTART=8 option. Trend parameters initialized
using all 4 values available.

So I just inserted the "NSTART=MAX" option to keep the log neat.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Fluorite | Level 6

## Re: Calculate predictions and compare with actual values

Thank you so much for your help.

Pieter

SAS Super FREQ

## Re: Calculate predictions and compare with actual values

By definition, an exponential smoother does  not use a finite number of previous values. Instead, it uses a smoothing parameter that specifies how much weight to give to previous values. If th eparameter is alpha, then the weight of the k+th previous observation (k>1) is proportional to (1-alpha)**k. Smoothing parameters close to 1 discount earlier observations, so to get an exponential smoother that mostly relies on the most recent (4) observations, use alpha > 0.5, such as alpha near  0.66 or 0.75.

You can use PROC EXPAND to compute exponential moving averages. See the example in the article "Compute a moving average in SAS."  I also wrote an article that describes the mathematical details for rolling statistics, along with manual computations of common rolling statistics.

Discussion stats
• 5 replies
• 2065 views
• 2 likes
• 3 in conversation