annacole2408
Calcite | Level 5

Hello, everyone. I am trying to better understand this example: Example 16.3 Interpolating Irregular Observations :: SAS/ETS(R) 14.1 User's Guide.

The conversion documentation (Conversion Methods :: SAS/ETS(R) 14.1 User's Guide) states that "For point-in-time output series, the spline function is evaluated at the appropriate points. For interval total or average output series, the spline function is integrated over the output intervals." This relates to the part of Example 16.3 where the author uses the following code:

proc expand data=samples
            out=monthly
            to=month
            plots=(input output);
  id date;
  convert defects / observed=(beginning,average);
run;

I understand that, without specifying the second (output) value for observed=, the interpolation simply creates a dataset with every day between 01/13/1992 and 09/27/1992, then fills in any missing values with a cubic spline interpolation. Then, the monthly average estimates (Output 16.3.2) are composed of the monthly averages of the interpolated daily dataset.

I also know that when using the conversion observed=(beginning,average), the process is the following: (1) Create a daily dataset with every day between 01/13/1992 and 09/27/1992. (2) Fill in data for the missing days, where "the spline function is integrated over the output intervals." (3) Compute the monthly averages of the interpolated daily dataset.

I am very confused about step (2). For daily data, aren't the output intervals simply one day? Shouldn't this make observed=(beginning,average) and observed=(beginning) give the same result? I ran both versions of the code on this data and got different results. For the version with the conversion to average, the interpolated values do not pass through the nonmissing data.

Thank you in advance.

Rick_SAS
SAS Super FREQ

I read the documentation. I think it says the following:

  • The procedure fits a cubic spline to the data and evaluates the spline for every day.
  • If you use the option observed=(beginning), that is equivalent to using observed=(beginning,beginning). That means that the output value is the value of the cubic spline on the first day of each month: 01JAN, 01FEB, 01MAR, etc.
  • If you use the option observed=(beginning,average), the output value is the AVERAGE of the spline values over the month. Thus the output for January is the average of the cubic spline in January, the output for February is the average of the cubic spline in February, etc.
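
In symbols, the last two bullets can be written roughly as follows. This is only a sketch: S(t) stands for the fitted cubic spline and [a_m, b_m) for the m-th output interval, which is notation introduced here for illustration and not taken from the documentation.

```latex
\text{observed=(beginning,beginning):}\quad y_m = S(a_m)
\qquad\qquad
\text{observed=(beginning,average):}\quad \bar{y}_m = \frac{1}{b_m - a_m}\int_{a_m}^{b_m} S(t)\,dt
```

If that reading is right, then with TO=DAY the output interval is a single day, and the average form gives the mean of the spline over that day rather than its value at the start of the day. The two are generally not equal, which would be consistent with the averaged series not passing exactly through the observed points.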

Here is the graph of the cubic interpolation function (which I got by setting TO=DAY and PLOTS=(ALL)). See if this helps you make sense of the outputs in both cases:

Rick_SAS_0-1759328775886.png
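
To compare the two conversions directly, something like the following might work. It is a minimal sketch, assuming the samples data set and the date and defects variables from Example 16.3. The output data set names (daily, compare) and the new variable names are made up for illustration, and OBSERVED=BEGINNING in the daily step is an assumption about how the graph was produced.

```sas
ods graphics on;   /* the PLOTS= option requires ODS Graphics */

/* Evaluate the fitted spline at every day (TO=DAY).
   OBSERVED=BEGINNING here is an assumption. */
proc expand data=samples out=daily to=day plots=(all);
   id date;
   convert defects=defects_daily / observed=beginning;
run;

/* Run both monthly conversions in one step so they can be
   compared side by side. The new variable names are hypothetical. */
proc expand data=samples out=compare to=month;
   id date;
   /* spline value at the beginning of each month */
   convert defects=first_of_month / observed=beginning;
   /* average of the spline over each month */
   convert defects=month_average  / observed=(beginning,average);
run;

proc print data=compare;
run;
```

The first_of_month column should follow the daily curve at the start of each month, while month_average should smooth over whatever the curve does within the month, so the two columns are expected to differ most in months where the spline rises or falls steeply.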