Transforming the Frequency of Time Series Data

Overview

A common problem in applied econometric work is finding data sampled with the required frequency for all variables in a model of interest. For example, you might want to use a series that is available only quarterly as input to a monthly model.

The EXPAND procedure converts time series from one sampling interval or frequency to another and interpolates missing values in time series. Using PROC EXPAND, you can collapse time series data from higher frequency intervals to lower frequency intervals, or you can expand data from lower frequency intervals to higher frequency intervals. You can interpolate missing values either without changing the series frequency or in conjunction with expanding or collapsing the series. You can also convert aperiodic series, observed at arbitrary points in time, into periodic estimates.

By default, the EXPAND procedure fits cubic spline curves to the nonmissing values of variables to form continuous-time approximations of the input series. Output series are then generated from the spline approximations.
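
As a minimal sketch of interpolating missing values without changing the frequency, the following statements fill the gaps in a small monthly series by using the default spline method. The data set names GAPS and FILLED, the variable SALES, and the data values are hypothetical and are not part of the original example. Because the TO= option is omitted, the output keeps the monthly frequency given by FROM=MONTH.

   /* Hypothetical monthly series with two missing values */
   data gaps;
      input date : monyy7. sales;
      format date monyy7.;
      datalines;
   jan1992 10
   feb1992 .
   mar1992 14
   apr1992 .
   may1992 20
   ;

   /* No TO= option: the frequency is unchanged and   */
   /* only the missing SALES values are interpolated  */
   proc expand data=gaps out=filled from=month;
      id date;
      convert sales;
   run;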

Analysis

This example illustrates two applications of the transformation of the frequency of time series data. The first application is combining time series with different frequencies. The second is the interpolation of irregular observations.

Combining Time Series with Different Frequencies

One important use of the EXPAND procedure is to combine time series measured at different sampling frequencies. For example, suppose you have data on monthly money stocks (M1, stored in the variable FM1), quarterly gross domestic product (GDP), and weekly interest rates given by the Standard & Poor's weekly bond yield for long-term government bonds (WSPGLT), and you want to analyze a model that uses all these variables. To perform the analysis, you first create three data sets from the SASHELP library that contain the variables of interest. You then convert the series to a common frequency and combine the variables into one data set. You can create the three data sets with the following three DATA steps:

   data monthly;
      set sashelp.citimon;
      keep date fm1;
   run;

   data quarter;
      set sashelp.citiqtr;
      keep date gdp;
   run;

   data weekly;
      set sashelp.citiwk;
      keep date wspglt;
   run;

The following statements convert the data sets created above to a common frequency. Because MONTHLY is already at the target frequency, only QUARTER and WEEKLY are converted to monthly frequency, using two PROC EXPAND steps. The OUT= option names the output data set, and the FROM= and TO= options specify the input and output intervals. The ID statement specifies the SAS date or datetime variable that identifies the time of each input observation. The variables to be converted are listed in the CONVERT statement, and the OBSERVED= option in that statement specifies the observation characteristics of the series. When OBSERVED=TOTAL or AVERAGE, as in this example, the interpolating curve is fitted to the data values so that the area under the curve within each input interval equals the value of the series. The WSPGLT=INTEREST specification in the CONVERT statement of the second step renames the variable WSPGLT to INTEREST.

   proc expand data=quarter out=temp1 from=qtr to=month;
      id date;
      convert gdp / observed = total;
   run;

   proc expand data=weekly out=temp2 from=week to=month;
      id date;
      convert wspglt = interest / observed = average;
   run;

The three data sets are then merged using a DATA step MERGE statement to produce the data set COMBINED.

   data combined;
      merge monthly temp1 temp2;
      by date;
      if interest=. then delete;
   run;
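
A listing of the first few observations of COMBINED, such as the one that follows, can be requested with PROC PRINT. This is a minimal sketch rather than the statements used for the original listing; the OBS=5 data set option limits the output to the first five observations.

   /* Print the first five observations of the merged data set */
   proc print data=combined(obs=5);
   run;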

Combined Data Set (first 5 observations)

Obs	DATE	FM1	GDP	interest
1	JAN1986	621.100	1441.04	9.38571
2	FEB1986	625.100	1316.58	8.75536
3	MAR1986	634.000	1458.09	7.95592
4	APR1986	641.300	1401.57	7.59766
5	MAY1986	653.100	1438.95	8.21007

See the Bivariate Granger Causality Test example for a similar use of the EXPAND procedure.
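
The conversions above rely on the default cubic spline method. As a hedged sketch of one alternative, the METHOD=JOIN option in the PROC EXPAND statement fits connected straight-line segments instead of a spline. The output data set name TEMP1_JOIN is hypothetical, and the step assumes that the QUARTER data set created earlier is available.

   /* Same quarterly-to-monthly conversion of GDP, but with     */
   /* straight-line segments instead of the default spline      */
   proc expand data=quarter out=temp1_join from=qtr to=month method=join;
      id date;
      convert gdp / observed = total;
   run;

Comparing TEMP1_JOIN with TEMP1 is one way to see how sensitive the monthly estimates are to the choice of interpolation method.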

Interpolating Irregular Observations

Another important use of the EXPAND procedure is the interpolation of a series of values measured at irregular points in time. The data in this application are hypothetical. Assume that quality control inspections are made at random times and that the defect rate of a process is measured at each inspection. The problem is to produce two reports: estimates of the monthly average defect rates for the months within the period covered by the samples, and a plot of the interpolated defect rate curve over time. The following DATA step reads the input data into the data set SAMPLES.

   data samples;
      input date : date. defects @@;
      label defects = "Defects per 1000 units";
      format date date.;
      datalines;
   13jan92    55    27jan92   73    19feb92   84    8mar92   69
   27mar92    66     5apr92   77    29apr92   63   11may92   81
   25may92    89     7jun92   94    23jun92  105   11jul92   97
   15aug92   112    29aug92   89    10sep92   77   27sep92   82
   ;

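To compute the monthly estimates, use PROC EXPAND with the TO=MONTH option and specify OBSERVED=(BEGINNING,AVERAGE). The FROM= option is omitted because the samples are not equally spaced, so the ID variable DATE supplies the time of each observation. In the OBSERVED= pair, the first value describes the input series and the second value describes the output series, which here consists of monthly averages.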

   proc expand data=samples out=monthly to=month;
      id date;
      convert defects / observed=(beginning,average);
   run;
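
The monthly estimates can then be listed with PROC PRINT. The following is only a sketch: the MONYY7. and 8.3 formats are assumptions chosen to resemble the listing that follows, not settings taken from the original example.

   /* List the monthly estimates; both formats are assumed */
   proc print data=monthly;
      format date monyy7. defects 8.3;
   run;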

Estimated Monthly Average Defect Rates

Obs	date	defects
1	JAN1992	59.323
2	FEB1992	82.000
3	MAR1992	66.909
4	APR1992	70.205
5	MAY1992	82.762
6	JUN1992	99.701
7	JUL1992	101.564
8	AUG1992	105.491
9	SEP1992	79.206

To produce the plot, first use PROC EXPAND with TO=DAY to interpolate a full set of daily values, naming the interpolated series INTERPOL. Then merge this data set with the samples so that you can plot both the measured and the interpolated values on the same graph. The GPLOT procedure plots the curve, and the actual sample points are plotted with asterisks. The following statements interpolate and plot the defect rate curve:

   proc expand data=samples out=daily to=day;
      id date;
      convert defects = interpol;
   run;

   data daily;
      merge daily samples;
      by date;
   run;

   proc gplot data=daily;
      plot interpol*date  defects*date / vaxis=axis2 overlay cframe=ligr;
      title1 "Plot of Interpolated Defect Rate Curve";
      axis2 label=(angle=90);
      symbol1 c=blue interpol=join value=none;
      symbol2 c=red  interpol=none value=star;
   run;
   quit;
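
PROC GPLOT is part of SAS/GRAPH. Where ODS Graphics is available, a roughly equivalent plot can be sketched with PROC SGPLOT. This alternative is not part of the original example, and it assumes the merged DAILY data set created above.

   /* Interpolated curve as a blue line, measured samples as red asterisks */
   proc sgplot data=daily;
      title "Plot of Interpolated Defect Rate Curve";
      series  x=date y=interpol / lineattrs=(color=blue);
      scatter x=date y=defects  / markerattrs=(symbol=asterisk color=red);
   run;
   title;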

References

SAS Institute Inc. (1993), SAS/ETS User's Guide, Version 6, Second Edition, Cary, NC: SAS Institute Inc.