A common problem in applied econometric work is finding data sampled with the required frequency for all variables in a model of interest. For example, you might want to use a series that is available only quarterly as input to a monthly model.
The EXPAND procedure converts time series from one sampling interval or frequency to another and interpolates missing values in time series. Using PROC EXPAND, you can collapse time series data from higher frequency intervals to lower frequency intervals, or you can expand data from lower frequency intervals to higher frequency intervals. You can also interpolate missing values in time series, either without changing series frequency or in conjunction with expanding or collapsing series. You can also convert aperiodic series, observed at arbitrary points in time, into periodic estimates.
By default, the EXPAND procedure fits cubic spline curves to the nonmissing values of variables to form continuous-time approximations of the input series. Output series are then generated from the spline approximations.
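To see how the default spline compares with a simpler alternative, the following sketch converts the quarterly GDP series from SASHELP.CITIQTR to monthly values twice: once with the default cubic spline and once with METHOD=JOIN, which connects successive nonmissing values with straight lines. The output data set name GDP_COMPARE and the variable names GDP_SPLINE and GDP_LINEAR are chosen here for illustration only. Because no OBSERVED= option is given, the values are treated as beginning-of-period observations, so this is a comparison of interpolation methods rather than a recommended treatment of GDP.

```sas
/* Illustrative sketch: GDP_COMPARE, GDP_SPLINE, and GDP_LINEAR are hypothetical names */
proc expand data=sashelp.citiqtr out=gdp_compare from=qtr to=month;
   id date;
   convert gdp = gdp_spline;                /* default: METHOD=SPLINE (cubic spline) */
   convert gdp = gdp_linear / method=join;  /* straight lines between observed points */
run;
```

Printing or plotting the two output series side by side shows where the spline curve departs from the piecewise linear fit.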
This example illustrates two applications of the transformation of the frequency of time series data. The first application is combining time series with different frequencies. The second is the interpolation of irregular observations.
One important use of the EXPAND procedure is to combine time series measured at different sampling frequencies. For example, suppose you have data on monthly money stocks (M1, stored in the variable FM1), quarterly gross domestic product (GDP), and weekly interest rates given by the Standard & Poor's weekly bond yield for long-term government bonds (WSPGLT), and you want to analyze a model that uses all these variables. To do so, first create three data sets from the SASHELP library that contain the variables of interest, then convert the series to a common frequency and combine the variables into one data set. You can create the three data sets with the following three DATA steps:
data monthly;
   set sashelp.citimon;
   keep date fm1;       /* monthly M1 money stock */
run;

data quarter;
   set sashelp.citiqtr;
   keep date gdp;       /* quarterly GDP */
run;

data weekly;
   set sashelp.citiwk;
   keep date wspglt;    /* weekly long-term government bond yield */
run;
The following statements convert the three data sets QUARTER, MONTHLY, and WEEKLY created above to a common frequency. The data sets QUARTER and WEEKLY are converted to monthly frequency using two PROC EXPAND steps. The OUT= option names the output data set, and the FROM= and TO= options specify the input and output intervals. The ID statement specifies a SAS date or datetime variable that identifies the time of each input observation. The variables to be converted are listed in the CONVERT statement, and the OBSERVED= option in the CONVERT statement specifies the observation characteristics of the series. When OBSERVED=TOTAL or OBSERVED=AVERAGE is specified, as in this example, the interpolating curve is fitted to the data values so that the area under the curve within each input interval equals the value of the series. The WSPGLT=INTEREST specification in the CONVERT statement in the second step renames the variable WSPGLT to INTEREST.
proc expand data=quarter out=temp1 from=qtr to=month;
   id date;
   convert gdp / observed = total;
run;

proc expand data=weekly out=temp2 from=week to=month;
   id date;
   convert wspglt = interest / observed = average;
run;
The three data sets are then merged by DATE using a DATA step MERGE statement to produce the data set COMBINED. The IF statement deletes observations for which no INTEREST value is available.
data combined;
   merge monthly temp1 temp2;
   by date;
   if interest=. then delete;
run;
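Before using COMBINED in a model, it can help to look at the first few observations and confirm that all three series are now aligned on monthly dates. The following is a minimal check, not part of the original example; the OBS=12 limit and the MONYY7. format are choices made here for readability.

```sas
/* Quick check of the merged monthly data (illustrative only) */
proc print data=combined(obs=12);
   var date fm1 gdp interest;
   format date monyy7.;
run;
```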
See the Bivariate Granger Causality Test example for a similar use of the EXPAND procedure.
Another important use of the EXPAND procedure is the interpolation of a series of values measured at irregular points in time. The data in this application are hypothetical. Assume that a series of randomly timed quality control inspections is made and that defect rates for a process are measured. The problem is to produce two reports: estimates of monthly average defect rates for the months within the period covered by the samples, and a plot of the interpolated defect rate curve over time. The following DATA step reads the input data into the data set SAMPLES.
data samples;
   input date : date. defects @@;
   label defects = "Defects per 1000 units";
   format date date.;
datalines;
13jan92 55 27jan92 73 19feb92 84 8mar92 69
27mar92 66 5apr92 77 29apr92 63 11may92 81
25may92 89 7jun92 94 23jun92 105 11jul92 97
15aug92 112 29aug92 89 10sep92 77 27sep92 82
;
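To compute the monthly estimates, use PROC EXPAND with the TO=MONTH option and specify OBSERVED=(BEGINNING,AVERAGE). The first value describes the input series and the second describes the output series: the input values are treated as point-in-time measurements, and the output values are averages over each month. No FROM= option is given because the input observations are not equally spaced; the ID variable DATE supplies the time of each observation.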
proc expand data=samples out=monthly to=month;
   id date;
   convert defects / observed=(beginning,average);
run;
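The first report, the monthly average defect rates, can then be listed from the output data set. The following is a minimal sketch; the title text and the MONYY7. format are choices made here, not taken from the original example.

```sas
/* List the estimated monthly averages (title text is illustrative) */
proc print data=monthly;
   format date monyy7.;
   title "Estimated Monthly Average Defect Rates";
run;
title;   /* clear the title so it does not carry over to later steps */
```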
To produce the plot, first use PROC EXPAND with TO=DAY to interpolate a full set of daily values, naming the interpolated series INTERPOL. Then merge this data set with the samples so that you can plot both the measured and the interpolated values on the same graph. The GPLOT procedure plots the interpolated curve as a joined line and marks the actual sample points with asterisks. The following statements interpolate and plot the defect rate curve:
proc expand data=samples out=daily to=day;
   id date;
   convert defects = interpol;
run;

data daily;
   merge daily samples;
   by date;
run;

proc gplot data=daily;
   plot interpol*date defects*date / vaxis=axis2 overlay cframe=ligr;
   title1 "Plot of Interpolated Defect Rate Curve";
   axis2 label=(angle=90);
   symbol1 c=blue interpol=join value=none;   /* interpolated curve */
   symbol2 c=red interpol=none value=star;    /* measured sample points */
run;
quit;
SAS Institute Inc. (1993), SAS/ETS User's Guide, Version 6, Second Edition, Cary, NC: SAS Institute Inc.