I have code that performs interpolation using PROC EXPAND. I have a series where the first 3 data points are missing, so I want to extrapolate backward along the curve fitted by cubic spline interpolation between points #4 and #11. These are yield curves. I know extrapolation is not advised, but it is my best option for filling in these yields.
Does anyone have suggestions on how to add this? I read the SAS manuals, but they only describe the EXTRAPOLATE option, not how to apply it.
Thanks, greatly appreciated.
/* Step 1: Set input and output data */
data ym.data;
   set work.DATA;
run;

/* This part of the code deals with cubic spline interpolation of the yield
   curves by RTTM_INT. The PROC SORT is necessary before applying PROC EXPAND. */
proc sort data=ym.yield;
   by RTTM_INT;
run;

/* New code to interpolate */
data data;
   format date date9.;
   set data;
   date = intnx('year', '01JUN2017'd, RTTM_INT-1);
run;

proc expand data=data to=year out=data2;
   id date;
   convert ym_yld=interpol_yld / method=spline(natural);
run;
Here is some data:
DATE    RTTM_INT    YM_YLD
.       4           1.492194248
.       5           1.715405997
.       6           1.839137537
.       11          3.391604044
In order to use the EXTRAPOLATE option in PROC EXPAND, the DATA= data set must include observations with missing values for the range of dates over which you want to extrapolate the fitted spline function. If you know the starting date, then you can use PROC TIMESERIES with the START= option prior to running your PROC EXPAND step. Please take a look at the following example, based on the data and code you provided, to see if this allows you to accomplish your desired result:
data data;
   input rttm_int ym_yld;
   datalines;
4 1.49
5 1.72
6 1.84
11 3.39
;

data data;
   format date date9.;
   set data;
   date=intnx('year','01jun2017'd,rttm_int-1);
run;

/* use START= option to specify starting date */
proc timeseries data=data out=test;
   id date interval=year start='01jan2017'd;
   var ym_yld rttm_int / setmissing=missing;
run;

proc print data=test;
run;

proc expand data=test to=year out=data2 extrapolate;
   id date;
   convert ym_yld=interpol_yld / method=spline(natural);
   convert rttm_int / method=join;
run;

data data2;
   set data2;
   format date date9.;
   rttm_int=round(rttm_int,1);
run;

proc print data=data2;
run;
I hope this helps!
Maybe you could get it with a regression method.
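As a minimal sketch of what a regression approach could look like, the step below fits a straight line of YM_YLD on RTTM_INT with PROC REG and scores the three missing maturities. The linear model is an assumption made only for illustration (the thread does not say which regression is meant, and a straight line is a crude description of a yield curve), and the names curve, curve_fit, and reg_yld are hypothetical.

```sas
/* Hypothetical input: the four observed yields plus three rows
   with missing YM_YLD for the maturities to be filled in */
data curve;
   input rttm_int ym_yld;
   datalines;
1 .
2 .
3 .
4 1.49
5 1.72
6 1.84
11 3.39
;
run;

/* Rows with a missing response are left out of the fit,
   but still receive a predicted value in the OUT= data set */
proc reg data=curve noprint;
   model ym_yld = rttm_int;
   output out=curve_fit p=reg_yld;
run;
quit;

proc print data=curve_fit;
run;
```

The predicted values in reg_yld for RTTM_INT 1 to 3 can then be compared against the spline extrapolation to judge whether either set of values is plausible.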
I wanted to make a correction to my earlier reply. To use PROC EXPAND with the EXTRAPOLATE option, you would not need to provide missing values for the entire range of data you want to extrapolate, but you would need to include an observation at the beginning (or end) of your DATA= data set to indicate the endpoint for the extrapolation range.
Below is a modified example that augments your DATA data set with one new observation at the beginning of the data. The YM_YLD variable is set to missing for that first observation:
data data;
   format date date9.;
   input rttm_int ym_yld;
   date=intnx('year','01jun2017'd,rttm_int-1);
   datalines;
1 .
4 1.49
5 1.72
6 1.84
11 3.39
;

proc print data=data;
run;

proc expand data=data to=year out=data2 extrapolate;
   id date;
   convert ym_yld=interpol_yld / method=spline(natural);
run;

proc print data=data2;
   format date date9.;
run;
As you correctly noted in your initial post on this topic and as mentioned in the PROC EXPAND documentation, the EXTRAPOLATE option is not generally advised and should be used with caution, since the extrapolated values might not be very reasonable. Other approaches, such as the one mentioned by Ksharp, should certainly be explored.
Hope this helps!