## How to interpolate missing values using PROC EXPAND statement

Solved
Occasional Contributor
Posts: 15

# How to interpolate missing values using PROC EXPAND statement

[ Edited ]

Hello, *updated*

Thank you for taking the time to read my post! I am using SAS Enterprise Guide.

I have written a code to derive bond yields using trading data. I have missing values between the remaining terms to maturity of 1 – 11.

I have decided that PROC EXPAND's default settings - cubic spline interpolation - match my needs. I would like to avoid linear interpolation. I would like to use the PROC EXPAND procedure to add some code at the end of my program to interpolate the missing values for bond yields to complete my yield curves.

The variables of interest are, yield, sector, province, currency, remaining_term_to_maturity_interval (RTTM_int which is a time interval: 1 year to maturity, 2 years to maturity ... etc used to derive the 1 year yield, 2 year yield, etc...). The issue is this: my RTTM_int variable is not a datetime variable - it is an interval I created using the maturity_date variable which is built using datetime variables (see below).

*/ parts of code where rttm_int, rttm and maturity_date are created */

If (RTTM <=1) then RTTM_INT=1;

Else if ((RTTM >1) and (RTTM <=2)) THEN RTTM_INT=2

....

Else RTTM_INT=11;

and where

data file (keep=Ref_yr Ref_mo Instrument PR N yld);

set newfile;

Ref_yr=year(REF_DATE);

Ref_mo=month(REF_DATE);

Ref_dt=day(REF_DATE);

settlement=mdy(Ref_mo, Ref_dt, Ref_yr);

MATURITY_DATE=N*365.25+REF_DATE;

Maturity_yr=year(MATURITY_DATE);

Maturity_mo=month(MATURITY_DATE);

Maturity_dt=day(MATURITY_DATE);

maturity=mdy(Maturity_mo, Maturity_dt, Maturity_yr);

run ;

DATA file;

SET newfile;

RTTM=(MATURITY_DATE-REF_DATE)/365.25;

RUN;

**What I want is to take the derived yields by sector, province and currency and interpolate the missing yields at the rttm_int points that have no yields.

Here is a sample of my code and data:

Code:

data excerpt_for_yields2;

set excerpt_for_yields;

run;

proc sort data= excerpt_for_yields2;

by rttm_int;

run;

proc expand data=excerpt_for_yields2 out=excerpt_for_yields_interpol

method=spline(natural);

by rttm_int;

convert Yield;

run;

proc print data=excerpt_for_yields_interpol;

run;

data: (excerpt_for_yields)

7              35           USA       1              1.20806

7              35           USA       3              1.01078

7              35           USA       5              1.5664

7              35           USA       6              2.53177

7              35           USA       7              2.16684

8              10           CAN       1              4.11514

8              10           CAN       8              3.22767

8              10           CAN       11           3.85838

8              11           CAN       1              2.61305

8              11           CAN       6              2.37773

8              11           CAN       7              2.6506

8              11           CAN       10           3.263

8              11           CAN       11           4.05781

8              12           CAN       1              2.37414

8              12           CAN       3              1.80637

8              12           CAN       4              2.58524

8              12           CAN       5              2.9566

8              12           CAN       6              2.29432

8              12           CAN       8              2.73651

8              12           CAN       11           3.96109

*I know this isn't working in part because RTTM is not a datetime variable and so this is not a perfect time series data set. I expect I need to somehow reformat the RTTM intervals to include datetime.... whichy I am not sure yet how to do in the best way. Am I missing something else in the PROC EXPAND statement?  This is my first time using proc expand.

In addition, sometimes my missing values (missing yields) include the end points (for example, I won't have data for the intervals 6-11years).  Any suggestions how best to treat this?

Thank you,

KG

Accepted Solutions
Solution
‎07-18-2017 12:14 PM
Super User
Posts: 19,877

## Re: How to interpolate missing values using PROC EXPAND statement

The comments got cut out in the post, the first step is creating sample data with a missing data point. So I'm erasing data for the month of July in the sample data to create a data set that needs to be 'filled.

``````/*this is an example of time series data.
1. Create a time series data set with missing intervals (IBM)
2. Add back missing entries using PROC TIMESERIES (IBM_NO_MISSING)
3. Calculate moving average - 12 month average
*/

/*1*/
data ibm;
set sashelp.stocks;
where stock='IBM';

if month(date)=7 then
delete;
run;

proc sort data=ibm;
by date;
run;

/*2*/
proc timeseries data=ibm out=ibm_no_missing;
id date interval=month start='01Aug1986'd end='01Dec2005'd;
var open;
run;

/*3*/
proc expand data=ibm_no_missing out=want;
id date;
convert open = open_12 / method=none transformout= (movave 12);
run;``````

All Replies
PROC Star
Posts: 1,167

## Re: How to interpolate missing values using PROC EXPAND statement

Two immediate issues:

Nobody on this board will open excel file because of the malware risk; just include a text file.

If you could provide a data step that we could run to create some fictitious data that illustrates your environment, that would make it much easier to run your code and consider alternatives.

Tom

Occasional Contributor
Posts: 15

## Re: How to interpolate missing values using PROC EXPAND statement

Hi. Thank you - I added code and some data. Is this OK?

Super User
Posts: 19,877

## Re: How to interpolate missing values using PROC EXPAND statement

Is this what you want?

``````data ibm;
set sashelp.stocks;
where stock='IBM';

if month(date)=7 then
delete;
run;

proc sort data=ibm;
by date;
run;

/*2*/
proc timeseries data=ibm out=ibm_no_missing;
id date interval=month start='01Aug1986'd end='01Dec2005'd;
var open;
run;

/*3*/
proc expand data=ibm_no_missing out=want;
id date;
convert open = open_fill/ method=spline;
run;``````

Modified from this post:

https://gist.github.com/statgeek/07a3708dee1225ceb9d4aa75daab2c52

Occasional Contributor
Posts: 15

## Re: How to interpolate missing values using PROC EXPAND statement

Thank you -

Can you explain to me the purpose of the

if month(date)=7 then
delete;

portion in the proc timeseries statement? Are you using proc timeseries to identify the interval frequency for my RTTM_int variable?

My other question is: if you understand proc expand well, by ID statement, will that need to only include my datetime (RTTM) variable?

My convert statement will be = yield?

I am still working on applying this into my case

Will get back in touch,

Thank you so much

Solution
‎07-18-2017 12:14 PM
Super User
Posts: 19,877

## Re: How to interpolate missing values using PROC EXPAND statement

The comments got cut out in the post, the first step is creating sample data with a missing data point. So I'm erasing data for the month of July in the sample data to create a data set that needs to be 'filled.

``````/*this is an example of time series data.
1. Create a time series data set with missing intervals (IBM)
2. Add back missing entries using PROC TIMESERIES (IBM_NO_MISSING)
3. Calculate moving average - 12 month average
*/

/*1*/
data ibm;
set sashelp.stocks;
where stock='IBM';

if month(date)=7 then
delete;
run;

proc sort data=ibm;
by date;
run;

/*2*/
proc timeseries data=ibm out=ibm_no_missing;
id date interval=month start='01Aug1986'd end='01Dec2005'd;
var open;
run;

/*3*/
proc expand data=ibm_no_missing out=want;
id date;
convert open = open_12 / method=none transformout= (movave 12);
run;``````
Occasional Contributor
Posts: 15

## Re: How to interpolate missing values using PROC EXPAND statement

Can I confirm that I am applying this correctly - in general? Assuming I am using my dataset DERIVED_BOND_SET and I want to interpolate with cubic spline polynomial interpolation the variable YIELDS across all RTTM_INTs (and further by currency, province and sector - I think I can query this later) can I do the following:

/*1*/
data bondyields;
set derived_bond_set;
where stock='IBM';    what do I do for this?

if month(date)=7 then   not sure how to apply this to my case of missing yields in the yield curve
delete;
run;

proc sort data=bondyields;
by RTTM_INT;
run;

/*2*/
proc timeseries data=bondyields out=complete_bondyields;
id RTTM_INT interval=year start='01Jan2007'd end='01Jan2018'd;
var YIELDS;
run;

/*3*/
proc expand data=complete_bondyields out=interpol_bondyields
id RTTM_INT;
convert YIELDS = interpolyields/ method=spline(natural);
run;

Thanks,

KG

Super User
Posts: 19,877

## Re: How to interpolate missing values using PROC EXPAND statement

The first step is generating sample data for the problem. You have real data so you don't need that step, start from the PROC SORT.

Occasional Contributor
Posts: 15

## Re: How to interpolate missing values using PROC EXPAND statement

OH yes - I see. My apologies I understand now.

☑ This topic is solved.