Hello, *updated*
Thank you for taking the time to read my post! I am using SAS Enterprise Guide.
I have written a code to derive bond yields using trading data. I have missing values between the remaining terms to maturity of 1 – 11.
I have decided that PROC EXPAND's default settings - cubic spline interpolation - match my needs. I would like to avoid linear interpolation. I would like to use the PROC EXPAND procedure to add some code at the end of my program to interpolate the missing values for bond yields to complete my yield curves.
The variables of interest are, yield, sector, province, currency, remaining_term_to_maturity_interval (RTTM_int which is a time interval: 1 year to maturity, 2 years to maturity ... etc used to derive the 1 year yield, 2 year yield, etc...). The issue is this: my RTTM_int variable is not a datetime variable - it is an interval I created using the maturity_date variable which is built using datetime variables (see below).
*/ parts of code where rttm_int, rttm and maturity_date are created */
If (RTTM <=1) then RTTM_INT=1;
Else if ((RTTM >1) and (RTTM <=2)) THEN RTTM_INT=2
....
Else RTTM_INT=11;
and where
data file (keep=Ref_yr Ref_mo Instrument PR N yld);
set newfile;
Ref_yr=year(REF_DATE);
Ref_mo=month(REF_DATE);
Ref_dt=day(REF_DATE);
settlement=mdy(Ref_mo, Ref_dt, Ref_yr);
MATURITY_DATE=N*365.25+REF_DATE;
Maturity_yr=year(MATURITY_DATE);
Maturity_mo=month(MATURITY_DATE);
Maturity_dt=day(MATURITY_DATE);
maturity=mdy(Maturity_mo, Maturity_dt, Maturity_yr);
run ;
DATA file;
SET newfile;
RTTM=(MATURITY_DATE-REF_DATE)/365.25;
RUN;
**What I want is to take the derived yields by sector, province and currency and interpolate the missing yields at the rttm_int points that have no yields.
Here is a sample of my code and data:
Code:
data excerpt_for_yields2;
set excerpt_for_yields;
run;
proc sort data= excerpt_for_yields2;
by rttm_int;
run;
proc expand data=excerpt_for_yields2 out=excerpt_for_yields_interpol
method=spline(natural);
by rttm_int;
convert Yield;
run;
proc print data=excerpt_for_yields_interpol;
run;
data: (excerpt_for_yields)
7 35 USA 1 1.20806
7 35 USA 3 1.01078
7 35 USA 5 1.5664
7 35 USA 6 2.53177
7 35 USA 7 2.16684
8 10 CAN 1 4.11514
8 10 CAN 8 3.22767
8 10 CAN 11 3.85838
8 11 CAN 1 2.61305
8 11 CAN 6 2.37773
8 11 CAN 7 2.6506
8 11 CAN 10 3.263
8 11 CAN 11 4.05781
8 12 CAN 1 2.37414
8 12 CAN 3 1.80637
8 12 CAN 4 2.58524
8 12 CAN 5 2.9566
8 12 CAN 6 2.29432
8 12 CAN 8 2.73651
8 12 CAN 11 3.96109
*I know this isn't working in part because RTTM is not a datetime variable and so this is not a perfect time series data set. I expect I need to somehow reformat the RTTM intervals to include datetime.... whichy I am not sure yet how to do in the best way. Am I missing something else in the PROC EXPAND statement? This is my first time using proc expand.
In addition, sometimes my missing values (missing yields) include the end points (for example, I won't have data for the intervals 6-11years). Any suggestions how best to treat this?
Thank you,
KG
The comments got cut out in the post, the first step is creating sample data with a missing data point. So I'm erasing data for the month of July in the sample data to create a data set that needs to be 'filled.
/*this is an example of time series data.
1. Create a time series data set with missing intervals (IBM)
2. Add back missing entries using PROC TIMESERIES (IBM_NO_MISSING)
3. Calculate moving average - 12 month average
*/
/*1*/
data ibm;
set sashelp.stocks;
where stock='IBM';
if month(date)=7 then
delete;
run;
proc sort data=ibm;
by date;
run;
/*2*/
proc timeseries data=ibm out=ibm_no_missing;
id date interval=month start='01Aug1986'd end='01Dec2005'd;
var open;
run;
/*3*/
proc expand data=ibm_no_missing out=want;
id date;
convert open = open_12 / method=none transformout= (movave 12);
run;
Two immediate issues:
Nobody on this board will open excel file because of the malware risk; just include a text file.
If you could provide a data step that we could run to create some fictitious data that illustrates your environment, that would make it much easier to run your code and consider alternatives.
Tom
Hi. Thank you - I added code and some data. Is this OK?
Is this what you want?
data ibm;
set sashelp.stocks;
where stock='IBM';
if month(date)=7 then
delete;
run;
proc sort data=ibm;
by date;
run;
/*2*/
proc timeseries data=ibm out=ibm_no_missing;
id date interval=month start='01Aug1986'd end='01Dec2005'd;
var open;
run;
/*3*/
proc expand data=ibm_no_missing out=want;
id date;
convert open = open_fill/ method=spline;
run;
Modified from this post:
https://gist.github.com/statgeek/07a3708dee1225ceb9d4aa75daab2c52
Thank you -
Can you explain to me the purpose of the
if month(date)=7 then
delete;
portion in the proc timeseries statement? Are you using proc timeseries to identify the interval frequency for my RTTM_int variable?
My other question is: if you understand proc expand well, by ID statement, will that need to only include my datetime (RTTM) variable?
My convert statement will be = yield?
I am still working on applying this into my case
Will get back in touch,
Thank you so much
The comments got cut out in the post, the first step is creating sample data with a missing data point. So I'm erasing data for the month of July in the sample data to create a data set that needs to be 'filled.
/*this is an example of time series data.
1. Create a time series data set with missing intervals (IBM)
2. Add back missing entries using PROC TIMESERIES (IBM_NO_MISSING)
3. Calculate moving average - 12 month average
*/
/*1*/
data ibm;
set sashelp.stocks;
where stock='IBM';
if month(date)=7 then
delete;
run;
proc sort data=ibm;
by date;
run;
/*2*/
proc timeseries data=ibm out=ibm_no_missing;
id date interval=month start='01Aug1986'd end='01Dec2005'd;
var open;
run;
/*3*/
proc expand data=ibm_no_missing out=want;
id date;
convert open = open_12 / method=none transformout= (movave 12);
run;
Can I confirm that I am applying this correctly - in general? Assuming I am using my dataset DERIVED_BOND_SET and I want to interpolate with cubic spline polynomial interpolation the variable YIELDS across all RTTM_INTs (and further by currency, province and sector - I think I can query this later) can I do the following:
/*1*/
data bondyields;
set derived_bond_set;
where stock='IBM'; what do I do for this?
if month(date)=7 then not sure how to apply this to my case of missing yields in the yield curve
delete;
run;
proc sort data=bondyields;
by RTTM_INT;
run;
/*2*/
proc timeseries data=bondyields out=complete_bondyields;
id RTTM_INT interval=year start='01Jan2007'd end='01Jan2018'd;
var YIELDS;
run;
/*3*/
proc expand data=complete_bondyields out=interpol_bondyields
id RTTM_INT;
convert YIELDS = interpolyields/ method=spline(natural);
run;
Thanks,
KG
READ THE COMMENTS.
The first step is generating sample data for the problem. You have real data so you don't need that step, start from the PROC SORT.
OH yes - I see. My apologies I understand now.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.