BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
KG2
Calcite | Level 5 KG2
Calcite | Level 5

Hello, *updated*

 

Thank you for taking the time to read my post! I am using SAS Enterprise Guide.

 I have written a code to derive bond yields using trading data. I have missing values between the remaining terms to maturity of 1 – 11.

I have decided that PROC EXPAND's default settings - cubic spline interpolation - match my needs. I would like to avoid linear interpolation. I would like to use the PROC EXPAND procedure to add some code at the end of my program to interpolate the missing values for bond yields to complete my yield curves.

The variables of interest are, yield, sector, province, currency, remaining_term_to_maturity_interval (RTTM_int which is a time interval: 1 year to maturity, 2 years to maturity ... etc used to derive the 1 year yield, 2 year yield, etc...). The issue is this: my RTTM_int variable is not a datetime variable - it is an interval I created using the maturity_date variable which is built using datetime variables (see below).

 */ parts of code where rttm_int, rttm and maturity_date are created */

If (RTTM <=1) then RTTM_INT=1;

Else if ((RTTM >1) and (RTTM <=2)) THEN RTTM_INT=2

....

Else RTTM_INT=11;

 

and where

 

data file (keep=Ref_yr Ref_mo Instrument PR N yld);

set newfile;

      Ref_yr=year(REF_DATE);

      Ref_mo=month(REF_DATE);

      Ref_dt=day(REF_DATE);

      settlement=mdy(Ref_mo, Ref_dt, Ref_yr);

      MATURITY_DATE=N*365.25+REF_DATE;

      Maturity_yr=year(MATURITY_DATE);

      Maturity_mo=month(MATURITY_DATE);

      Maturity_dt=day(MATURITY_DATE);

      maturity=mdy(Maturity_mo, Maturity_dt, Maturity_yr);

  run ;

 

DATA file;

SET newfile;

      RTTM=(MATURITY_DATE-REF_DATE)/365.25;

      RUN;

 

**What I want is to take the derived yields by sector, province and currency and interpolate the missing yields at the rttm_int points that have no yields.

 

Here is a sample of my code and data:

Code:

 

data excerpt_for_yields2;

set excerpt_for_yields;

run;

 

proc sort data= excerpt_for_yields2;

by rttm_int;

run;

 

proc expand data=excerpt_for_yields2 out=excerpt_for_yields_interpol

method=spline(natural);

by rttm_int;

convert Yield;

run;

 

proc print data=excerpt_for_yields_interpol;

run;

 

 

 

data: (excerpt_for_yields)

 

7              35           USA       1              1.20806

7              35           USA       3              1.01078

7              35           USA       5              1.5664

7              35           USA       6              2.53177

7              35           USA       7              2.16684

8              10           CAN       1              4.11514

8              10           CAN       8              3.22767

8              10           CAN       11           3.85838

8              11           CAN       1              2.61305

8              11           CAN       6              2.37773

8              11           CAN       7              2.6506

8              11           CAN       10           3.263

8              11           CAN       11           4.05781

8              12           CAN       1              2.37414

8              12           CAN       3              1.80637

8              12           CAN       4              2.58524

8              12           CAN       5              2.9566

8              12           CAN       6              2.29432

8              12           CAN       8              2.73651

8              12           CAN       11           3.96109

 

 

 

 

*I know this isn't working in part because RTTM is not a datetime variable and so this is not a perfect time series data set. I expect I need to somehow reformat the RTTM intervals to include datetime.... whichy I am not sure yet how to do in the best way. Am I missing something else in the PROC EXPAND statement?  This is my first time using proc expand.

In addition, sometimes my missing values (missing yields) include the end points (for example, I won't have data for the intervals 6-11years).  Any suggestions how best to treat this?

 

Thank you,

KG

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

The comments got cut out in the post, the first step is creating sample data with a missing data point. So I'm erasing data for the month of July in the sample data to create a data set that needs to be 'filled. 

 

/*this is an example of time series data. 
1. Create a time series data set with missing intervals (IBM)
2. Add back missing entries using PROC TIMESERIES (IBM_NO_MISSING)
3. Calculate moving average - 12 month average
*/

/*1*/
data ibm;
	set sashelp.stocks;
	where stock='IBM';

	if month(date)=7 then
		delete;
run;

proc sort data=ibm;
	by date;
run;

/*2*/
proc timeseries data=ibm out=ibm_no_missing;
	id date interval=month start='01Aug1986'd end='01Dec2005'd;
	var open;
run;

/*3*/
proc expand data=ibm_no_missing out=want;
	id date;
	convert open = open_12 / method=none transformout= (movave 12);
run;

View solution in original post

8 REPLIES 8
TomKari
Onyx | Level 15

Two immediate issues:

 

Nobody on this board will open excel file because of the malware risk; just include a text file.

 

If you could provide a data step that we could run to create some fictitious data that illustrates your environment, that would make it much easier to run your code and consider alternatives.

 

Tom

KG2
Calcite | Level 5 KG2
Calcite | Level 5

Hi. Thank you - I added code and some data. Is this OK?

Reeza
Super User

Is this what you want?

 

data ibm;
	set sashelp.stocks;
	where stock='IBM';

	if month(date)=7 then
		delete;
run;

proc sort data=ibm;
	by date;
run;

/*2*/
proc timeseries data=ibm out=ibm_no_missing;
	id date interval=month start='01Aug1986'd end='01Dec2005'd;
	var open;
run;

/*3*/
proc expand data=ibm_no_missing out=want;
	id date;
	convert open = open_fill/ method=spline;
run;

Modified from this post:

https://gist.github.com/statgeek/07a3708dee1225ceb9d4aa75daab2c52

KG2
Calcite | Level 5 KG2
Calcite | Level 5

Thank you -

 

Can you explain to me the purpose of the

 

if month(date)=7 then
delete;

 

portion in the proc timeseries statement? Are you using proc timeseries to identify the interval frequency for my RTTM_int variable?

 

My other question is: if you understand proc expand well, by ID statement, will that need to only include my datetime (RTTM) variable?

My convert statement will be = yield?

 

I am still working on applying this into my case


Will get back in touch,

 

Thank you so much

Reeza
Super User

The comments got cut out in the post, the first step is creating sample data with a missing data point. So I'm erasing data for the month of July in the sample data to create a data set that needs to be 'filled. 

 

/*this is an example of time series data. 
1. Create a time series data set with missing intervals (IBM)
2. Add back missing entries using PROC TIMESERIES (IBM_NO_MISSING)
3. Calculate moving average - 12 month average
*/

/*1*/
data ibm;
	set sashelp.stocks;
	where stock='IBM';

	if month(date)=7 then
		delete;
run;

proc sort data=ibm;
	by date;
run;

/*2*/
proc timeseries data=ibm out=ibm_no_missing;
	id date interval=month start='01Aug1986'd end='01Dec2005'd;
	var open;
run;

/*3*/
proc expand data=ibm_no_missing out=want;
	id date;
	convert open = open_12 / method=none transformout= (movave 12);
run;
KG2
Calcite | Level 5 KG2
Calcite | Level 5

Can I confirm that I am applying this correctly - in general? Assuming I am using my dataset DERIVED_BOND_SET and I want to interpolate with cubic spline polynomial interpolation the variable YIELDS across all RTTM_INTs (and further by currency, province and sector - I think I can query this later) can I do the following:

 

/*1*/
data bondyields;
set derived_bond_set;
where stock='IBM';    what do I do for this?

if month(date)=7 then   not sure how to apply this to my case of missing yields in the yield curve
delete;
run;

proc sort data=bondyields;
by RTTM_INT;
run;

/*2*/
proc timeseries data=bondyields out=complete_bondyields;
id RTTM_INT interval=year start='01Jan2007'd end='01Jan2018'd;
var YIELDS;
run;

/*3*/
proc expand data=complete_bondyields out=interpol_bondyields
id RTTM_INT;
convert YIELDS = interpolyields/ method=spline(natural);
run;

 

Thanks,

KG

Reeza
Super User

READ THE COMMENTS. 

The first step is generating sample data for the problem. You have real data so you don't need that step, start from the PROC SORT.

KG2
Calcite | Level 5 KG2
Calcite | Level 5

OH yes - I see. My apologies I understand now.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 3333 views
  • 1 like
  • 3 in conversation