BookmarkSubscribeRSS Feed
rahulsaha2127
Fluorite | Level 6

I have the following dataset:

PERIOD_DTvolumefeerate
1-Mar-192476326490522970651.850.00092761
1-Jun-192493438166423863270.190.000957043
1-Sep-192552406653324138580.040.000945718
1-Dec-192625253037124788811.260.000944245
1-Mar-202248908060124778303.20.001101793
1-Jun-202453668671623310005.110.000950006
1-Sep-202646253890927042949.40.001021933
1-Dec-202528148744528543895.470.001129043
1-Mar-213374132651432391701.020.000960001
1-Jun-213230159266031960263.680.000989433
1-Sep-213420448428637244115.760.001088867
1-Dec-213703761247035116904.140.000948142
1-Mar-224188289304929976825.330.00071573
1-Jun-223755369264932769017.550.000872591
1-Sep-223884105697732133320.610.000827303

 

I have the following equation:

pred_rate=rate_lag1 -0.68944167689726*rate_diff_lag1-0.4791311284*rate_diff_lag2;

where :

rate= volume/fee;

rate_lag1 = lag(rate);
rate_lag2=lag2(rate);
rate_diff = rate - rate_lag1;
rate_diff_lag1 = lag(rate_diff);
rate_diff_lag2=lag2(rate_diff);

 

Now I want to create pred_rate column(rate prediction) with the mentioned formula but I will need to replace actual rate with the predicted rate for all prediction after sep'20.

For eg: For sep'20 we will use the last quarter (jun'20) rate for making the prediction i.e pred_rate. From the following quarter which is Dec'20 we will use the pred_rate value of Last quarter (Sep'20) to make the calculations and use it in the pred_rate formula for calculating predictions in Dec'20 and so on.

 

Please provide your valuable inputs. TIA.

 

2 REPLIES 2
ger15xxhcker
Quartz | Level 8

You can do this by creating a 'lag' column for each of the columns in your dataset and then use the 'pred_rate' equation to calculate the predicted rate.

Here is how to do this:
1. Create the lag columns for each of the columns in the dataset. The lag column should contain the values from the previous period_dt.
2. Use the pred_rate equation to calculate the predicted rate for each period_dt.
3. Use the predicted rate for the next period_dt in the pred_rate equation instead of the actual rate.
4. Repeat steps 2 and 3 for all period_dt in the dataset.

For example, for period_dt 1-Mar-19, the equation would be:
pred_rate= rate_lag1 -0.68944167689726*rate_diff_lag1-0.4791311284*rate_diff_lag2;
where,
rate= volume/fee;
rate_lag1 = lag(rate);
rate_lag2=lag2(rate);
rate_diff = rate - rate_lag1;
rate_diff_lag1 = lag(rate_diff);
rate_diff_lag2=lag2(rate_diff);

For the next period 1-June-19, the equation will be:
pred_rate= pred_rate_lag1 -0.68944167689726*pred_rate_diff_lag1-0.4791311284*pred_rate_diff_lag2;
where,
pred_rate = predicted rate,
pred_rate_lag1 = lag(pred_rate);
pred_rate_lag2 = lag2(pred_rate);
pred_rate_diff = pred_rate - pred_rate_lag1;
pred_rate_diff_lag1 = lag(pred_rate_diff);
pred_rate_diff_lag2 = lag2(pred_rate_diff);

You can then repeat this process for all the period_dt in the dataset. If i get it right.

mkeintz
PROC Star

If I understand you correctly, you want the following for all dates prior to October, 2020:

 

data want;
  set have;

  rate= volume/fee;

  rate_lag1 = lag(rate);
  rate_lag2 = lag2(rate);
  rate_diff = dif(rate);    *Replacing rate_diff = rate - rate_lag1;
 
  rate_diff_lag1 = lag(rate_diff);
  rate_diff_lag2=lag2(rate_diff);

  pred_rate=rate_lag1 -0.68944167689726*rate_diff_lag1-0.4791311284*rate_diff_lag2;

run;

But you want to use pred_rate instead of rate (and lag of pred_rate instead of lag of rate) when DATE is after September 2020.  If so, then this should do what you want:

 

data want;
  set have;

  rate= volume/fee;

  rate_lag1 = ifn(date<'01oct2020'd,lag(rate),lag(pred_rate));
  rate_lag2 = ifn(date<'01oct2020'd,lag2(rate),lag2(pred_rate));

  rate_diff = ifn(date<'01oct2020'd,dif(rate),dif(pred_rate));

  rate_diff_lag1 = lag(rate_diff);
  rate_diff_lag2=lag2(rate_diff);

  pred_rate=rate_lag1 -0.68944167689726*rate_diff_lag1-0.4791311284*rate_diff_lag2;

run;

Instead of changing the final PRED_RATE= statement based on the date value, it changes the components based on the date value.  

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 470 views
  • 0 likes
  • 3 in conversation