BookmarkSubscribeRSS Feed
rahulsaha2127
Fluorite | Level 6

I have the following dataset:

PERIOD_DTvolumefeerate
1-Mar-192476326490522970651.850.00092761
1-Jun-192493438166423863270.190.000957043
1-Sep-192552406653324138580.040.000945718
1-Dec-192625253037124788811.260.000944245
1-Mar-202248908060124778303.20.001101793
1-Jun-202453668671623310005.110.000950006
1-Sep-202646253890927042949.40.001021933
1-Dec-202528148744528543895.470.001129043
1-Mar-213374132651432391701.020.000960001
1-Jun-213230159266031960263.680.000989433
1-Sep-213420448428637244115.760.001088867
1-Dec-213703761247035116904.140.000948142
1-Mar-224188289304929976825.330.00071573
1-Jun-223755369264932769017.550.000872591
1-Sep-223884105697732133320.610.000827303

 

I have the following equation:

pred_rate=rate_lag1 -0.68944167689726*rate_diff_lag1-0.4791311284*rate_diff_lag2;

where :

rate= volume/fee;

rate_lag1 = lag(rate);
rate_lag2=lag2(rate);
rate_diff = rate - rate_lag1;
rate_diff_lag1 = lag(rate_diff);
rate_diff_lag2=lag2(rate_diff);

 

Now I want to create pred_rate column(rate prediction) with the mentioned formula but I will need to replace actual rate with the predicted rate for all prediction after sep'20.

For eg: For sep'20 we will use the last quarter (jun'20) rate for making the prediction i.e pred_rate. From the following quarter which is Dec'20 we will use the pred_rate value of Last quarter (Sep'20) to make the calculations and use it in the pred_rate formula for calculating predictions in Dec'20 and so on.

 

Please provide your valuable inputs. TIA.

 

2 REPLIES 2
ger15xxhcker
Quartz | Level 8

You can do this by creating a 'lag' column for each of the columns in your dataset and then use the 'pred_rate' equation to calculate the predicted rate.

Here is how to do this:
1. Create the lag columns for each of the columns in the dataset. The lag column should contain the values from the previous period_dt.
2. Use the pred_rate equation to calculate the predicted rate for each period_dt.
3. Use the predicted rate for the next period_dt in the pred_rate equation instead of the actual rate.
4. Repeat steps 2 and 3 for all period_dt in the dataset.

For example, for period_dt 1-Mar-19, the equation would be:
pred_rate= rate_lag1 -0.68944167689726*rate_diff_lag1-0.4791311284*rate_diff_lag2;
where,
rate= volume/fee;
rate_lag1 = lag(rate);
rate_lag2=lag2(rate);
rate_diff = rate - rate_lag1;
rate_diff_lag1 = lag(rate_diff);
rate_diff_lag2=lag2(rate_diff);

For the next period 1-June-19, the equation will be:
pred_rate= pred_rate_lag1 -0.68944167689726*pred_rate_diff_lag1-0.4791311284*pred_rate_diff_lag2;
where,
pred_rate = predicted rate,
pred_rate_lag1 = lag(pred_rate);
pred_rate_lag2 = lag2(pred_rate);
pred_rate_diff = pred_rate - pred_rate_lag1;
pred_rate_diff_lag1 = lag(pred_rate_diff);
pred_rate_diff_lag2 = lag2(pred_rate_diff);

You can then repeat this process for all the period_dt in the dataset. If i get it right.

mkeintz
PROC Star

If I understand you correctly, you want the following for all dates prior to October, 2020:

 

data want;
  set have;

  rate= volume/fee;

  rate_lag1 = lag(rate);
  rate_lag2 = lag2(rate);
  rate_diff = dif(rate);    *Replacing rate_diff = rate - rate_lag1;
 
  rate_diff_lag1 = lag(rate_diff);
  rate_diff_lag2=lag2(rate_diff);

  pred_rate=rate_lag1 -0.68944167689726*rate_diff_lag1-0.4791311284*rate_diff_lag2;

run;

But you want to use pred_rate instead of rate (and lag of pred_rate instead of lag of rate) when DATE is after September 2020.  If so, then this should do what you want:

 

data want;
  set have;

  rate= volume/fee;

  rate_lag1 = ifn(date<'01oct2020'd,lag(rate),lag(pred_rate));
  rate_lag2 = ifn(date<'01oct2020'd,lag2(rate),lag2(pred_rate));

  rate_diff = ifn(date<'01oct2020'd,dif(rate),dif(pred_rate));

  rate_diff_lag1 = lag(rate_diff);
  rate_diff_lag2=lag2(rate_diff);

  pred_rate=rate_lag1 -0.68944167689726*rate_diff_lag1-0.4791311284*rate_diff_lag2;

run;

Instead of changing the final PRED_RATE= statement based on the date value, it changes the components based on the date value.  

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 908 views
  • 0 likes
  • 3 in conversation