I have the following dataset:
PERIOD_DT | volume | fee | rate |
1-Mar-19 | 24763264905 | 22970651.85 | 0.00092761 |
1-Jun-19 | 24934381664 | 23863270.19 | 0.000957043 |
1-Sep-19 | 25524066533 | 24138580.04 | 0.000945718 |
1-Dec-19 | 26252530371 | 24788811.26 | 0.000944245 |
1-Mar-20 | 22489080601 | 24778303.2 | 0.001101793 |
1-Jun-20 | 24536686716 | 23310005.11 | 0.000950006 |
1-Sep-20 | 26462538909 | 27042949.4 | 0.001021933 |
1-Dec-20 | 25281487445 | 28543895.47 | 0.001129043 |
1-Mar-21 | 33741326514 | 32391701.02 | 0.000960001 |
1-Jun-21 | 32301592660 | 31960263.68 | 0.000989433 |
1-Sep-21 | 34204484286 | 37244115.76 | 0.001088867 |
1-Dec-21 | 37037612470 | 35116904.14 | 0.000948142 |
1-Mar-22 | 41882893049 | 29976825.33 | 0.00071573 |
1-Jun-22 | 37553692649 | 32769017.55 | 0.000872591 |
1-Sep-22 | 38841056977 | 32133320.61 | 0.000827303 |
I have the following equation:
pred_rate=rate_lag1 -0.68944167689726*rate_diff_lag1-0.4791311284*rate_diff_lag2;
where :
rate= volume/fee;
rate_lag1 = lag(rate);
rate_lag2=lag2(rate);
rate_diff = rate - rate_lag1;
rate_diff_lag1 = lag(rate_diff);
rate_diff_lag2=lag2(rate_diff);
Now I want to create pred_rate column(rate prediction) with the mentioned formula but I will need to replace actual rate with the predicted rate for all prediction after sep'20.
For eg: For sep'20 we will use the last quarter (jun'20) rate for making the prediction i.e pred_rate. From the following quarter which is Dec'20 we will use the pred_rate value of Last quarter (Sep'20) to make the calculations and use it in the pred_rate formula for calculating predictions in Dec'20 and so on.
Please provide your valuable inputs. TIA.
You can do this by creating a 'lag' column for each of the columns in your dataset and then use the 'pred_rate' equation to calculate the predicted rate.
Here is how to do this:
1. Create the lag columns for each of the columns in the dataset. The lag column should contain the values from the previous period_dt.
2. Use the pred_rate equation to calculate the predicted rate for each period_dt.
3. Use the predicted rate for the next period_dt in the pred_rate equation instead of the actual rate.
4. Repeat steps 2 and 3 for all period_dt in the dataset.
For example, for period_dt 1-Mar-19, the equation would be:
pred_rate= rate_lag1 -0.68944167689726*rate_diff_lag1-0.4791311284*rate_diff_lag2;
where,
rate= volume/fee;
rate_lag1 = lag(rate);
rate_lag2=lag2(rate);
rate_diff = rate - rate_lag1;
rate_diff_lag1 = lag(rate_diff);
rate_diff_lag2=lag2(rate_diff);
For the next period 1-June-19, the equation will be:
pred_rate= pred_rate_lag1 -0.68944167689726*pred_rate_diff_lag1-0.4791311284*pred_rate_diff_lag2;
where,
pred_rate = predicted rate,
pred_rate_lag1 = lag(pred_rate);
pred_rate_lag2 = lag2(pred_rate);
pred_rate_diff = pred_rate - pred_rate_lag1;
pred_rate_diff_lag1 = lag(pred_rate_diff);
pred_rate_diff_lag2 = lag2(pred_rate_diff);
You can then repeat this process for all the period_dt in the dataset. If i get it right.
If I understand you correctly, you want the following for all dates prior to October, 2020:
data want;
set have;
rate= volume/fee;
rate_lag1 = lag(rate);
rate_lag2 = lag2(rate);
rate_diff = dif(rate); *Replacing rate_diff = rate - rate_lag1;
rate_diff_lag1 = lag(rate_diff);
rate_diff_lag2=lag2(rate_diff);
pred_rate=rate_lag1 -0.68944167689726*rate_diff_lag1-0.4791311284*rate_diff_lag2;
run;
But you want to use pred_rate instead of rate (and lag of pred_rate instead of lag of rate) when DATE is after September 2020. If so, then this should do what you want:
data want;
set have;
rate= volume/fee;
rate_lag1 = ifn(date<'01oct2020'd,lag(rate),lag(pred_rate));
rate_lag2 = ifn(date<'01oct2020'd,lag2(rate),lag2(pred_rate));
rate_diff = ifn(date<'01oct2020'd,dif(rate),dif(pred_rate));
rate_diff_lag1 = lag(rate_diff);
rate_diff_lag2=lag2(rate_diff);
pred_rate=rate_lag1 -0.68944167689726*rate_diff_lag1-0.4791311284*rate_diff_lag2;
run;
Instead of changing the final PRED_RATE= statement based on the date value, it changes the components based on the date value.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.