I have the following dataset:
PERIOD_DT | volume | fee | rate |
1-Mar-19 | 24763264905 | 22970651.85 | 0.00092761 |
1-Jun-19 | 24934381664 | 23863270.19 | 0.000957043 |
1-Sep-19 | 25524066533 | 24138580.04 | 0.000945718 |
1-Dec-19 | 26252530371 | 24788811.26 | 0.000944245 |
1-Mar-20 | 22489080601 | 24778303.2 | 0.001101793 |
1-Jun-20 | 24536686716 | 23310005.11 | 0.000950006 |
1-Sep-20 | 26462538909 | 27042949.4 | 0.001021933 |
1-Dec-20 | 25281487445 | 28543895.47 | 0.001129043 |
1-Mar-21 | 33741326514 | 32391701.02 | 0.000960001 |
1-Jun-21 | 32301592660 | 31960263.68 | 0.000989433 |
1-Sep-21 | 34204484286 | 37244115.76 | 0.001088867 |
1-Dec-21 | 37037612470 | 35116904.14 | 0.000948142 |
1-Mar-22 | 41882893049 | 29976825.33 | 0.00071573 |
1-Jun-22 | 37553692649 | 32769017.55 | 0.000872591 |
1-Sep-22 | 38841056977 | 32133320.61 | 0.000827303 |
I have the following equation:
pred_rate=rate_lag1 -0.68944167689726*rate_diff_lag1-0.4791311284*rate_diff_lag2;
where :
rate= volume/fee;
rate_lag1 = lag(rate);
rate_lag2=lag2(rate);
rate_diff = rate - rate_lag1;
rate_diff_lag1 = lag(rate_diff);
rate_diff_lag2=lag2(rate_diff);
Now I want to create pred_rate column(rate prediction) with the mentioned formula but I will need to replace actual rate with the predicted rate for all prediction after sep'20.
For eg: For sep'20 we will use the last quarter (jun'20) rate for making the prediction i.e pred_rate. From the following quarter which is Dec'20 we will use the pred_rate value of Last quarter (Sep'20) to make the calculations and use it in the pred_rate formula for calculating predictions in Dec'20 and so on.
Please provide your valuable inputs. TIA.
You can do this by creating a 'lag' column for each of the columns in your dataset and then use the 'pred_rate' equation to calculate the predicted rate.
Here is how to do this:
1. Create the lag columns for each of the columns in the dataset. The lag column should contain the values from the previous period_dt.
2. Use the pred_rate equation to calculate the predicted rate for each period_dt.
3. Use the predicted rate for the next period_dt in the pred_rate equation instead of the actual rate.
4. Repeat steps 2 and 3 for all period_dt in the dataset.
For example, for period_dt 1-Mar-19, the equation would be:
pred_rate= rate_lag1 -0.68944167689726*rate_diff_lag1-0.4791311284*rate_diff_lag2;
where,
rate= volume/fee;
rate_lag1 = lag(rate);
rate_lag2=lag2(rate);
rate_diff = rate - rate_lag1;
rate_diff_lag1 = lag(rate_diff);
rate_diff_lag2=lag2(rate_diff);
For the next period 1-June-19, the equation will be:
pred_rate= pred_rate_lag1 -0.68944167689726*pred_rate_diff_lag1-0.4791311284*pred_rate_diff_lag2;
where,
pred_rate = predicted rate,
pred_rate_lag1 = lag(pred_rate);
pred_rate_lag2 = lag2(pred_rate);
pred_rate_diff = pred_rate - pred_rate_lag1;
pred_rate_diff_lag1 = lag(pred_rate_diff);
pred_rate_diff_lag2 = lag2(pred_rate_diff);
You can then repeat this process for all the period_dt in the dataset. If i get it right.
If I understand you correctly, you want the following for all dates prior to October, 2020:
data want;
set have;
rate= volume/fee;
rate_lag1 = lag(rate);
rate_lag2 = lag2(rate);
rate_diff = dif(rate); *Replacing rate_diff = rate - rate_lag1;
rate_diff_lag1 = lag(rate_diff);
rate_diff_lag2=lag2(rate_diff);
pred_rate=rate_lag1 -0.68944167689726*rate_diff_lag1-0.4791311284*rate_diff_lag2;
run;
But you want to use pred_rate instead of rate (and lag of pred_rate instead of lag of rate) when DATE is after September 2020. If so, then this should do what you want:
data want;
set have;
rate= volume/fee;
rate_lag1 = ifn(date<'01oct2020'd,lag(rate),lag(pred_rate));
rate_lag2 = ifn(date<'01oct2020'd,lag2(rate),lag2(pred_rate));
rate_diff = ifn(date<'01oct2020'd,dif(rate),dif(pred_rate));
rate_diff_lag1 = lag(rate_diff);
rate_diff_lag2=lag2(rate_diff);
pred_rate=rate_lag1 -0.68944167689726*rate_diff_lag1-0.4791311284*rate_diff_lag2;
run;
Instead of changing the final PRED_RATE= statement based on the date value, it changes the components based on the date value.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.