Hi,
I am trying to calculate values based on prior values, either existing or calculated, using the lag operator. The initial calculations are based on lagged values of an existing field but latter calculations are based on lagged values of the initial calculations. So the argument of the lag operator must change. I have included the SAS code I have tried without much success. Also here is what I want to happen using Excel--I want to calculate the variable X with s, h, b, and p as given. My example has a lag of 2 but this may vary.
Thanks.
A | B | C | D | E | ||
1 | s | h | b | p | x | |
2 | 1 | -3 | 100 | 0.12 | ||
3 | 1 | -2 | 104 | 0.11 | ||
4 | 1 | -1 | 102 | 0.15 | ||
5 | 1 | 0 | 110 | 0.13 | ||
6 | 1 | 1 | 105 | 0.10 | 112.727434 | <-- =EXP(D6)*C4 |
7 | 1 | 2 | 110 | 0.15 | 127.801767 | <-- =EXP(D7)*C5 |
8 | 1 | 3 | 100 | 0.10 | 124.583081 | <-- =EXP(D8)*E6 |
9 | 2 | -3 | 150 | 0.13 | ||
10 | 2 | -2 | 165 | 0.14 | ||
11 | 2 | -1 | 170 | 0.10 | ||
12 | 2 | 0 | 160 | 0.11 | 184.185882 | |
13 | 2 | 1 | 155 | 0.15 | 197.511821 | <-- =EXP(D13)*C11 |
14 | 2 | 2 | 145 | 0.10 | 176.827347 | <-- =EXP(D14)*C12 |
15 | 2 | 3 | 150 | 0.12 | 222.693957 | <-- =EXP(D15)*E13 |
/* Sample dataset */
data temp1;
input s h b p;
datalines;
1 -6 102 0.8
1 -5 110 0.12
1 -4 105 0.15
1 -3 100 0.12
1 -2 104 0.11
1 -1 102 0.15
1 0 110 0.13
1 1 105 0.10
1 2 110 0.15
1 3 100 0.10
1 4 105 0.11
1 5 105 0.15
1 6 110 0.13
2 -6 160 0.11
2 -5 155 0.15
2 -4 140 0.13
2 -3 150 0.13
2 -2 165 0.14
2 -1 170 0.10
2 0 160 0.11
2 1 155 0.15
2 2 145 0.10
2 3 150 0.12
2 4 160 0.13
2 5 170 0.13
2 6 155 0.14
;
run;
proc print data=temp1;
by s;
run;
/* Here use 2 period lag but lag can vary */
%let d=2;
data temp2;
set temp1;
by s h;
retain x2;
if h<=&d. then x1=lag&d.(b)*exp(p); if h<=0 then x1=.;
x2=coalesce(lag&d.(x1)*exp(p),lag&d.(x2)*exp(p)); if h<=&d. then x2=.;
x=coalesce(x2,x1);
run;
proc print data=temp2;
by s;
var s h b p x1 x2 x;
run;
Please clarify the rules you are using to get lagged values.
instead of "x1=lag&d.(b)*exp(p);" try: "a=lag&d.(b); x1=a*exp(p);"
When you call the lag() function for a specific variable the first time in a data step iteration then two things happen:
1. It writes the value of current observation to a queue
2. It retrieves the lagged value from the queue
For this reason: Do not use a lag() function conditionally as else your queue gets "messed up"
http://support.sas.com/kb/24/665.html
http://support.sas.com/resources/papers/proceedings16/11221-2016.pdf
"prior values, either existing or calculated"
If you need a dynamic look-back use a look-up table. SAS hash tables for example give you a lot of flexibility.
To use prior calculated values: RETAIN such values, or use formulas where you always re-calculate everything (and look-up the required values), or write also calculated values to a SAS hash table so you can look them up anytime as required.
Since you are testing on the value of variable H, you don't need a BY statement. What you want to do is unconditionally update the lag queues, but conditionally return the lag values. That can be done by embedding LAG functions inside IFN or IFC:
/* Sample dataset */
data temp1;
input s h b p;
datalines;
1 -6 102 0.8
1 -5 110 0.12
1 -4 105 0.15
1 -3 100 0.12
1 -2 104 0.11
1 -1 102 0.15
1 0 110 0.13
1 1 105 0.10
1 2 110 0.15
1 3 100 0.10
1 4 105 0.11
1 5 105 0.15
1 6 110 0.13
2 -6 160 0.11
2 -5 155 0.15
2 -4 140 0.13
2 -3 150 0.13
2 -2 165 0.14
2 -1 170 0.10
2 0 160 0.11
2 1 155 0.15
2 2 145 0.10
2 3 150 0.12
2 4 160 0.13
2 5 170 0.13
2 6 155 0.14
;
run;
%let d=2; /* edited insertion*/
data want;
set temp1;
x1=ifn(h>0,lag&d(b)*exp(p),.);
x2=ifn(h>&d,lag&d(x1)*exp(p),.);
x=coalesce(x2,x1);
run;
And I also disagree with the advice to never put a lag function in an IF statement. There are many use cases for this technique. You can see some in my paper Leads and Lags: Static and Dynamic Queues in the SAS DATA STEP, 2nd ed.
I'd say using the lag() function ONLY within a condition is an advanced technique. For this reason I normally write to not do it as people get this likely wrong.
The OP wrote: So the argument of the lag operator must change.
When it comes to "hoping around" between observations like one can do it in Excel then I believe a look-up table is the way to go as the lag() function won't just give you access to whatever previous record dynamically.
Thank you. This is very close but does not calculate correct values for any observation where h is greater than 2*d.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.