gvkey | fyear | rd | Desired values (KC) |
1001 | 1980 | 2.5 | 2.50 |
1001 | 1981 | 3.1 | 5.10 |
1001 | 1982 | 0.7 | 4.78 |
1002 | 1980 | 1.2 | 1.20 |
1002 | 1981 | 5.5 | 6.46 |
1002 | 1982 | 12.4 | 17.57 |
1003 | 1980 | 4.6 | 4.60 |
1003 | 1981 | 2.9 | 6.58 |
1003 | 1982 | 1.8 | 7.06 |
1003 | 1983 | 12.45 | 18.10 |
1003 | 1984 | 15.55 | 30.03 |
1003 | 1985 | 16.87 | 40.89 |
Hi, From the above data, I want to calculate values in the "Desired Value" column.
Desired value = 0.8*lag(kc)+rd.
Here, for each of the first observation according to gvkey, there is no kc. So, for the first gvkey the equation will be
Desired value = 0.8*0+rd. That means, the first gvkey kc is zero. I used the following code
Data want;
set have;
by gvkey fyear;
retain kc;
if first.gvkey then kc=rd;
else kc = 0.8* lag(kc) +rd;
run;
Problem is that I am not getting the desired result. Could you please help me? What mistakes am I making?
Each call to the LAG or DIF function creates a separate queue of values. Which means when you place one of those functions in a conditional statement the value is from the last time the condition was true.
Since you are RETAINing the value of KC you do not need lag though. This yields your desired KC values (less some rounding)
Data want; set have; by gvkey fyear; retain kc; if first.gvkey then kc=rd; else kc = 0.8* kc +rd; run;
If you use the LAG function inside and IF statement, it doesn't behave the way you expect it to behave. So, I remove the LAG from the IF, and now it should work properly.
data want;
set have;
by gvkey;
retain kc;
prev_kc=lag(kc);
if first.gvkey then kc=rd;
else kc=prev_kc*0.80 + rd;
drop prev_kc;
run;
This is untested code. If you want tested code, please provide data as a SAS data step and not as a screen capture.
Each call to the LAG or DIF function creates a separate queue of values. Which means when you place one of those functions in a conditional statement the value is from the last time the condition was true.
Since you are RETAINing the value of KC you do not need lag though. This yields your desired KC values (less some rounding)
Data want; set have; by gvkey fyear; retain kc; if first.gvkey then kc=rd; else kc = 0.8* kc +rd; run;
You are the victim, as so many SAS users, of the misleading naming of the lag function. If really should be named UFQ (update fifo queue), because that's what it is. So your statement
else kc = 0.8* lag(kc) +rd;
is NOT always fetching the prior value of KC. Instead it is fetching the oldest value put in the queue (a queue of length one in this case, so it's also the ONLY value in the queue). And since the queue was not updated at the start of each gvkey, the second obs of each gvkey was really retrieving the kc value from the end of the prior gvkey - because that's the last time the queue was updated, i.e. the last time the "lag" function was executed.
So you need a way to always update the queue, but not always use its value. Use the IFN function as below
kd=ifn(first.gvkey,kc,0.8*lag(kc)+rd);
instead of the two statements:
if first.gvkey then kc=rd;
else kc = 0.8* lag(kc) +rd;
This works because the IFN function ALWAYS executes both outcomes (i.e. 2nd and 3rd arguments) before it chooses a result based on the first argument.
BTW there are folks who will tell you to never use the lag function in an IF statement. I disagree - it's a great way to track interleaved series. Say you you have two classes of KC, indicated by class='A' or class='B'. And you actually want, for each A the most recent prior KC for class A, and the same for class B. Then an IF statement is just the thing for the lag function:
data have;
input class :$1. kc @@;
datalines;
A 11 B 91 B 92 A 12 B 93
B 94 A 13 A 14 B 95 A 15
run;
data want;
set have;
if class='A' then lastkc=lag(kc); else
if class='B' then lastkc=lag(kc);
put (_all_) (=);
run;
which produces
class=A kc=11 lastkc=.
class=B kc=91 lastkc=.
class=B kc=92 lastkc=91
class=A kc=12 lastkc=11
class=B kc=93 lastkc=92
class=B kc=94 lastkc=93
class=A kc=13 lastkc=12
class=A kc=14 lastkc=13
class=B kc=95 lastkc=94
class=A kc=15 lastkc=14
So notice all the A's have lagged value in the teens (i.e. only values for class A), and all the B's have lagged values in the 90's
But UFQ is so hard to pronounce!😉
How about QUP?🤔 or UPQ?🤔
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.