BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
abdulla
Pyrite | Level 9
gvkeyfyearrdDesired values (KC)
100119802.52.50
100119813.15.10
100119820.74.78
100219801.21.20
100219815.56.46
1002198212.417.57
100319804.64.60
100319812.96.58
100319821.87.06
1003198312.4518.10
1003198415.5530.03
1003198516.8740.89

 

 

 

Hi, From the above data, I want to calculate values in the "Desired Value" column. 

Desired value = 0.8*lag(kc)+rd. 

Here, for each of the first observation according to gvkey, there is no kc. So, for the first gvkey the equation will be 

Desired value = 0.8*0+rd. That means, the first gvkey kc is zero. I used the following code

 

Data want;

set have;

by gvkey fyear;

retain kc;

if first.gvkey then kc=rd;

else kc = 0.8* lag(kc) +rd;

run;

 

Problem is that I am not getting the desired result. Could you please help me? What mistakes am I making? 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Each call to the LAG or DIF function creates a separate queue of values. Which means when you place one of those functions in a conditional statement the value is from the last time the condition was true.

 

Since you are RETAINing the value of KC you do not need lag though. This yields your desired KC values (less some rounding)

 

Data want;
   set have;
   by gvkey fyear;
   retain kc;
   if first.gvkey then kc=rd;
   else kc = 0.8* kc +rd;
run;

View solution in original post

7 REPLIES 7
PaigeMiller
Diamond | Level 26

If you use the LAG function inside and IF statement, it doesn't behave the way you expect it to behave. So, I remove the LAG from the IF, and now it should work properly.

 

data want;
set have;
by gvkey;
retain kc;
prev_kc=lag(kc);
if first.gvkey then kc=rd;
else kc=prev_kc*0.80 + rd;
drop prev_kc;
run;

This is untested code. If you want tested code, please provide data as a SAS data step and not as a screen capture.

 

--
Paige Miller
abdulla
Pyrite | Level 9
Thank you very much
ballardw
Super User

Each call to the LAG or DIF function creates a separate queue of values. Which means when you place one of those functions in a conditional statement the value is from the last time the condition was true.

 

Since you are RETAINing the value of KC you do not need lag though. This yields your desired KC values (less some rounding)

 

Data want;
   set have;
   by gvkey fyear;
   retain kc;
   if first.gvkey then kc=rd;
   else kc = 0.8* kc +rd;
run;
abdulla
Pyrite | Level 9
Thank you very much.
mkeintz
PROC Star

@abdulla 

 

You are the victim, as so many SAS users, of the misleading naming of the lag function.  If really should be named UFQ (update fifo queue), because that's what it is.  So your statement

else kc = 0.8* lag(kc) +rd;

is NOT always fetching the prior value of KC.  Instead it is fetching the oldest value put in the queue (a queue of length one in this case, so it's also the ONLY value in the queue).  And since the queue was not updated at the start of each gvkey, the second obs of each gvkey was really retrieving the kc value from the end of the prior gvkey - because that's the last time the queue was updated, i.e. the last time the "lag" function was executed.

 

So you need a way to always update the queue, but not always use its value.  Use the IFN function as below

kd=ifn(first.gvkey,kc,0.8*lag(kc)+rd);

instead of the two statements:

if first.gvkey then kc=rd;
else kc = 0.8* lag(kc) +rd;

This works because the IFN function ALWAYS executes both outcomes (i.e. 2nd and 3rd arguments) before it chooses a result based on the first argument.

 

 

BTW there are folks who will tell you to never use the lag function in an IF statement.  I disagree - it's a great way to track interleaved series.   Say you you have two classes of KC, indicated by class='A' or class='B'.  And you actually want, for each A the most recent prior KC for class A, and the same for class B.  Then an IF statement is just the thing for the lag function:

data have;
  input class :$1.  kc @@;
datalines;
A 11   B 91   B 92   A 12   B 93   
B 94   A 13   A 14   B 95   A 15
run;
data want;
  set have;
  if class='A' then lastkc=lag(kc); else
  if class='B' then lastkc=lag(kc);
  put (_all_) (=);
run;

which produces

class=A kc=11 lastkc=.
class=B kc=91 lastkc=.
class=B kc=92 lastkc=91
class=A kc=12 lastkc=11
class=B kc=93 lastkc=92
class=B kc=94 lastkc=93
class=A kc=13 lastkc=12
class=A kc=14 lastkc=13
class=B kc=95 lastkc=94
class=A kc=15 lastkc=14

So notice all the A's have lagged value in the teens (i.e. only values for class A), and all the B's have lagged values in the 90's

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ballardw
Super User

But UFQ is so hard to pronounce!😉

mkeintz
PROC Star

How about QUP?🤔   or UPQ?🤔

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1228 views
  • 3 likes
  • 4 in conversation