BookmarkSubscribeRSS Feed
krg1140
Calcite | Level 5

Hello everyone, me again!

 

My question that I need is:

Create a variable, PY_AUDIT_FEE, equal to the audit fees paid by the same company in the prior year.

What I need to do is lag from the prior year but I don't understand how to do that? I have something like this currently set up:

 

data mergedata3; set mergedata3;
py_audit_fees = audit_fees=lag(audit_fees);
if audit_fees ne lag(audit_fees) then audit_fees=".";
run;

 

but it's only returning a 0 or a 1 so I'm confused where I'm going wrong? Any advice would be great!

6 REPLIES 6
mkeintz
PROC Star
data mergedata3;
  set mergedata3;
  prev_audit_fees = lag(audit_fees);
  if lag(gvkey)^=gvkey then prev_audit_fees=.;
run;

This assumes that all the yearly data for a given GVKEY are consecutive.  So the only problem is when the lag function retrieves the end of the prior gvkey and applies the result to PREV_AUDIT_FEES at the beginning of the next gvkey.  That's when the IF test resets prev_audit_fees to missing 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
krg1140
Calcite | Level 5

hm okay... something is severely wrong somewhere in my code 😧

andreas_lds
Jade | Level 19

I would start by sorting the data by gvkey and year. Then use a data-step:

 

data want;
  set sorted;
  by gvkey;
  py_audit_fee = lag(py_audit_fee);
  if first.gvkey then py_audit_fee = .;
run;

The code is, of course untested. For tested code, please provide data in usable form -  a data step using datalines. Also note that

 

data mergedata3; set mergedata3;

is always a bad idea. If something goes wrong, you have to re-create "mergdata3".

 

 

ChrisNZ
Tourmaline | Level 20

 

                        Always paste the code using the appropriate icon.

 

> it's only returning a 0 or a 1 so I'm confused where I'm going wrong?

    py_audit_fees = audit_fees=lag(audit_fees) ;

The part in red is a test. So the result is always 0 or 1 depending on whether the equality is true or not.

 

 

ballardw
Super User

Another concern with the LAG function is that it is a queue function and when executed as part of an IF statement the lag is from the last time the IF was true.

Consider this code:

data example;
   input obs x y;
datalines;
1 1 2
2 2 3
3 1 4
4 2 5
5 1 7
6 1 2
7 2 99
;

data junk;
   set example;
   if x=2 then do;
      z=lag(y);
      put "observation is " obs "Lag y is " z;
   end; 
run;

You might "expect" z for the first x=1 to have the value of y from the first obs, or from obs=4 that the value of z would be 4. But we actually get:

observation is 2 Lag y is .
observation is 4 Lag y is 3
observation is 7 Lag y is 5

If you actually want the value from the immediately previous record you need to execute the LAG or DIF functions before the IF.

 

BTW if your "fees" variables are numeric, which seems extremely likely, you would set them to missing with either:

variable = .;

or

call missing (variable);

When you have a numeric variable and use var = '.'; then you will have SAS attempt to convert a character value period to numeric and generate messages like"

NOTE: Character values have been converted to numeric values at the places given by:
      (Line):(Column).

which can sometimes be a serious issue indicating your variable is of an unexpected type or a more complex syntax problem.

 

mkeintz
PROC Star

Yes, but using LAG in an IF statement is very handy when dealing with interleave series.  Consider:

 

  input store $1. date :date9. sales;
  format date date9.;
datalines;
A 31jan2020 11
B 31jan2020 111
C 31jan2020 1111
A 29feb2020 22
B 29feb2020 222
C 29feb2020 2222
A 31mar2020 33
B 31mar2020 333
C 31mar2020 3333
A 30apr2020 44
B 30apr2020 444
C 30apr2020 4444
A 31may2020 55
B 31may2020 555
C 31may2020 5555
run;
data want;
  set have;
  if store='A' then prior_sales=lag(sales); else
  if store='B' then prior_sales=lag(sales); else
  if store='C' then prior_sales=lag(sales);
run;

Each mention of the LAG function generates its own FIFO queue.  So prior_sales for store A is the last sales for A - no intemixing of prior sales from other stores.  BTW, this works even if sales records are randomly missing.

 

This is one case where testing for all three stores simultaneously, as in

 if store in ('A','B','C') then prior_sales=lag(sales);

is NOT equivalent to three separate if tests:

  if store='A' then prior_sales=lag(sales); else
  if store='B' then prior_sales=lag(sales); else
  if store='C' then prior_sales=lag(sales);

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1848 views
  • 4 likes
  • 5 in conversation