BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JKCho
Pyrite | Level 9

I have tried to find some already existing threads here but it seems that there is no such similar to calculating year-over-year growth using SAS.

I have quarterly net income data. What I want to calculate is YoY growth of net income(NI).

For example, if the data looks like as below... 

Year  QTR   FirmID   NI

2000     1       555      60

2000     2       555      61

2000     3       555      64

2001     1       555      66

2000     4       568      900

2001     4       568      945

 

The YoY of NI for firm ID 555 should be 10%. (66/60). 

The YoY of NI for firm ID 568 should be 5%. (990/900). 

 

The thing is... I first used "lag" but figured out that many observations do not have all quarters like firm ID 568 in the example.

I am thinking of something like Vlookup in Excel, but I am unsure how to do it in SAS.

 

I am still working on it but ask for your help as well.

 

Many thanks in advance!!!

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

True, the fact that your time series may have holes means you can't use a simplistic application of LAG.

 

BUT each invocation of LAG builds its own queue of values, (as opposed to a simple look-back), allowing this simple technique to do what you want:

 

data have;
  input year  QTR   FirmID   NI;
datalines;
2000     1       555      60
2000     2       555      61
2000     3       555      64
2001     1       555      66
2000     4       568      900
2001     4       568      945
run;
data want;
  set have ;
  if qtr=1 then YOY=ifn(firmid=lag(firmid),dif(ni)/lag(ni),.); else
  if qtr=2 then YOY=ifn(firmid=lag(firmid),dif(ni)/lag(ni),.); else
  if qtr=3 then YOY=ifn(firmid=lag(firmid),dif(ni)/lag(ni),.); else
  if qtr=4 then YOY=ifn(firmid=lag(firmid),dif(ni)/lag(ni),.); 
  format yoy percent5.2;
run;

BTW, the DIF(X) function is simply X-lag(X).  

 

True, ordinarily in programming, if you have

if qtr=1 then x=function(y); else
if qtr=2 then x=function(y); else
if qtr=3 then x=function(y); else
if qtr=4 then x=function(y);

you could get the same results by using

if (qtr=1 or qtr=2 or qtr=3 or qtr=4) then x=function(y);

 

But that doesn't happen with the LAG (and DIF) functions.  That's because every mention of LAG (or DIF) generates a separate queue.  So in my code, there is a separate  queue for each QTR value - just what you need for YOY.

 

You may have heard dire warnings to never use the LAG function inside the THEN clause of an IF statement.  There's a reason for those warnings, but they are simply a failure to consider LAG as a queue manager.

 

Also, regardless of the result of the first argument of the IFN function, both of its outcomes (i.e. the 2nd and 3rd arguments) are executed.  So the LAG function is always executed for a given QTR, regardless of whether FIRMID=lag(FIRMID) inside the IFN function.

 

Whenever you see LAG - think  "UPDATE FIFO QUEUE".

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

6 REPLIES 6
japelin
Rhodochrosite | Level 12

how about this code.

proc sql;
  create table growth as 
  select a.*
       , b.year as year2
       , b.ni as ni2
       , (ni2 / a.ni * 100) as growth from have as a
  left join have as b
  on a.firmid=b.firmid and
     a.QTR=b.QTR and
     a.year+1=b.year
  where b.year ne .
  ;
quit;
mkeintz
PROC Star

True, the fact that your time series may have holes means you can't use a simplistic application of LAG.

 

BUT each invocation of LAG builds its own queue of values, (as opposed to a simple look-back), allowing this simple technique to do what you want:

 

data have;
  input year  QTR   FirmID   NI;
datalines;
2000     1       555      60
2000     2       555      61
2000     3       555      64
2001     1       555      66
2000     4       568      900
2001     4       568      945
run;
data want;
  set have ;
  if qtr=1 then YOY=ifn(firmid=lag(firmid),dif(ni)/lag(ni),.); else
  if qtr=2 then YOY=ifn(firmid=lag(firmid),dif(ni)/lag(ni),.); else
  if qtr=3 then YOY=ifn(firmid=lag(firmid),dif(ni)/lag(ni),.); else
  if qtr=4 then YOY=ifn(firmid=lag(firmid),dif(ni)/lag(ni),.); 
  format yoy percent5.2;
run;

BTW, the DIF(X) function is simply X-lag(X).  

 

True, ordinarily in programming, if you have

if qtr=1 then x=function(y); else
if qtr=2 then x=function(y); else
if qtr=3 then x=function(y); else
if qtr=4 then x=function(y);

you could get the same results by using

if (qtr=1 or qtr=2 or qtr=3 or qtr=4) then x=function(y);

 

But that doesn't happen with the LAG (and DIF) functions.  That's because every mention of LAG (or DIF) generates a separate queue.  So in my code, there is a separate  queue for each QTR value - just what you need for YOY.

 

You may have heard dire warnings to never use the LAG function inside the THEN clause of an IF statement.  There's a reason for those warnings, but they are simply a failure to consider LAG as a queue manager.

 

Also, regardless of the result of the first argument of the IFN function, both of its outcomes (i.e. the 2nd and 3rd arguments) are executed.  So the LAG function is always executed for a given QTR, regardless of whether FIRMID=lag(FIRMID) inside the IFN function.

 

Whenever you see LAG - think  "UPDATE FIFO QUEUE".

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tom
Super User Tom
Super User

Cute, but what if there are three years of data on the middle year is missing that quarter?

data have;
  input year  QTR   FirmID   NI;
datalines;
2000     1       555      60
2001     2       555      61
2001     3       555      64
2002     1       555      66
;
mkeintz
PROC Star

@Tom wrote:

Cute, but what if there are three years of data on the middle year is missing that quarter?

data have;
  input year  QTR   FirmID   NI;
datalines;
2000     1       555      60
2001     2       555      61
2001     3       555      64
2002     1       555      66
;

That s primarily a research decision.

 

Then the OP would have to decide whether to set YOY to missing, by modifying the IF statement to:

 if qtr=1 then YOY=ifn(firmid=lag(firmid) and year-1=lag(year),dif(ni)/lag(ni),.);

or else calculate an annualized YOY, as in:

if qtr=1 then ANNUALIZED_YOY=ifn(firmid=lag(firmid),(ni/lag(ni))**1/dif(year)-1,.);

 

For a single year YOY, this replaces dif(ni)/lag(ni)   with    lag(NI)/NI-1.    NI/lag(NI)  - 1

So a multiyear gap is just the N'th root of  lag(NI)/NI  NI/lag(NI)   minus 1, where N is the number of years in the gap.

 

I don't think this issue creates any difficulty in use of IF ... THEN ...LAG   constructions.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tom
Super User Tom
Super User

What do you want to do when the value from one year ago does not exist?  

Kurt_Bremser
Super User

Sort the dataset to a different order, and then compare observations and calculate:

data have;
input year QTR FirmID NI;
datalines;
2000 1 555 60
2000 2 555 61
2000 3 555 64
2001 1 555 66
2000 4 568 900
2001 4 568 945
;

proc sort data=have;
by firmid qtr year;
run;

data want;
set have;
by firmid qtr;
format yoy percent6.2;
yoy = ifn(
  not first.firmid and qtr = lag(qtr) and dif(year) = 1,
  dif(ni) / lag(ni),
  .
);
run;

proc sort data=want;
by firmid year qtr;
run;

The code will take care of missing year/qtr combinations.

 

The nice effect of the IFN function is that all arguments are evaluated before the condition is applied, so the LAG and DIF functions are always executed and have their queues in order.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1625 views
  • 1 like
  • 5 in conversation