I have tried to find some already existing threads here but it seems that there is no such similar to calculating year-over-year growth using SAS.
I have quarterly net income data. What I want to calculate is YoY growth of net income(NI).
For example, if the data looks like as below...
Year QTR FirmID NI
2000 1 555 60
2000 2 555 61
2000 3 555 64
2001 1 555 66
2000 4 568 900
2001 4 568 945
The YoY of NI for firm ID 555 should be 10%. (66/60).
The YoY of NI for firm ID 568 should be 5%. (990/900).
The thing is... I first used "lag" but figured out that many observations do not have all quarters like firm ID 568 in the example.
I am thinking of something like Vlookup in Excel, but I am unsure how to do it in SAS.
I am still working on it but ask for your help as well.
Many thanks in advance!!!
True, the fact that your time series may have holes means you can't use a simplistic application of LAG.
BUT each invocation of LAG builds its own queue of values, (as opposed to a simple look-back), allowing this simple technique to do what you want:
data have;
input year QTR FirmID NI;
datalines;
2000 1 555 60
2000 2 555 61
2000 3 555 64
2001 1 555 66
2000 4 568 900
2001 4 568 945
run;
data want;
set have ;
if qtr=1 then YOY=ifn(firmid=lag(firmid),dif(ni)/lag(ni),.); else
if qtr=2 then YOY=ifn(firmid=lag(firmid),dif(ni)/lag(ni),.); else
if qtr=3 then YOY=ifn(firmid=lag(firmid),dif(ni)/lag(ni),.); else
if qtr=4 then YOY=ifn(firmid=lag(firmid),dif(ni)/lag(ni),.);
format yoy percent5.2;
run;
BTW, the DIF(X) function is simply X-lag(X).
True, ordinarily in programming, if you have
if qtr=1 then x=function(y); else
if qtr=2 then x=function(y); else
if qtr=3 then x=function(y); else
if qtr=4 then x=function(y);
you could get the same results by using
if (qtr=1 or qtr=2 or qtr=3 or qtr=4) then x=function(y);
But that doesn't happen with the LAG (and DIF) functions. That's because every mention of LAG (or DIF) generates a separate queue. So in my code, there is a separate queue for each QTR value - just what you need for YOY.
You may have heard dire warnings to never use the LAG function inside the THEN clause of an IF statement. There's a reason for those warnings, but they are simply a failure to consider LAG as a queue manager.
Also, regardless of the result of the first argument of the IFN function, both of its outcomes (i.e. the 2nd and 3rd arguments) are executed. So the LAG function is always executed for a given QTR, regardless of whether FIRMID=lag(FIRMID) inside the IFN function.
Whenever you see LAG - think "UPDATE FIFO QUEUE".
how about this code.
proc sql;
create table growth as
select a.*
, b.year as year2
, b.ni as ni2
, (ni2 / a.ni * 100) as growth from have as a
left join have as b
on a.firmid=b.firmid and
a.QTR=b.QTR and
a.year+1=b.year
where b.year ne .
;
quit;
True, the fact that your time series may have holes means you can't use a simplistic application of LAG.
BUT each invocation of LAG builds its own queue of values, (as opposed to a simple look-back), allowing this simple technique to do what you want:
data have;
input year QTR FirmID NI;
datalines;
2000 1 555 60
2000 2 555 61
2000 3 555 64
2001 1 555 66
2000 4 568 900
2001 4 568 945
run;
data want;
set have ;
if qtr=1 then YOY=ifn(firmid=lag(firmid),dif(ni)/lag(ni),.); else
if qtr=2 then YOY=ifn(firmid=lag(firmid),dif(ni)/lag(ni),.); else
if qtr=3 then YOY=ifn(firmid=lag(firmid),dif(ni)/lag(ni),.); else
if qtr=4 then YOY=ifn(firmid=lag(firmid),dif(ni)/lag(ni),.);
format yoy percent5.2;
run;
BTW, the DIF(X) function is simply X-lag(X).
True, ordinarily in programming, if you have
if qtr=1 then x=function(y); else
if qtr=2 then x=function(y); else
if qtr=3 then x=function(y); else
if qtr=4 then x=function(y);
you could get the same results by using
if (qtr=1 or qtr=2 or qtr=3 or qtr=4) then x=function(y);
But that doesn't happen with the LAG (and DIF) functions. That's because every mention of LAG (or DIF) generates a separate queue. So in my code, there is a separate queue for each QTR value - just what you need for YOY.
You may have heard dire warnings to never use the LAG function inside the THEN clause of an IF statement. There's a reason for those warnings, but they are simply a failure to consider LAG as a queue manager.
Also, regardless of the result of the first argument of the IFN function, both of its outcomes (i.e. the 2nd and 3rd arguments) are executed. So the LAG function is always executed for a given QTR, regardless of whether FIRMID=lag(FIRMID) inside the IFN function.
Whenever you see LAG - think "UPDATE FIFO QUEUE".
Cute, but what if there are three years of data on the middle year is missing that quarter?
data have;
input year QTR FirmID NI;
datalines;
2000 1 555 60
2001 2 555 61
2001 3 555 64
2002 1 555 66
;
@Tom wrote:
Cute, but what if there are three years of data on the middle year is missing that quarter?
data have; input year QTR FirmID NI; datalines; 2000 1 555 60 2001 2 555 61 2001 3 555 64 2002 1 555 66 ;
That s primarily a research decision.
Then the OP would have to decide whether to set YOY to missing, by modifying the IF statement to:
if qtr=1 then YOY=ifn(firmid=lag(firmid) and year-1=lag(year),dif(ni)/lag(ni),.);
or else calculate an annualized YOY, as in:
if qtr=1 then ANNUALIZED_YOY=ifn(firmid=lag(firmid),(ni/lag(ni))**1/dif(year)-1,.);
For a single year YOY, this replaces dif(ni)/lag(ni) with lag(NI)/NI-1. NI/lag(NI) - 1
So a multiyear gap is just the N'th root of lag(NI)/NI NI/lag(NI) minus 1, where N is the number of years in the gap.
I don't think this issue creates any difficulty in use of IF ... THEN ...LAG constructions.
What do you want to do when the value from one year ago does not exist?
Sort the dataset to a different order, and then compare observations and calculate:
data have;
input year QTR FirmID NI;
datalines;
2000 1 555 60
2000 2 555 61
2000 3 555 64
2001 1 555 66
2000 4 568 900
2001 4 568 945
;
proc sort data=have;
by firmid qtr year;
run;
data want;
set have;
by firmid qtr;
format yoy percent6.2;
yoy = ifn(
not first.firmid and qtr = lag(qtr) and dif(year) = 1,
dif(ni) / lag(ni),
.
);
run;
proc sort data=want;
by firmid year qtr;
run;
The code will take care of missing year/qtr combinations.
The nice effect of the IFN function is that all arguments are evaluated before the condition is applied, so the LAG and DIF functions are always executed and have their queues in order.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.