Dear all,
I would like to compute the total of a variable starting from t-2 to t for a company (gvkey). It seems that proc expand can perform time series total from lag variables.
My data (attached) is presented as follows:
GVKEY FYEAR TXPD
001004 1995 5.3
001004 1996 8.6
001004 1997 6.2
001004 1998 4.4
001004 1999 11.3
001004 2000 3.2
....
I would like to generate a new variable, namely sum_TXPD3. The new variable is computed by totaling the TXPD for 3 years starting from t-2 to t (i.e., t-2, t-1, and t). For example, for the company with the gvkey code 001004, the sum_TXPD3 should be blank for fyear 1995 and 1996 as there is no sufficient data to compute the variable. For the year 1997, the sum_TXPD3 should be 20.1 (5.3 + 8.6 + 6.2).
There might be some missing data in the between of the time series. For example, if the fyear 1996 is missing, then the sum_TXPD3 for fyear 1997 should be stated as ".".
Hope anyone here to help me to resolve the problem. I am not sure proc expand can help in this problem?
Thank you in advanced.
Regards,
mspak
data want;
set have;
sum_TXPD3=TXPD+lag(TXPD)+lag2(TXPD);
run;
I think something like the following will accomplish what you want:
data want;
set test.sampledata;
by gvkey;
if first.gvkey then counter=1;
else counter+1;
sum_TXPD3=ifn(counter gt 2,TXPD+lag(TXPD)+lag2(TXPD),.);
run;
An SQL alternative could be (not tested):
proc sql;
create table want as
select *, (select sum(TXPD) from have where gvkey=a.gvkey and fyear <= a.fyear having count(fyear) >=3) as rolling_sum
from have a
;
quit;
There are a number of similar questions posted in the forums over time. Searching for "rolling" provided links to interesting solutions.
I don't think an sql approach has bein proposed - perhaps because it would be impossible to expand much beyond 10 period lag.
Select a.*, (a.txpd,b.txpd,c.txpd) as sum_txpd3
from sampledata a
left join sampledata b
on a.gvkey EQ b.gvkey
and a.fyear EQ (b.fyear+1)
left join sampledata c
on a.gvkey EQ c.gvkey
and a.fyear EQ (c.fyear+2)
Thank you to all the helpers,
I think Peter's solution is the simplest perhaps. I have another problem is that not all the observations with non-missing fyear+1 and fyear+2. In order to mitigate this problem and to ensure all the gvkey with non-missing years, I perform the following:
proc sql;
create table a as
select *
from (select distinct FYEAR from SAMPLEDATA),
(select distinct GVKEY from SAMPLEDATA) ;
quit;
data want;
merge a SAMPLEDATA;
by gvkey FYEAR;
run;
However, if I wish to exclude missing observations from a given year, how should I do that? For example, if I wish to compute the sum_txpd3 for a given gvkey for year 2013, I must have non-missing txpd for year 2011, 2012 and 2013. If the data for 2012 is missing, then the sum_txpd3 should be left as ".". With the code given by Peter, the the total of txpd for years 2011 to 2013 still will be calculated even if there is one missing txpd, lets say 2012.
Thank you.
Regards,
MSPAK
Have you actually tried my suggestion? Not bragging here, but it seems to me that my code is :
1) more dynamic and flexible, you can change parameters to fit your different range, such as 10 years, 100 years etc. 2). is capable to deal with the missing value to meet your criteria. See below:
data have;
input GVKEY $ FYEAR TXPD;
cards;
001004 1995 5.3
001004 1996 8.6
001004 1997 6.2
001004 1998 4.4
001004 1999 11.3
001004 2000 .
001004 2001 3.2
001004 2002 3.2
001004 2003 3.2
001004 2004 3.2
;
proc sql;
create table want as
select *, (select sum(TXPD) from have where gvkey=a.gvkey and a.fyear-2<= fyear <= a.fyear having count(TXPD) =3) as rolling_sum
from have a
;
quit;
Only those with non-missing value for 3 continuous years are computed, else are left missing as you have requested.
Haikuo
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.