BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mspak
Quartz | Level 8

Dear all,

I would like to compute the total of a variable starting from t-2 to t for a company (gvkey). It seems that proc expand can perform time series total from lag variables.

My data (attached)  is presented as follows:

GVKEY    FYEAR   TXPD

001004     1995       5.3

001004     1996       8.6

001004     1997       6.2

001004     1998       4.4

001004     1999      11.3

001004     2000       3.2

....

I would like to generate a new variable, namely sum_TXPD3. The new variable is computed by totaling the TXPD for 3 years starting from t-2 to t (i.e., t-2, t-1, and t). For example, for the company with the gvkey code 001004, the  sum_TXPD3 should be blank for fyear 1995 and 1996 as there is no sufficient data to compute the variable. For the year 1997, the sum_TXPD3 should be 20.1 (5.3 + 8.6 + 6.2).

There might be some missing data in the between of the time series. For example, if the fyear 1996 is missing, then the sum_TXPD3 for fyear 1997 should be stated as ".".

Hope anyone here to help me to resolve the problem. I am not sure proc expand can help in this problem?

Thank you in advanced.

Regards,

mspak

1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

I hear ya. Smiley Wink  reminded me that my solution was flawed, and it was, here is another try:

proc sql;

create table want as

select *, (select sum(TXPD) from have where gvkey=a.gvkey and a.fear-2<= fyear <= a.fyear having count(TXPD) =3) as rolling_sum

from have a

;

quit;

View solution in original post

7 REPLIES 7
stat_sas
Ammonite | Level 13

data want;

set have;

sum_TXPD3=TXPD+lag(TXPD)+lag2(TXPD);

run;

art297
Opal | Level 21

I think something like the following will accomplish what you want:

data want;

  set test.sampledata;

  by gvkey;

  if first.gvkey then counter=1;

  else counter+1;

  sum_TXPD3=ifn(counter gt 2,TXPD+lag(TXPD)+lag2(TXPD),.);

run;

Haikuo
Onyx | Level 15

An SQL alternative could be (not tested):

proc sql;

create table want as

select *, (select sum(TXPD) from have where gvkey=a.gvkey and fyear <= a.fyear having count(fyear) >=3) as rolling_sum

from have a

;

quit;

Peter_C
Rhodochrosite | Level 12

There are a number of similar questions posted in the forums over time. Searching for "rolling" provided links to interesting solutions.

I don't think an sql approach has bein proposed - perhaps because it would be impossible to expand much beyond 10 period lag.

Select a.*, (a.txpd,b.txpd,c.txpd) as sum_txpd3

from sampledata a

left join sampledata b

on  a.gvkey EQ b.gvkey

and a.fyear EQ (b.fyear+1)

left join sampledata c

on a.gvkey EQ c.gvkey

and a.fyear EQ (c.fyear+2)

Haikuo
Onyx | Level 15

I hear ya. Smiley Wink  reminded me that my solution was flawed, and it was, here is another try:

proc sql;

create table want as

select *, (select sum(TXPD) from have where gvkey=a.gvkey and a.fear-2<= fyear <= a.fyear having count(TXPD) =3) as rolling_sum

from have a

;

quit;

mspak
Quartz | Level 8

Thank you to all the helpers,

I think Peter's solution is the simplest perhaps. I have another problem is that not all the observations with non-missing fyear+1 and fyear+2. In order to mitigate this problem and to ensure all the gvkey with non-missing years, I perform the following:

proc sql;

create table a as

select *

  from (select distinct FYEAR from SAMPLEDATA),

(select distinct GVKEY from SAMPLEDATA) ;

quit;

data want;

merge a SAMPLEDATA;

by  gvkey  FYEAR;

run;

However, if I wish to exclude missing observations from a given year, how should I do that? For example, if I wish to compute the sum_txpd3 for a given gvkey for year 2013, I must have non-missing txpd for year 2011, 2012 and 2013. If the data for 2012 is missing, then the sum_txpd3 should be left as ".". With the code given by Peter, the the total of txpd for years 2011 to 2013 still will be calculated even if there is one missing txpd, lets say 2012.

Thank you.

Regards,

MSPAK

Haikuo
Onyx | Level 15

Have you actually tried my suggestion? Not bragging here, but it seems to me that my code is :

1)  more dynamic and flexible, you can change parameters to fit your different range, such as 10 years, 100 years etc. 2). is capable to deal with the missing value to meet your criteria. See below:

data have;

     input GVKEY $   FYEAR TXPD;

     cards;

001004     1995 5.3

001004     1996 8.6

001004     1997 6.2

001004     1998 4.4

001004     1999 11.3

001004     2000 .

001004     2001 3.2

001004     2002 3.2

001004     2003 3.2

001004     2004 3.2

;

proc sql;

     create table want as

           select *, (select sum(TXPD) from have where gvkey=a.gvkey and a.fyear-2<= fyear <= a.fyear having count(TXPD) =3) as rolling_sum

                from have a

     ;

quit;


Only those with non-missing value for 3 continuous years are computed, else are left missing as you have requested.


Haikuo

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 2138 views
  • 4 likes
  • 5 in conversation