Help using Base SAS procedures

Sum of variables from t-2 to t

Accepted Solution Solved
Reply
Regular Contributor
Posts: 162
Accepted Solution

Sum of variables from t-2 to t

Dear all,

I would like to compute the total of a variable starting from t-2 to t for a company (gvkey). It seems that proc expand can perform time series total from lag variables.

My data (attached)  is presented as follows:

GVKEY    FYEAR   TXPD

001004     1995       5.3

001004     1996       8.6

001004     1997       6.2

001004     1998       4.4

001004     1999      11.3

001004     2000       3.2

....

I would like to generate a new variable, namely sum_TXPD3. The new variable is computed by totaling the TXPD for 3 years starting from t-2 to t (i.e., t-2, t-1, and t). For example, for the company with the gvkey code 001004, the  sum_TXPD3 should be blank for fyear 1995 and 1996 as there is no sufficient data to compute the variable. For the year 1997, the sum_TXPD3 should be 20.1 (5.3 + 8.6 + 6.2).

There might be some missing data in the between of the time series. For example, if the fyear 1996 is missing, then the sum_TXPD3 for fyear 1997 should be stated as ".".

Hope anyone here to help me to resolve the problem. I am not sure proc expand can help in this problem?

Thank you in advanced.

Regards,

mspak

Attachment

Accepted Solutions
Solution
‎11-30-2014 07:17 PM
Respected Advisor
Posts: 3,147

Re: Sum of variables from t-2 to t

I hear ya. Smiley Wink  reminded me that my solution was flawed, and it was, here is another try:

proc sql;

create table want as

select *, (select sum(TXPD) from have where gvkey=a.gvkey and a.fear-2<= fyear <= a.fyear having count(TXPD) =3) as rolling_sum

from have a

;

quit;

View solution in original post


All Replies
Trusted Advisor
Posts: 1,222

Re: Sum of variables from t-2 to t

data want;

set have;

sum_TXPD3=TXPD+lag(TXPD)+lag2(TXPD);

run;

PROC Star
Posts: 7,416

Re: Sum of variables from t-2 to t

I think something like the following will accomplish what you want:

data want;

  set test.sampledata;

  by gvkey;

  if first.gvkey then counter=1;

  else counter+1;

  sum_TXPD3=ifn(counter gt 2,TXPD+lag(TXPD)+lag2(TXPD),.);

run;

Respected Advisor
Posts: 3,147

Re: Sum of variables from t-2 to t

An SQL alternative could be (not tested):

proc sql;

create table want as

select *, (select sum(TXPD) from have where gvkey=a.gvkey and fyear <= a.fyear having count(fyear) >=3) as rolling_sum

from have a

;

quit;

Valued Guide
Posts: 2,175

Re: Sum of variables from t-2 to t

There are a number of similar questions posted in the forums over time. Searching for "rolling" provided links to interesting solutions.

I don't think an sql approach has bein proposed - perhaps because it would be impossible to expand much beyond 10 period lag.

Select a.*, (a.txpd,b.txpd,c.txpd) as sum_txpd3

from sampledata a

left join sampledata b

on  a.gvkey EQ b.gvkey

and a.fyear EQ (b.fyear+1)

left join sampledata c

on a.gvkey EQ c.gvkey

and a.fyear EQ (c.fyear+2)

Solution
‎11-30-2014 07:17 PM
Respected Advisor
Posts: 3,147

Re: Sum of variables from t-2 to t

I hear ya. Smiley Wink  reminded me that my solution was flawed, and it was, here is another try:

proc sql;

create table want as

select *, (select sum(TXPD) from have where gvkey=a.gvkey and a.fear-2<= fyear <= a.fyear having count(TXPD) =3) as rolling_sum

from have a

;

quit;

Regular Contributor
Posts: 162

Re: Sum of variables from t-2 to t

Thank you to all the helpers,

I think Peter's solution is the simplest perhaps. I have another problem is that not all the observations with non-missing fyear+1 and fyear+2. In order to mitigate this problem and to ensure all the gvkey with non-missing years, I perform the following:

proc sql;

create table a as

select *

  from (select distinct FYEAR from SAMPLEDATA),

(select distinct GVKEY from SAMPLEDATA) ;

quit;

data want;

merge a SAMPLEDATA;

by  gvkey  FYEAR;

run;

However, if I wish to exclude missing observations from a given year, how should I do that? For example, if I wish to compute the sum_txpd3 for a given gvkey for year 2013, I must have non-missing txpd for year 2011, 2012 and 2013. If the data for 2012 is missing, then the sum_txpd3 should be left as ".". With the code given by Peter, the the total of txpd for years 2011 to 2013 still will be calculated even if there is one missing txpd, lets say 2012.

Thank you.

Regards,

MSPAK

Respected Advisor
Posts: 3,147

Re: Sum of variables from t-2 to t

Have you actually tried my suggestion? Not bragging here, but it seems to me that my code is :

1)  more dynamic and flexible, you can change parameters to fit your different range, such as 10 years, 100 years etc. 2). is capable to deal with the missing value to meet your criteria. See below:

data have;

     input GVKEY $   FYEAR TXPD;

     cards;

001004     1995 5.3

001004     1996 8.6

001004     1997 6.2

001004     1998 4.4

001004     1999 11.3

001004     2000 .

001004     2001 3.2

001004     2002 3.2

001004     2003 3.2

001004     2004 3.2

;

proc sql;

     create table want as

           select *, (select sum(TXPD) from have where gvkey=a.gvkey and a.fyear-2<= fyear <= a.fyear having count(TXPD) =3) as rolling_sum

                from have a

     ;

quit;


Only those with non-missing value for 3 continuous years are computed, else are left missing as you have requested.


Haikuo

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 488 views
  • 4 likes
  • 5 in conversation