Contributor
Posts: 40

# Sum up lagged values

[ Edited ]

Hi,

Assume I have data set like the one below:

 ID Year Var1 Var2 123 2012 345 24 310 2012 4 324 400 2012 234 456 123 2013 675 46 200 2013 577 89 310 2013 46 45 400 2013 45 76 123 2014 123 234 200 2014 1234 35 400 2014 435 645 123 2015 231 236 200 2015 456 87 310 2015 435 534 400 2015 232 235 123 2016 123 679 200 2016 354 523 310 2016 233 754 400 2016 200 342

Now suppose, I want to create a new data set from this data set where for each year and ID I want to get the summation of the previous three years values for var 1 and var2...so that I get an output like this:

 ID Year Var1 Var2 123 2015 1143 304 200 2015 1811 124 310 2015 50 369 400 2015 714 1177 123 2016 1029 516 200 2016 2267 211 310 2016 481 579 400 2016 712 956

So for example, for ID 400 in 2016 I get the sum of the values for VAR 1 and VAR 2 in years 2015, 2014, 2013.

Notice that not in all years do all the IDs appear. Also suppose I have a large number of variables (VAR1 TO VARn) so I don't want to be typing in all their names.

Thank you!

Super User
Posts: 13,583

## Re: Sum up lagged values

You mention that not all years are present. How does that affect the sum? If the first year for an ID is 2012 and you have 2014 and 2015 but not 2013 do you want to sum 2012, 2014 and 2015 or only 2012 and 2014?

What if there are not 3 years for an ID? Especially what to do if there is only 1?

You may need to adjust your example output. You say "sum 3 years" but for ID 123 you have years 2012, 2013, 2014, 2015 and 2016. You show a desired sum of 1143 for VAR1 but that is the sum for 2012-2014 but you don't show any result for 2016 which from the description should be the sum of 2013-2015?

Contributor
Posts: 40

## Re: Sum up lagged values

Hi,

So for example the ID = 200 is missing in 2012. So the output for ID=200 in 2015 should be the sum of only 2013 and 2014. So even if an ID is missing in the previous three years the sum should take only whatever the values that are available from the previous three years.

If there is only 1 year for an ID from the previous year, the sum would be only the value of that avaialble year.

There was a slight error in the example. I adjusted it.

Hope you can help me out now to get the given output file.

Thank you!!

Super User
Posts: 23,776

## Re: Sum up lagged values

If your familiar with formats look at multilabel formats.

Otherwise use a SQL self join with the condition being

a.year - b.year between 0 and 2

Super User
Posts: 10,787

## Re: Sum up lagged values

```
data have;
infile cards truncover expandtabs;
input ID	Year	Var1	Var2;
cards;
123	2012	345	24
310	2012	4	324
400	2012	234	456
123	2013	675	46
200	2013	577	89
310	2013	46	45
400	2013	45	76
123	2014	123	234
200	2014	1234	35
400	2014	435	645
123	2015	231	236
200	2015	456	87
310	2015	435	534
400	2015	232	235
123	2016	123	679
200	2016	354	523
310	2016	233	754
400	2016	200	342
;
run;
proc sql;
create table temp as
select a.*,var1,var2
from (select * from
(select distinct id from have),(select distinct year from have)) as a
left join have as b
on a.year=b.year and a.id=b.id
order by a.id,a.year;
quit;

data want;
set temp;
by id;
array v1{0:2} _temporary_;
array v2{0:2} _temporary_;
if first.id then n=0;
n+1;
sum_v1=sum(of v1{*});
sum_v2=sum(of v2{*});
v1{mod(n,3)}=var1;
v2{mod(n,3)}=var2;
if n gt 3;
keep id year sum_:;
run;

```
Contributor
Posts: 40

## Re: Sum up lagged values

Hey thank you so much!

Instead of var1 and var2, if I have a lot of variables that I want to do the same thing as above how would you code it? So that I don't have to keep on typing the variable names...

Super User
Posts: 10,787

## Re: Sum up lagged values

```Make some macro variable to hold it .

data have;
infile cards truncover expandtabs;
input ID	Year	Var1	Var2;
cards;
123	2012	345	24
310	2012	4	324
400	2012	234	456
123	2013	675	46
200	2013	577	89
310	2013	46	45
400	2013	45	76
123	2014	123	234
200	2014	1234	35
400	2014	435	645
123	2015	231	236
200	2015	456	87
310	2015	435	534
400	2015	232	235
123	2016	123	679
200	2016	354	523
310	2016	233	754
400	2016	200	342
;
run;
proc transpose data=have(obs=0 drop=id year) out=x;
run;
proc sql noprint;
select _name_ into : names separated by ',' from x;

select catt('array _',_name_,'{0:2} _temporary_;')
into : arrays separated by ' '
from x;

select catt('sum_',_name_,'=sum(of _',_name_,'{*});')
into : sum separated by ' '
from x;

select catt('_',_name_,'{mod(n,3)}=',_name_,';') into : v separated by ' '
from x;
quit;

proc sql;
create table temp as
select a.*,&names
from (select * from
(select distinct id from have),(select distinct year from have)) as a
left join have as b
on a.year=b.year and a.id=b.id
order by a.id,a.year;
quit;

data want;
set temp;
by id;
&arrays
if first.id then n=0;
n+1;
∑
&v
if n gt 3;
keep id year sum_:;
run;

```
Super User
Posts: 10,787

## Re: Sum up lagged values

```∑   should be  & sum

```
Contributor
Posts: 40

## Re: Sum up lagged values

Thank you! Sorry to trouble you, but could you tell give me the correct code for the final part:

```data want;
set temp;
by id;
&arrays
if first.id then n=0;
n+1;
∑
&v
if n gt 3;
keep id year sum_:;
run;```

I replaced

`∑`

but the resulting output has empty values now for the summed columns.

Super User
Posts: 10,787

## Re: Sum up lagged values

It is ∑ (not blank in it)
Contributor
Posts: 40

## Re: Sum up lagged values

Sorry to bother you. Could you retype the final part:

```data want;
set temp;
by id;
&arrays
if first.id then n=0;
n+1;
∑
&v
if n gt 3;
keep id year sum_:;
run;```
Super User
Posts: 10,787

## Re: Sum up lagged values

```This forum keep eating my code.
Hope this time could work.

data have;
infile cards truncover expandtabs;
input ID	Year	Var1	Var2;
cards;
123	2012	345	24
310	2012	4	324
400	2012	234	456
123	2013	675	46
200	2013	577	89
310	2013	46	45
400	2013	45	76
123	2014	123	234
200	2014	1234	35
400	2014	435	645
123	2015	231	236
200	2015	456	87
310	2015	435	534
400	2015	232	235
123	2016	123	679
200	2016	354	523
310	2016	233	754
400	2016	200	342
;
run;
proc transpose data=have(obs=0 drop=id year) out=x;
run;
proc sql noprint;
select _name_ into : names separated by ',' from x;

select catt('array _',_name_,'{0:2} _temporary_;')
into : arrays separated by ' '
from x;

select catt('sum_',_name_,'=sum(of _',_name_,'{*});')
into : sums separated by ' '
from x;

select catt('_',_name_,'{mod(n,3)}=',_name_,';') into : v separated by ' '
from x;
quit;

proc sql;
create table temp as
select a.*,&names
from (select * from
(select distinct id from have),(select distinct year from have)) as a
left join have as b
on a.year=b.year and a.id=b.id
order by a.id,a.year;
quit;

data want;
set temp;
by id;
&arrays
if first.id then n=0;
n+1;
&sums
&v
if n gt 3;
keep id year sum_:;
run;

```
Contributor
Posts: 40

## Re: Sum up lagged values

Thank you so much! it worked now. I will try it with my original data and see now.

I truly appreciate your help. :-)

Contributor
Posts: 40

## Re: Sum up lagged values

Hi, so I ran the code with my original data. And in the final steps  I got the following error (see attachement).

Note: ID variable is termed as CUSIP.

BATEMAN_EICHLER_HILL_RICHARD was one of the variables in the original data.

I would greatly appreciate if you can give this a look.

Super User
Posts: 23,776

## Re: Sum up lagged values

The name is too long - maximum 32 chars in SAS. You'll need to truncate the prefix or your variable name somehow.

Discussion stats
• 21 replies
• 759 views
• 3 likes
• 4 in conversation