BookmarkSubscribeRSS Feed
aaou
Obsidian | Level 7

Hi,

Assume I have data set like the one below:

IDYearVar1Var2
123201234524
31020124324
4002012234456
123201367546
200201357789
31020134645
40020134576
1232014123234
2002014123435
4002014435645
1232015231236
200201545687
3102015435534
4002015232235
1232016123679
2002016354523
3102016233754
4002016200342


 Now suppose, I want to create a new data set from this data set where for each year and ID I want to get the summation of the previous three years values for var 1 and var2...so that I get an output like this:

IDYearVar1Var2
12320151143304
20020151811124
310201550369
40020157141177
12320161029516
20020162267211
3102016481579
4002016712956

 

 


 So for example, for ID 400 in 2016 I get the sum of the values for VAR 1 and VAR 2 in years 2015, 2014, 2013. 

Notice that not in all years do all the IDs appear. Also suppose I have a large number of variables (VAR1 TO VARn) so I don't want to be typing in all their names.

 

How do I go about this?

 

Thank you!

21 REPLIES 21
ballardw
Super User

You mention that not all years are present. How does that affect the sum? If the first year for an ID is 2012 and you have 2014 and 2015 but not 2013 do you want to sum 2012, 2014 and 2015 or only 2012 and 2014?

What if there are not 3 years for an ID? Especially what to do if there is only 1?

 

You may need to adjust your example output. You say "sum 3 years" but for ID 123 you have years 2012, 2013, 2014, 2015 and 2016. You show a desired sum of 1143 for VAR1 but that is the sum for 2012-2014 but you don't show any result for 2016 which from the description should be the sum of 2013-2015?

aaou
Obsidian | Level 7

Hi,

So for example the ID = 200 is missing in 2012. So the output for ID=200 in 2015 should be the sum of only 2013 and 2014. So even if an ID is missing in the previous three years the sum should take only whatever the values that are available from the previous three years.

 

If there is only 1 year for an ID from the previous year, the sum would be only the value of that avaialble year. 

 

There was a slight error in the example. I adjusted it.

 

Hope you can help me out now to get the given output file. 

 

Thank you!!

Reeza
Super User

If your familiar with formats look at multilabel formats. 

 

Otherwise use a SQL self join with the condition being 

 

a.year - b.year between 0 and 2

Ksharp
Super User

data have;
infile cards truncover expandtabs;
input ID	Year	Var1	Var2;
cards;
123	2012	345	24
310	2012	4	324
400	2012	234	456
123	2013	675	46
200	2013	577	89
310	2013	46	45
400	2013	45	76
123	2014	123	234
200	2014	1234	35
400	2014	435	645
123	2015	231	236
200	2015	456	87
310	2015	435	534
400	2015	232	235
123	2016	123	679
200	2016	354	523
310	2016	233	754
400	2016	200	342
;
run;
proc sql;
create table temp as
 select a.*,var1,var2
  from (select * from
  (select distinct id from have),(select distinct year from have)) as a
  left join have as b 
   on a.year=b.year and a.id=b.id
    order by a.id,a.year;
quit;

data want;
 set temp;
 by id;
 array v1{0:2} _temporary_;
 array v2{0:2} _temporary_;
 if first.id then n=0;
 n+1;
 sum_v1=sum(of v1{*});
 sum_v2=sum(of v2{*});
 v1{mod(n,3)}=var1;
 v2{mod(n,3)}=var2;
 if n gt 3;
 keep id year sum_:;
 run;



aaou
Obsidian | Level 7

Hey thank you so much! 

 

Instead of var1 and var2, if I have a lot of variables that I want to do the same thing as above how would you code it? So that I don't have to keep on typing the variable names...

Ksharp
Super User
Make some macro variable to hold it .

data have;
infile cards truncover expandtabs;
input ID	Year	Var1	Var2;
cards;
123	2012	345	24
310	2012	4	324
400	2012	234	456
123	2013	675	46
200	2013	577	89
310	2013	46	45
400	2013	45	76
123	2014	123	234
200	2014	1234	35
400	2014	435	645
123	2015	231	236
200	2015	456	87
310	2015	435	534
400	2015	232	235
123	2016	123	679
200	2016	354	523
310	2016	233	754
400	2016	200	342
;
run;
proc transpose data=have(obs=0 drop=id year) out=x;
run;
proc sql noprint;
select _name_ into : names separated by ',' from x;

select catt('array _',_name_,'{0:2} _temporary_;') 
       into : arrays separated by ' '
   from x;
   
select catt('sum_',_name_,'=sum(of _',_name_,'{*});')
   into : sum separated by ' ' 
   from x;
   
select catt('_',_name_,'{mod(n,3)}=',_name_,';') into : v separated by ' ' 
 from x;
quit;

proc sql;
create table temp as
 select a.*,&names
  from (select * from
  (select distinct id from have),(select distinct year from have)) as a
  left join have as b 
   on a.year=b.year and a.id=b.id
    order by a.id,a.year;
quit;

data want;
 set temp;
 by id;
 &arrays
 if first.id then n=0;
 n+1;
 ∑
 &v
 if n gt 3;
 keep id year sum_:;
 run;

Ksharp
Super User
∑   should be  & sum

aaou
Obsidian | Level 7

Thank you! Sorry to trouble you, but could you tell give me the correct code for the final part:

 

data want;
 set temp;
 by id;
 &arrays
 if first.id then n=0;
 n+1;
 ∑
 &v
 if n gt 3;
 keep id year sum_:;
 run;

I replaced  

but the resulting output has empty values now for the summed columns. 

Ksharp
Super User
It is ∑ (not blank in it)
aaou
Obsidian | Level 7

Sorry to bother you. Could you retype the final part:

 

data want;
 set temp;
 by id;
 &arrays
 if first.id then n=0;
 n+1;
 ∑
 &v
 if n gt 3;
 keep id year sum_:;
 run;
Ksharp
Super User
This forum keep eating my code. 
Hope this time could work.



data have;
infile cards truncover expandtabs;
input ID	Year	Var1	Var2;
cards;
123	2012	345	24
310	2012	4	324
400	2012	234	456
123	2013	675	46
200	2013	577	89
310	2013	46	45
400	2013	45	76
123	2014	123	234
200	2014	1234	35
400	2014	435	645
123	2015	231	236
200	2015	456	87
310	2015	435	534
400	2015	232	235
123	2016	123	679
200	2016	354	523
310	2016	233	754
400	2016	200	342
;
run;
proc transpose data=have(obs=0 drop=id year) out=x;
run;
proc sql noprint;
select _name_ into : names separated by ',' from x;

select catt('array _',_name_,'{0:2} _temporary_;') 
       into : arrays separated by ' '
   from x;
   
select catt('sum_',_name_,'=sum(of _',_name_,'{*});')
   into : sums separated by ' ' 
   from x;
   
select catt('_',_name_,'{mod(n,3)}=',_name_,';') into : v separated by ' ' 
 from x;
quit;

proc sql;
create table temp as
 select a.*,&names
  from (select * from
  (select distinct id from have),(select distinct year from have)) as a
  left join have as b 
   on a.year=b.year and a.id=b.id
    order by a.id,a.year;
quit;

data want;
 set temp;
 by id;
 &arrays
 if first.id then n=0;
 n+1;
 &sums
 &v
 if n gt 3;
 keep id year sum_:;
 run;

aaou
Obsidian | Level 7

Thank you so much! it worked now. I will try it with my original data and see now.

I truly appreciate your help. 🙂 

aaou
Obsidian | Level 7

Hi, so I ran the code with my original data. And in the final steps  I got the following error (see attachement).

 

Note: ID variable is termed as CUSIP.

BATEMAN_EICHLER_HILL_RICHARD was one of the variables in the original data.

 

I would greatly appreciate if you can give this a look.

 


Untitled.png
Reeza
Super User

The name is too long - maximum 32 chars in SAS. You'll need to truncate the prefix or your variable name somehow.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 21 replies
  • 4059 views
  • 3 likes
  • 4 in conversation