BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ohsg94
Fluorite | Level 6

Hello everyone, 

I'm trying to count a variable called "EI_D" that values 1 for last 5 years. 

Here is my dataset. 

Year Code EI_D

2010   A      1

2011   A      0

2012   A      1

2013   A      1

2014   A      0

2015   A      1

2016   A      0

2017   A      0

2018   A      1

2014   B     1

2015   B     1

2016   B     0

2017   B     1

2018   B     0

 

I want my dataset looking like below.

Year Code EI_D Count

2010   A      1       

2011   A      0       

2012   A      1       

2013   A      1       

2014   A      0       3

2015   A      1       3   (Since I want to count EI_D that is 1 for last 5 years, I do not include 2010)

2016   A      0       3

2017   A      0       2

2018   A      1       2

2014   B     1        

2015   B     1

2016   B     0

2017   B     1

2018   B     0       3

 

I tried several types of PROC SQL code but they didn't work for me. 

I truly appreciate everyone's help...

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have;
  input Year Code $ EI_D;
  datalines;
2010 A 1
2011 A 0
2012 A 1
2013 A 1
2014 A 0
2015 A 1
2016 A 0
2017 A 0
2018 A 1
2014 B 1
2015 B 1
2016 B 0
2017 B 1
2018 B 0
;

data want;
 if _n_=1 then do;
   if 0 then set have(rename=(EI_D=_EI_D));
   declare hash h(dataset:'have(rename=(EI_D=_EI_D))',hashexp:20);
   h.definekey('year','code');
   h.definedata('_EI_D');
   h.definedone();
 end;
set have;
 _n=0;count=0;
 do _year=year-4 to year;
  if h.find(key:_year,key:code)=0 then do;_n+1;count+_EI_D; end;
 end;
 if _n ne 5 then count=.;
 drop _n _year _EI_D;
run;

View solution in original post

8 REPLIES 8
Patrick
Opal | Level 21

If you search the SAS forums for rolling sum or rolling count you'll find quite a few similar Q&A.

Below one coding option using your sample data.

Depending on your actual data types and if year is just a number or a SAS date value some amendments to the posted code will be required.

data have;
  input Year Code $ EI_D;
  datalines;
2010 A 1
2011 A 0
2012 A 1
2013 A 1
2014 A 0
2015 A 1
2016 A 0
2017 A 0
2018 A 1
2014 B 1
2015 B 1
2016 B 0
2017 B 1
2018 B 0
;

proc sql;
  select distinct
    t1.*,
    case
      when count(*) >= 5 then sum(t2.EI_D)
      else .
      end as sum_ei_d
  from 
    have t1
    left join
    have t2
    on t1.code=t2.code and t2.year between t1.year-4 and t1.year
  group by 
    t1.code, t1.year
  order by
    t1.code, t1.year
  ;
quit;
PaigeMiller
Diamond | Level 26

If you have PROC EXPAND, then this is relatively easy, its called a moving sum of the 5 observations. You can use the MOVSUM 5 operator. An example (for moving average) can be found here: https://documentation.sas.com/doc/en/etsug/15.2/etsug_expand_details19.htm#etsug_expand002903 (scroll down)

--
Paige Miller
Kurt_Bremser
Super User

Rolling sum in a data step:

data want;
set have;
by code year; /* year to assure correct sort order */
array y {0:4} _temporary_;
if first.code
then do;
  do i = 0 to 4; /* can be omitted, because we only calculate sums after the first 5 years */
    y{i} = 0;
  end;
  i = 1;
end;
else i + 1;
y{mod(_n_,5)} = ei_d;
if i ge 5 then count = sum(of y{*});
drop i;
run;

Untested, posted from my tablet.

ohsg94
Fluorite | Level 6
This worked for me. Thanks for your help.
Ksharp
Super User
data have;
  input Year Code $ EI_D;
  datalines;
2010 A 1
2011 A 0
2012 A 1
2013 A 1
2014 A 0
2015 A 1
2016 A 0
2017 A 0
2018 A 1
2014 B 1
2015 B 1
2016 B 0
2017 B 1
2018 B 0
;

data want;
 if _n_=1 then do;
   if 0 then set have(rename=(EI_D=_EI_D));
   declare hash h(dataset:'have(rename=(EI_D=_EI_D))',hashexp:20);
   h.definekey('year','code');
   h.definedata('_EI_D');
   h.definedone();
 end;
set have;
 _n=0;count=0;
 do _year=year-4 to year;
  if h.find(key:_year,key:code)=0 then do;_n+1;count+_EI_D; end;
 end;
 if _n ne 5 then count=.;
 drop _n _year _EI_D;
run;
ohsg94
Fluorite | Level 6
I tried it and it worked for me. Thanks a lot.
mkeintz
PROC Star

You want rolling sums of EI_D for "windows" of size 5.  If you have exactly one record per year, and no holes, then:LAG is efficient:

 

data want;
  set have;
  count=sum(lag4(ei_d),lag3(ei_d),lag2(ei_d),lag1(ei_d),ei_d);
  if lag4(code)^=code then count=.;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1117 views
  • 9 likes
  • 7 in conversation