BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ohsg94
Fluorite | Level 6

Hello everyone, 

I'm trying to count a variable called "EI_D" that values 1 for last 5 years. 

Here is my dataset. 

Year Code EI_D

2010   A      1

2011   A      0

2012   A      1

2013   A      1

2014   A      0

2015   A      1

2016   A      0

2017   A      0

2018   A      1

2014   B     1

2015   B     1

2016   B     0

2017   B     1

2018   B     0

 

I want my dataset looking like below.

Year Code EI_D Count

2010   A      1       

2011   A      0       

2012   A      1       

2013   A      1       

2014   A      0       3

2015   A      1       3   (Since I want to count EI_D that is 1 for last 5 years, I do not include 2010)

2016   A      0       3

2017   A      0       2

2018   A      1       2

2014   B     1        

2015   B     1

2016   B     0

2017   B     1

2018   B     0       3

 

I tried several types of PROC SQL code but they didn't work for me. 

I truly appreciate everyone's help...

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have;
  input Year Code $ EI_D;
  datalines;
2010 A 1
2011 A 0
2012 A 1
2013 A 1
2014 A 0
2015 A 1
2016 A 0
2017 A 0
2018 A 1
2014 B 1
2015 B 1
2016 B 0
2017 B 1
2018 B 0
;

data want;
 if _n_=1 then do;
   if 0 then set have(rename=(EI_D=_EI_D));
   declare hash h(dataset:'have(rename=(EI_D=_EI_D))',hashexp:20);
   h.definekey('year','code');
   h.definedata('_EI_D');
   h.definedone();
 end;
set have;
 _n=0;count=0;
 do _year=year-4 to year;
  if h.find(key:_year,key:code)=0 then do;_n+1;count+_EI_D; end;
 end;
 if _n ne 5 then count=.;
 drop _n _year _EI_D;
run;

View solution in original post

8 REPLIES 8
Patrick
Opal | Level 21

If you search the SAS forums for rolling sum or rolling count you'll find quite a few similar Q&A.

Below one coding option using your sample data.

Depending on your actual data types and if year is just a number or a SAS date value some amendments to the posted code will be required.

data have;
  input Year Code $ EI_D;
  datalines;
2010 A 1
2011 A 0
2012 A 1
2013 A 1
2014 A 0
2015 A 1
2016 A 0
2017 A 0
2018 A 1
2014 B 1
2015 B 1
2016 B 0
2017 B 1
2018 B 0
;

proc sql;
  select distinct
    t1.*,
    case
      when count(*) >= 5 then sum(t2.EI_D)
      else .
      end as sum_ei_d
  from 
    have t1
    left join
    have t2
    on t1.code=t2.code and t2.year between t1.year-4 and t1.year
  group by 
    t1.code, t1.year
  order by
    t1.code, t1.year
  ;
quit;
PaigeMiller
Diamond | Level 26

If you have PROC EXPAND, then this is relatively easy, its called a moving sum of the 5 observations. You can use the MOVSUM 5 operator. An example (for moving average) can be found here: https://documentation.sas.com/doc/en/etsug/15.2/etsug_expand_details19.htm#etsug_expand002903 (scroll down)

--
Paige Miller
Kurt_Bremser
Super User

Rolling sum in a data step:

data want;
set have;
by code year; /* year to assure correct sort order */
array y {0:4} _temporary_;
if first.code
then do;
  do i = 0 to 4; /* can be omitted, because we only calculate sums after the first 5 years */
    y{i} = 0;
  end;
  i = 1;
end;
else i + 1;
y{mod(_n_,5)} = ei_d;
if i ge 5 then count = sum(of y{*});
drop i;
run;

Untested, posted from my tablet.

ohsg94
Fluorite | Level 6
This worked for me. Thanks for your help.
Ksharp
Super User
data have;
  input Year Code $ EI_D;
  datalines;
2010 A 1
2011 A 0
2012 A 1
2013 A 1
2014 A 0
2015 A 1
2016 A 0
2017 A 0
2018 A 1
2014 B 1
2015 B 1
2016 B 0
2017 B 1
2018 B 0
;

data want;
 if _n_=1 then do;
   if 0 then set have(rename=(EI_D=_EI_D));
   declare hash h(dataset:'have(rename=(EI_D=_EI_D))',hashexp:20);
   h.definekey('year','code');
   h.definedata('_EI_D');
   h.definedone();
 end;
set have;
 _n=0;count=0;
 do _year=year-4 to year;
  if h.find(key:_year,key:code)=0 then do;_n+1;count+_EI_D; end;
 end;
 if _n ne 5 then count=.;
 drop _n _year _EI_D;
run;
ohsg94
Fluorite | Level 6
I tried it and it worked for me. Thanks a lot.
mkeintz
PROC Star

You want rolling sums of EI_D for "windows" of size 5.  If you have exactly one record per year, and no holes, then:LAG is efficient:

 

data want;
  set have;
  count=sum(lag4(ei_d),lag3(ei_d),lag2(ei_d),lag1(ei_d),ei_d);
  if lag4(code)^=code then count=.;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 738 views
  • 9 likes
  • 7 in conversation