Hello everyone,
I'm trying to count a variable called "EI_D" that values 1 for last 5 years.
Here is my dataset.
Year Code EI_D
2010 A 1
2011 A 0
2012 A 1
2013 A 1
2014 A 0
2015 A 1
2016 A 0
2017 A 0
2018 A 1
2014 B 1
2015 B 1
2016 B 0
2017 B 1
2018 B 0
I want my dataset looking like below.
Year Code EI_D Count
2010 A 1
2011 A 0
2012 A 1
2013 A 1
2014 A 0 3
2015 A 1 3 (Since I want to count EI_D that is 1 for last 5 years, I do not include 2010)
2016 A 0 3
2017 A 0 2
2018 A 1 2
2014 B 1
2015 B 1
2016 B 0
2017 B 1
2018 B 0 3
I tried several types of PROC SQL code but they didn't work for me.
I truly appreciate everyone's help...
data have;
input Year Code $ EI_D;
datalines;
2010 A 1
2011 A 0
2012 A 1
2013 A 1
2014 A 0
2015 A 1
2016 A 0
2017 A 0
2018 A 1
2014 B 1
2015 B 1
2016 B 0
2017 B 1
2018 B 0
;
data want;
if _n_=1 then do;
if 0 then set have(rename=(EI_D=_EI_D));
declare hash h(dataset:'have(rename=(EI_D=_EI_D))',hashexp:20);
h.definekey('year','code');
h.definedata('_EI_D');
h.definedone();
end;
set have;
_n=0;count=0;
do _year=year-4 to year;
if h.find(key:_year,key:code)=0 then do;_n+1;count+_EI_D; end;
end;
if _n ne 5 then count=.;
drop _n _year _EI_D;
run;
If you search the SAS forums for rolling sum or rolling count you'll find quite a few similar Q&A.
Below one coding option using your sample data.
Depending on your actual data types and if year is just a number or a SAS date value some amendments to the posted code will be required.
data have;
input Year Code $ EI_D;
datalines;
2010 A 1
2011 A 0
2012 A 1
2013 A 1
2014 A 0
2015 A 1
2016 A 0
2017 A 0
2018 A 1
2014 B 1
2015 B 1
2016 B 0
2017 B 1
2018 B 0
;
proc sql;
select distinct
t1.*,
case
when count(*) >= 5 then sum(t2.EI_D)
else .
end as sum_ei_d
from
have t1
left join
have t2
on t1.code=t2.code and t2.year between t1.year-4 and t1.year
group by
t1.code, t1.year
order by
t1.code, t1.year
;
quit;
If you have PROC EXPAND, then this is relatively easy, its called a moving sum of the 5 observations. You can use the MOVSUM 5 operator. An example (for moving average) can be found here: https://documentation.sas.com/doc/en/etsug/15.2/etsug_expand_details19.htm#etsug_expand002903 (scroll down)
Rolling sum in a data step:
data want;
set have;
by code year; /* year to assure correct sort order */
array y {0:4} _temporary_;
if first.code
then do;
do i = 0 to 4; /* can be omitted, because we only calculate sums after the first 5 years */
y{i} = 0;
end;
i = 1;
end;
else i + 1;
y{mod(_n_,5)} = ei_d;
if i ge 5 then count = sum(of y{*});
drop i;
run;
Untested, posted from my tablet.
data have;
input Year Code $ EI_D;
datalines;
2010 A 1
2011 A 0
2012 A 1
2013 A 1
2014 A 0
2015 A 1
2016 A 0
2017 A 0
2018 A 1
2014 B 1
2015 B 1
2016 B 0
2017 B 1
2018 B 0
;
data want;
if _n_=1 then do;
if 0 then set have(rename=(EI_D=_EI_D));
declare hash h(dataset:'have(rename=(EI_D=_EI_D))',hashexp:20);
h.definekey('year','code');
h.definedata('_EI_D');
h.definedone();
end;
set have;
_n=0;count=0;
do _year=year-4 to year;
if h.find(key:_year,key:code)=0 then do;_n+1;count+_EI_D; end;
end;
if _n ne 5 then count=.;
drop _n _year _EI_D;
run;
@Ksharp Nice!
You want rolling sums of EI_D for "windows" of size 5. If you have exactly one record per year, and no holes, then:LAG is efficient:
data want;
set have;
count=sum(lag4(ei_d),lag3(ei_d),lag2(ei_d),lag1(ei_d),ei_d);
if lag4(code)^=code then count=.;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.