Hello everyone,
I'm trying to count a variable called "EI_D" that values 1 for last 5 years.
Here is my dataset.
Year Code EI_D
2010 A 1
2011 A 0
2012 A 1
2013 A 1
2014 A 0
2015 A 1
2016 A 0
2017 A 0
2018 A 1
2014 B 1
2015 B 1
2016 B 0
2017 B 1
2018 B 0
I want my dataset looking like below.
Year Code EI_D Count
2010 A 1
2011 A 0
2012 A 1
2013 A 1
2014 A 0 3
2015 A 1 3 (Since I want to count EI_D that is 1 for last 5 years, I do not include 2010)
2016 A 0 3
2017 A 0 2
2018 A 1 2
2014 B 1
2015 B 1
2016 B 0
2017 B 1
2018 B 0 3
I tried several types of PROC SQL code but they didn't work for me.
I truly appreciate everyone's help...
data have;
input Year Code $ EI_D;
datalines;
2010 A 1
2011 A 0
2012 A 1
2013 A 1
2014 A 0
2015 A 1
2016 A 0
2017 A 0
2018 A 1
2014 B 1
2015 B 1
2016 B 0
2017 B 1
2018 B 0
;
data want;
if _n_=1 then do;
if 0 then set have(rename=(EI_D=_EI_D));
declare hash h(dataset:'have(rename=(EI_D=_EI_D))',hashexp:20);
h.definekey('year','code');
h.definedata('_EI_D');
h.definedone();
end;
set have;
_n=0;count=0;
do _year=year-4 to year;
if h.find(key:_year,key:code)=0 then do;_n+1;count+_EI_D; end;
end;
if _n ne 5 then count=.;
drop _n _year _EI_D;
run;
If you search the SAS forums for rolling sum or rolling count you'll find quite a few similar Q&A.
Below one coding option using your sample data.
Depending on your actual data types and if year is just a number or a SAS date value some amendments to the posted code will be required.
data have;
input Year Code $ EI_D;
datalines;
2010 A 1
2011 A 0
2012 A 1
2013 A 1
2014 A 0
2015 A 1
2016 A 0
2017 A 0
2018 A 1
2014 B 1
2015 B 1
2016 B 0
2017 B 1
2018 B 0
;
proc sql;
select distinct
t1.*,
case
when count(*) >= 5 then sum(t2.EI_D)
else .
end as sum_ei_d
from
have t1
left join
have t2
on t1.code=t2.code and t2.year between t1.year-4 and t1.year
group by
t1.code, t1.year
order by
t1.code, t1.year
;
quit;
If you have PROC EXPAND, then this is relatively easy, its called a moving sum of the 5 observations. You can use the MOVSUM 5 operator. An example (for moving average) can be found here: https://documentation.sas.com/doc/en/etsug/15.2/etsug_expand_details19.htm#etsug_expand002903 (scroll down)
Rolling sum in a data step:
data want;
set have;
by code year; /* year to assure correct sort order */
array y {0:4} _temporary_;
if first.code
then do;
do i = 0 to 4; /* can be omitted, because we only calculate sums after the first 5 years */
y{i} = 0;
end;
i = 1;
end;
else i + 1;
y{mod(_n_,5)} = ei_d;
if i ge 5 then count = sum(of y{*});
drop i;
run;
Untested, posted from my tablet.
data have;
input Year Code $ EI_D;
datalines;
2010 A 1
2011 A 0
2012 A 1
2013 A 1
2014 A 0
2015 A 1
2016 A 0
2017 A 0
2018 A 1
2014 B 1
2015 B 1
2016 B 0
2017 B 1
2018 B 0
;
data want;
if _n_=1 then do;
if 0 then set have(rename=(EI_D=_EI_D));
declare hash h(dataset:'have(rename=(EI_D=_EI_D))',hashexp:20);
h.definekey('year','code');
h.definedata('_EI_D');
h.definedone();
end;
set have;
_n=0;count=0;
do _year=year-4 to year;
if h.find(key:_year,key:code)=0 then do;_n+1;count+_EI_D; end;
end;
if _n ne 5 then count=.;
drop _n _year _EI_D;
run;
@Ksharp Nice!
You want rolling sums of EI_D for "windows" of size 5. If you have exactly one record per year, and no holes, then:LAG is efficient:
data want;
set have;
count=sum(lag4(ei_d),lag3(ei_d),lag2(ei_d),lag1(ei_d),ei_d);
if lag4(code)^=code then count=.;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.