Hi,
I have observations with the 'date_created' column having dates as in YYYY-MM-DD format.
I would like to count the number of observations that are created within past 5 years from the focal observation, by adding 'numof5years' column.
my data:
[data_created]
2013-03-01
2014-07-30
2014-09-15
2015-02-22
2018-01-02
2018-03-02
2019-01-26
expected output:
[data_created] [numof5years]
2013-03-01 0
2014-07-30 1
2014-09-15 2
2015-02-22 3
2018-01-02 4
2018-03-02 4
2019-01-26 5
thanks in advance.
Like this?
data HAVE;
input DATE yymmdd10.;
cards;
2013-03-01
2014-07-30
2014-09-15
2015-02-22
2018-01-02
2018-03-02
2019-01-26
run;
proc sql;
select a.DATE format=date9.
,count(b.DATE)
from HAVE a
left join
HAVE b
on 0 < yrdif(b.DATE,a.DATE) <= 5
group by a.DATE
order by a.DATE;
Like this?
data HAVE;
input DATE yymmdd10.;
cards;
2013-03-01
2014-07-30
2014-09-15
2015-02-22
2018-01-02
2018-03-02
2019-01-26
run;
proc sql;
select a.DATE format=date9.
,count(b.DATE)
from HAVE a
left join
HAVE b
on 0 < yrdif(b.DATE,a.DATE) <= 5
group by a.DATE
order by a.DATE;
haven't used yrdiff in a while. @ChrisNZ is brilliant as usual
copied(plagiarized from his) for hash demo
data have;
input date :yymmdd10.;
format date yymmdd10.;
cards;
2013-03-01
2014-07-30
2014-09-15
2015-02-22
2018-01-02
2018-03-02
2019-01-26
;
data want ;
if _n_=1 then do;
if 0 then set have(rename=(date=_date));
declare hash H (dataset:'have(rename=(date=_date))',ordered: "A") ;
h.definekey ("_date") ;
h.definedone () ;
declare hiter hh('h');
end;
set have;
years=0;
do while(hh.next()=0);
if 0 < yrdif(_date,DATE,'age') <= 5 then years=sum(years,1);
end;
drop _:;
run;
Hi Nov, brilliant as usual!
i have modified the range,
if 0 <= yrdif(_date,DATE,'age') < 5
because i wanted to count the dates within the same year.
but by doing this, this makes a tiny problem... that some are counted twice..
when comparing with the hashtable, what if I want to cancel out counts coming from myself?
data have;
input date :yymmdd10.;
format date yymmdd10.;
cards;
2013-03-01
2014-07-30
2014-09-15
2015-02-22
2018-01-02
2018-03-02
2019-01-26
;
proc sql;
create table want as
select a.date,sum(0 < yrdif(b.date,a.DATE,'age') <= 5) as count
from have a, have b
where b.date<=a.date
group by a.date;
quit;
You can do it in a datastep like this:
data HAVE;
input DATE yymmdd10.;
format date yymmdd10.;
cards;
2013-03-01
2014-07-30
2014-09-15
2015-02-22
2018-01-02
2018-03-02
2019-01-26
;run;
data want;
set have;
retain _P_ 1;
do _P_=_P_ to _N_-1;
set have(keep=date rename=(date=old_date)) point=_P_;
if old_date>=intnx('year',date,-5,'S') then leave;
end;
numof5years=_N_-_P_;
drop old_date;
run;
If you have a BY variable in the real data, just ad a couple of lines:
data want;
set have;
by key;
if first.key then
_P_=_N_;
retain _P_ 1;
do _P_=_P_ to _N_-1;
set have(keep=date rename=(date=old_date)) point=_P_;
if old_date>=intnx('year',date,-5,'S') then leave;
end;
numof5years=_N_-_P_;
drop old_date;
run;
data HAVE;
input DATE yymmdd10.;
format DATE yymmdd10.;
cards;
2013-03-01
2014-07-30
2014-09-15
2015-02-22
2018-01-02
2018-03-02
2019-01-26
run;
proc sql;
select *,(select count(*) from have
where intnx('year',a.date,-5,'s') < date < a.date) as want
from have as a;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.