Hi,
I have observations with the 'date_created' column having dates as in YYYY-MM-DD format.
I would like to count the number of observations that are created within past 5 years from the focal observation, by adding 'numof5years' column.
my data:
[data_created]
2013-03-01
2014-07-30
2014-09-15
2015-02-22
2018-01-02
2018-03-02
2019-01-26
expected output:
[data_created] [numof5years]
2013-03-01 0
2014-07-30 1
2014-09-15 2
2015-02-22 3
2018-01-02 4
2018-03-02 4
2019-01-26 5
thanks in advance.
Like this?
data HAVE;
input DATE yymmdd10.;
cards;
2013-03-01
2014-07-30
2014-09-15
2015-02-22
2018-01-02
2018-03-02
2019-01-26
run;
proc sql;
select a.DATE format=date9.
,count(b.DATE)
from HAVE a
left join
HAVE b
on 0 < yrdif(b.DATE,a.DATE) <= 5
group by a.DATE
order by a.DATE;
Like this?
data HAVE;
input DATE yymmdd10.;
cards;
2013-03-01
2014-07-30
2014-09-15
2015-02-22
2018-01-02
2018-03-02
2019-01-26
run;
proc sql;
select a.DATE format=date9.
,count(b.DATE)
from HAVE a
left join
HAVE b
on 0 < yrdif(b.DATE,a.DATE) <= 5
group by a.DATE
order by a.DATE;
haven't used yrdiff in a while. @ChrisNZ is brilliant as usual
copied(plagiarized from his) for hash demo
data have;
input date :yymmdd10.;
format date yymmdd10.;
cards;
2013-03-01
2014-07-30
2014-09-15
2015-02-22
2018-01-02
2018-03-02
2019-01-26
;
data want ;
if _n_=1 then do;
if 0 then set have(rename=(date=_date));
declare hash H (dataset:'have(rename=(date=_date))',ordered: "A") ;
h.definekey ("_date") ;
h.definedone () ;
declare hiter hh('h');
end;
set have;
years=0;
do while(hh.next()=0);
if 0 < yrdif(_date,DATE,'age') <= 5 then years=sum(years,1);
end;
drop _:;
run;
Hi Nov, brilliant as usual!
i have modified the range,
if 0 <= yrdif(_date,DATE,'age') < 5
because i wanted to count the dates within the same year.
but by doing this, this makes a tiny problem... that some are counted twice..
when comparing with the hashtable, what if I want to cancel out counts coming from myself?
data have;
input date :yymmdd10.;
format date yymmdd10.;
cards;
2013-03-01
2014-07-30
2014-09-15
2015-02-22
2018-01-02
2018-03-02
2019-01-26
;
proc sql;
create table want as
select a.date,sum(0 < yrdif(b.date,a.DATE,'age') <= 5) as count
from have a, have b
where b.date<=a.date
group by a.date;
quit;
You can do it in a datastep like this:
data HAVE;
input DATE yymmdd10.;
format date yymmdd10.;
cards;
2013-03-01
2014-07-30
2014-09-15
2015-02-22
2018-01-02
2018-03-02
2019-01-26
;run;
data want;
set have;
retain _P_ 1;
do _P_=_P_ to _N_-1;
set have(keep=date rename=(date=old_date)) point=_P_;
if old_date>=intnx('year',date,-5,'S') then leave;
end;
numof5years=_N_-_P_;
drop old_date;
run;
If you have a BY variable in the real data, just ad a couple of lines:
data want;
set have;
by key;
if first.key then
_P_=_N_;
retain _P_ 1;
do _P_=_P_ to _N_-1;
set have(keep=date rename=(date=old_date)) point=_P_;
if old_date>=intnx('year',date,-5,'S') then leave;
end;
numof5years=_N_-_P_;
drop old_date;
run;
data HAVE;
input DATE yymmdd10.;
format DATE yymmdd10.;
cards;
2013-03-01
2014-07-30
2014-09-15
2015-02-22
2018-01-02
2018-03-02
2019-01-26
run;
proc sql;
select *,(select count(*) from have
where intnx('year',a.date,-5,'s') < date < a.date) as want
from have as a;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.