BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jimmychoi
Obsidian | Level 7

Hi,

 

I have observations with the 'date_created' column having dates as in YYYY-MM-DD format.

I would like to count the number of observations that are created within past 5 years from the focal observation, by adding 'numof5years' column.

 

my data:

[data_created]

2013-03-01

2014-07-30

2014-09-15

2015-02-22

2018-01-02

2018-03-02

2019-01-26

 

expected output:

[data_created]        [numof5years]

2013-03-01                     0

2014-07-30                     1

2014-09-15                     2

2015-02-22                     3

2018-01-02                     4

2018-03-02                     4

2019-01-26                     5

 

thanks in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Like this?

data HAVE;
 input DATE yymmdd10.;
 cards;
2013-03-01
2014-07-30
2014-09-15
2015-02-22
2018-01-02
2018-03-02
2019-01-26
run;
proc sql;
  select a.DATE     format=date9.
        ,count(b.DATE)        
  from HAVE a
         left join
       HAVE b
        on 0 < yrdif(b.DATE,a.DATE) <= 5
  group by a.DATE
  order by a.DATE;
DATE  
01MAR2013 0
30JUL2014 1
15SEP2014 2
22FEB2015 3
02JAN2018 4
02MAR2018 4
26JAN2019 5

 

View solution in original post

11 REPLIES 11
ChrisNZ
Tourmaline | Level 20

Like this?

data HAVE;
 input DATE yymmdd10.;
 cards;
2013-03-01
2014-07-30
2014-09-15
2015-02-22
2018-01-02
2018-03-02
2019-01-26
run;
proc sql;
  select a.DATE     format=date9.
        ,count(b.DATE)        
  from HAVE a
         left join
       HAVE b
        on 0 < yrdif(b.DATE,a.DATE) <= 5
  group by a.DATE
  order by a.DATE;
DATE  
01MAR2013 0
30JUL2014 1
15SEP2014 2
22FEB2015 3
02JAN2018 4
02MAR2018 4
26JAN2019 5

 

jimmychoi
Obsidian | Level 7
Hi Chris, actually i'm doing this in a DATA procedure.
will there be a comparable function as 'yrdif' of SQL?
Reeza
Super User
For a within table lookup like this, it's better to stick with SQL, it's much harder to do this type of join in a data step.
ChrisNZ
Tourmaline | Level 20

actually i'm doing this in a DATA procedure. will there be a comparable function as 'yrdif' of SQL?
 
You can use that function in data step as well. Cartesian products are not easily obtained in data steps though.
The type of join guides the coding choice here.
novinosrin
Tourmaline | Level 20

haven't used yrdiff in a while. @ChrisNZ  is brilliant as usual

 

copied(plagiarized from his) for hash demo

 


data have;
input date :yymmdd10.;
format date yymmdd10.;
cards;
2013-03-01
2014-07-30
2014-09-15
2015-02-22
2018-01-02
2018-03-02
2019-01-26
;

data want ;
 if _n_=1 then do;
 if 0 then set have(rename=(date=_date));
   declare hash H (dataset:'have(rename=(date=_date))',ordered: "A") ;
   h.definekey  ("_date") ;
   h.definedone () ;
   declare hiter hh('h');
   end;
set have;
years=0;
do while(hh.next()=0);
if 	0 < yrdif(_date,DATE,'age') <= 5 then years=sum(years,1);
end;
drop _:;
run;

 

jimmychoi
Obsidian | Level 7

Hi Nov, brilliant as usual!

 

i have modified the range, 

if 	0 <= yrdif(_date,DATE,'age') < 5

because i wanted to count the dates within the same year.

but by doing this, this makes a tiny problem... that some are counted twice..

when comparing with the hashtable, what if I want to cancel out counts coming from myself?

novinosrin
Tourmaline | Level 20
data have;
input date :yymmdd10.;
format date yymmdd10.;
cards;
2013-03-01
2014-07-30
2014-09-15
2015-02-22
2018-01-02
2018-03-02
2019-01-26
;

proc sql;
create table want as
select a.date,sum(0 < yrdif(b.date,a.DATE,'age') <= 5)  as count
from have a, have b
where b.date<=a.date
group by a.date;
quit;
s_lassen
Meteorite | Level 14

You can do it in a datastep like this:

data HAVE;
 input DATE yymmdd10.;
 format date yymmdd10.;
 cards;
2013-03-01
2014-07-30
2014-09-15
2015-02-22
2018-01-02
2018-03-02
2019-01-26
;run;

data want;
  set have;
  retain _P_ 1;
  do _P_=_P_ to _N_-1;
    set have(keep=date rename=(date=old_date)) point=_P_;
    if old_date>=intnx('year',date,-5,'S') then leave;
    end;
  numof5years=_N_-_P_;
  drop old_date;
run;

If you have a BY variable in the real data, just ad a couple of lines:

data want;
  set have;
  by key;
  if first.key then
    _P_=_N_;
  retain _P_ 1;
  do _P_=_P_ to _N_-1;
    set have(keep=date rename=(date=old_date)) point=_P_;
    if old_date>=intnx('year',date,-5,'S') then leave;
    end;
  numof5years=_N_-_P_;
  drop old_date;
run;
Ksharp
Super User
data HAVE;
 input DATE yymmdd10.;
 format DATE yymmdd10.;
 cards;
2013-03-01
2014-07-30
2014-09-15
2015-02-22
2018-01-02
2018-03-02
2019-01-26
run;
proc sql;
select *,(select count(*) from have 
where  intnx('year',a.date,-5,'s') < date < a.date) as want
 from have as a;
quit;
jimmychoi
Obsidian | Level 7
wow adding a calculated row using sql... thnx a lot K

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1211 views
  • 5 likes
  • 6 in conversation