Hi, I think this works OK. Maybe you can adapt it to your data. It joins the data with itself and looks for dates with an ID that are with one year. I also did a hash solution and got the same results. Of course it is possible that I'm really zero-for-two, but I think it's OK.
Since original post ...
#1 fixed the INTNX function, should be 12 months not 1 year (forgot about that interval b oundary stuff)
#2 fixed the duplicate output (e.g. nobs 7 matched with nobs 10 duplicated with nobs 10 matched with nobs 7) using the variable MINMAX
* make some data, IDs 1 through 50, 1 to 5 observations per ID, random dates;
data x; do id=1 to 50; do j=1 to ceil(5*ranuni(99)); date = '01jan2005'd + ceil(2000*ranuni(99)); nobs+1; output; end; end; keep id nobs date; format date mmddyy10.; run;
* join the data set with itself and look for dates within ID within one year;
proc sql; create table join (drop=lo hi) as select a.id, a.date as date_check, a.nobs as nobs_check, b.date, b.nobs, intnx('month',a.date, -12) as lo, intnx('month',a.date,12) as hi,
catx(' ',min(a.nobs,b.nobs),max(a.nobs,b.nobs)) as minmax from x as a, x as b where a.id = b.id and b.date between calculated lo and calculated hi and a.nobs ^= b.nobs; quit;
proc sort data=y out=y (drop=minmax) nodupkey;
by id minmax;
run;
Here's data set x, IDs 1 through 3 ...
Obs id date nobs
1 1 03/20/2005 1 2 1 03/12/2008 2 3 2 04/24/2006 3 4 2 04/02/2010 4 5 2 02/19/2009 5 6 2 06/29/2008 6 7 3 03/27/2008 7 8 3 08/15/2005 8 9 3 09/24/2009 9 10 3 09/01/2007 10 11 3 07/27/2008 11
Here's data set JOIN for the IDs ...
id=2
nobs_ Obs check date_check date nobs
1 5 02/19/2009 06/29/2008 6
id=3
nobs_ Obs check date_check date nobs
2 10 09/01/2007 07/27/2008 11 3 7 03/27/2008 09/01/2007 10 4 7 03/27/2008 07/27/2008 11
FYI ... the hash version with help from Better Hashing in SAS® 9.2 ...
https://support.sas.com/resources/papers/sgf2008/better-hashing-sas92.pdf
and the HTML screws up a statement, the &colon you see below should be a real colon as in :
data y (keep=id nobs date_check nobs_check date minmax); retain id date nobs 0; dcl hash h(dataset : 'x', multidata :'y'); h.definekey('id'); h.definedata('date', 'nobs'); h.definedone(); do until(last); set x end=last; date_check = date; nobs_check = nobs; lo = intnx('month', date_check, -12); hi = intnx('month', date_check, 12); h.find(); minmax = catx(' ',min(nobs,nobs_check),max(nobs,nobs_check)); if date ge lo and date le hi and nobs ne nobs_check then output; h.has_next(result: r); do while(r ne 0); h.find_next(); minmax = catx(' ',min(nobs,nobs_check),max(nobs,nobs_check)); if date ge lo and date le hi and nobs ne nobs_check then output; h.has_next(result: r); end; end; stop; format date_check mmddyy10.; run;
proc sort data=y out=y (drop=minmax) nodupkey; by id minmax; run;
... View more