Solved: identifying observations 12 month period

jenim514 · Posted 10-27-2015 02:09 PM

I'm working on a health utilization project. I need to figure out how to identify patients using a specific algorithm for:

Patients with a specific (diagnosis) DX that has 2 outpatient vistis or more in any rolling 12 month period. I can do it on a set period (e.g. 2014) but my study is over a 4 year period, so it should be any rolling 12 month period over the 4 years.

MikeZdeb · Posted 10-27-2015 09:50 PM

Hi. In your post you mentioned ...

" ... admitted once in Jan2012 and once in Mar2013...I'm do not want to count. If Pat_ID #2 had Dx1 or Dx2 admitted once in Dec 2013 and again in Jan2014 ... "

but there was no such data in the posted data. I added data to match that in your description and used the SQL method I posted earlier and the results seem to match your request...

* your data with my additions for ID 1 and ID 2;

data x;
input id (dx1 dx2) (:$5.) date :date9.;
format date mmddyy10.;
datalines;
1 49390 5409 01JAN2012
2 2512 49392 19DEC2013
3 49390 72888 03JUN2014
3 49390 3094 26OCT2013
4 2752 49392 06OCT2013
5 49381 72252 26NOV2013
6 49320 7220 28APR2014
7 49390 55329 19DEC2013
8 49390 48882 08JUL2014
9 490   49392 08DEC2013
1 999   99999 07MAR2013
2 888   88888 01JAN2014

;

* add observation number;

data x;
set x;
nobs = _n_;
run;

proc sql;
create table y (drop=lo hi) as
select a.id, a.date as date_check, a.nobs as nobs_check, a.dx1 as dx1_check, a.dx2 as dx2_check,
       b.date, b.nobs, b.dx1, b.dx2,
       intnx('month',a.date, -12) as lo, intnx('month',a.date,12) as hi,
       catx(' ',min(a.date,b.date),max(a.date,b.date)) as minmax
from x as a, x as b
where a.id = b.id and b.date between calculated lo and calculated hi and a.nobs ^= b.nobs

order id, date_check;
quit;

* get rid of duplicates;

proc sort data=y out=y (drop=minmax) nodupkey;
by id minmax;
run; hi mmddyy10.;
run;

RESULTS ...

id=2

nobs_ dx1_ dx2_
Obs check date_check check check nobs date dx1 dx2

1 2 12/19/2013 2512 49392 12 01/01/2014 888 88888

id=3

nobs_ dx1_ dx2_
Obs check date_check check check nobs date dx1 dx2

2 4 10/26/2013 49390 3094 3 06/03/2014 49390 72888

View solution in original post

Steelers_In_DC · Posted 10-27-2015 03:04 PM

If you have example data that will be helpful.

jenim514 · Posted 10-27-2015 05:23 PM

This is sample data. The years of the whole data set is 2010-2014. So I need to identify sepcific Pat_ID with 2 or more ADMDATEs in a continuous 12 month period of time. For exampme if Pat_ID #1 had a DX1 or DX2 admitted once in Jan2012 and once in Mar2013...I'm do not want to count. If Pat_ID #2 had Dx1 or Dx2 admitted once in Dec 2013 and again in Jan2014, I do want to count. Thanks!

PAT_ID DX2 DX1 ADMDATE

1	49390 0	5409	05APR2014
2	2512	49392 0	19DEC2013
3	49390 0	72888	03JUN2014
3	49390 1	3094	26OCT2013
4	2752	49392 0	06OCT2013
5	49381 0	72252	26NOV2013
6	49320 0	7220	28APR2014
7	49390 0	55329	19DEC2013
8	49390 0	48882	08JUL2014
9	490	49392 0	08DEC2013

MikeZdeb · Posted 10-27-2015 09:50 PM

Hi. In your post you mentioned ...

" ... admitted once in Jan2012 and once in Mar2013...I'm do not want to count. If Pat_ID #2 had Dx1 or Dx2 admitted once in Dec 2013 and again in Jan2014 ... "

but there was no such data in the posted data. I added data to match that in your description and used the SQL method I posted earlier and the results seem to match your request...

* your data with my additions for ID 1 and ID 2;

data x;
input id (dx1 dx2) (:$5.) date :date9.;
format date mmddyy10.;
datalines;
1 49390 5409 01JAN2012
2 2512 49392 19DEC2013
3 49390 72888 03JUN2014
3 49390 3094 26OCT2013
4 2752 49392 06OCT2013
5 49381 72252 26NOV2013
6 49320 7220 28APR2014
7 49390 55329 19DEC2013
8 49390 48882 08JUL2014
9 490   49392 08DEC2013
1 999   99999 07MAR2013
2 888   88888 01JAN2014

;

* add observation number;

data x;
set x;
nobs = _n_;
run;

proc sql;
create table y (drop=lo hi) as
select a.id, a.date as date_check, a.nobs as nobs_check, a.dx1 as dx1_check, a.dx2 as dx2_check,
       b.date, b.nobs, b.dx1, b.dx2,
       intnx('month',a.date, -12) as lo, intnx('month',a.date,12) as hi,
       catx(' ',min(a.date,b.date),max(a.date,b.date)) as minmax
from x as a, x as b
where a.id = b.id and b.date between calculated lo and calculated hi and a.nobs ^= b.nobs

order id, date_check;
quit;

* get rid of duplicates;

proc sort data=y out=y (drop=minmax) nodupkey;
by id minmax;
run; hi mmddyy10.;
run;

RESULTS ...

id=2

nobs_ dx1_ dx2_
Obs check date_check check check nobs date dx1 dx2

1 2 12/19/2013 2512 49392 12 01/01/2014 888 88888

id=3

nobs_ dx1_ dx2_
Obs check date_check check check nobs date dx1 dx2

2 4 10/26/2013 49390 3094 3 06/03/2014 49390 72888

jenim514 · Posted 10-28-2015 10:41 AM

Thank you!!

jenim514 · Posted 05-31-2016 04:20 PM

Hi @MikeZdeb. A while ago you gave the the solution to a question I have about identifying observations in a rolling 12 month period. Can you help me modify this syntax to identify encounters that have 5 ER visits in any rolling 12 month period (regardless of diagnosis 'dx'). I feel like if should be similar to this syntax, but not quite sure how I should work this. All the observations in the table are ER visits, but I just need those that have 5 or more in any rolling 12 month period (table contains 5 years of ER data).

Thank you!!

MikeZdeb · Posted 10-27-2015 04:55 PM

Hi, I think this works OK. Maybe you can adapt it to your data. It joins the data with itself and looks for dates with an ID that are with one year. I also did a hash solution and got the same results. Of course it is possible that I'm really zero-for-two, but I think it's OK.

Since original post ...

#1 fixed the INTNX function, should be 12 months not 1 year (forgot about that interval b oundary stuff)

#2 fixed the duplicate output (e.g. nobs 7 matched with nobs 10 duplicated with nobs 10 matched with nobs 7) using the variable MINMAX

* make some data, IDs 1 through 50, 1 to 5 observations per ID, random dates;

data x;
do id=1 to 50;
do j=1 to ceil(5*ranuni(99));
   date = '01jan2005'd + ceil(2000*ranuni(99));
   nobs+1;
   output;
end;
end;
keep id nobs date;
format date mmddyy10.;
run;

* join the data set with itself and look for dates within ID within one year;

proc sql;
create table join (drop=lo hi) as
select a.id, a.date as date_check, a.nobs as nobs_check, b.date, b.nobs,
intnx('month',a.date, -12) as lo, intnx('month',a.date,12) as hi,

catx(' ',min(a.nobs,b.nobs),max(a.nobs,b.nobs)) as minmax
from x as a, x as b
where a.id = b.id and b.date between calculated lo and calculated hi and a.nobs ^= b.nobs;
quit;

proc sort data=y out=y (drop=minmax) nodupkey;

by id minmax;

run;

Here's data set x, IDs 1 through 3 ...

Obs id date nobs

1     1    03/20/2005      1
2     1    03/12/2008      2
3     2    04/24/2006      3
4     2    04/02/2010      4
5     2    02/19/2009      5
6     2    06/29/2008      6
7     3    03/27/2008      7
8     3    08/15/2005      8
9     3    09/24/2009      9
10     3    09/01/2007     10
11     3    07/27/2008     11

Here's data set JOIN for the IDs ...

id=2

nobs_
Obs check date_check date nobs

1 5 02/19/2009 06/29/2008 6

id=3

nobs_
Obs check date_check date nobs

2      10     09/01/2007    07/27/2008     11
3       7     03/27/2008    09/01/2007     10
4       7     03/27/2008    07/27/2008     11

FYI ... the hash version with help from Better Hashing in SAS® 9.2 ...

https://support.sas.com/resources/papers/sgf2008/better-hashing-sas92.pdf

and the HTML screws up a statement, the &colon you see below should be a real colon as in :

data y (keep=id nobs date_check nobs_check date minmax);
retain id date nobs 0;
dcl hash h(dataset : 'x', multidata &colon;'y');
h.definekey('id');
h.definedata('date', 'nobs');
h.definedone();
do until(last);
set x end=last;
date_check = date;
nobs_check = nobs;
lo = intnx('month', date_check, -12);
hi = intnx('month', date_check, 12);
h.find();
minmax = catx(' ',min(nobs,nobs_check),max(nobs,nobs_check));
if date ge lo and date le hi and nobs ne nobs_check then output;
h.has_next(result: r);
do while(r ne 0);
   h.find_next();
   minmax = catx(' ',min(nobs,nobs_check),max(nobs,nobs_check));
   if date ge lo and date le hi and nobs ne nobs_check then output;
   h.has_next(result: r);
end;
end;
stop;
format date_check mmddyy10.;
run;

proc sort data=y out=y (drop=minmax) nodupkey;
by id minmax;
run;

identifying observations 12 month period

Re: identifying observations 12 month period

Re: identifying observations 12 month period

Re: identifying observations 12 month period

Re: identifying observations 12 month period

Re: identifying observations 12 month period

Re: identifying observations 12 month period

Re: identifying observations 12 month period

SAS Innovate 2025: Call for Content

Classroom Training Available!