Re: How to identify patients with 5 or more ER visits across a rolling...

SM1 · Posted 05-17-2016 08:48 PM

Hi,

I'm working with health care utilization data for 2011 to 2015. I need to identify all patients with records for 5 or more ER visits during any 12 month period of the study.

The table I'm using has 1 observation per visit (repeated measures data) so a patient could have multiple observations. I generated this table by selecting only those observations where SERVICE_TYPE = 'ER' and kept the following variables:

PATID - patient ID

APPT_DATE - date of medical event (SAS date)

SERVICE_TYPE - type of medical care (all values = 'ER'; variable retained to check)

PCMID - unique medical provider ID

The table is sorted by PATSSN and APPT_DATE.

Sample data:

PATID APPT_DATE SERVICE_TYPE PCMID

12345 11/19/2013 ER A1B2

12345 03/25/2014 ER S8U7

12345 05/15/2014 ER H5J8

12345 08/14/2014 ER A1B2

12345 11/06/2014 ER D4D5

23456 04/15/2011 ER H5J8

34567 02/11/2012 ER K9L6

34567 04/15/2012 ER B5D4

34567 08/26/2012 ER X4P8

34567 01/14/2013 ER J8D6

34567 02/27/2013 ER G7H5

From this data, PATID = 12345 is the only patient that should be identified as meeting the criteria of having 5 or more ER visits within any 12 month period.

I've been trying to modify PROC SQL syntax from a colleague - - but since her criteria was any 2 visits within a rolling 12 month period, I can't figure out how to make it work.

Any suggestions for the best way to approach this problem?

Thanks!

Ksharp · Posted 05-17-2016 10:59 PM

Assuming your dataset is not big. Pick up the obs whose COUNT is greater than 5 .

data have;
input PATID      APPT_DATE : mmddyy10.   SERVICE_TYPE $  PCMID $;
format APPT_DATE  mmddyy10.;
cards;
12345      11/19/2013         ER                      A1B2
12345      03/25/2014         ER                      S8U7
12345      05/15/2014         ER                      H5J8 
12345      08/14/2014         ER                      A1B2
12345      11/06/2014         ER                      D4D5
23456      04/15/2011         ER                      H5J8
34567      02/11/2012         ER                      K9L6
34567      04/15/2012         ER                      B5D4
34567      08/26/2012         ER                      X4P8
34567      01/14/2013         ER                      J8D6
34567      02/27/2013         ER                      G7H5
;
run;
proc sql;
create table want as
 select *,(select count(*) from have 
    where PATID=a.PATID and APPT_DATE between intnx('year',a.APPT_DATE,-1,'s') and a.APPT_DATE)
     as count
  from have as a;
quit;

DeepakSwain · Posted 05-18-2016 01:44 PM

Hi Ksharp,

I think addition of one more line to your fantastic code will give a better summarized picture:

proc sql;
create table want as
 select *,(select count(*) from have 
    where PATID=a.PATID and APPT_DATE between intnx('year',a.APPT_DATE,-1,'s') and a.APPT_DATE)
     as count
  from have as a
where calculated count ge 5;
quit;

Regards,

Swain

SM1 · Posted 07-05-2016 08:00 PM

Ksharp,

Thank you for the reply . . . and I apologize for the very long delay in responding.

I'm working with a really large dataset so running your suggested syntax does take a long time.

I do have a question.

I ran the following syntax on the dataset after I de-duplicated by PATID and APPT_DATE. Some patients had multiple ER visits on the same day, but I wanted only those patients with 5+ ER visits on unique days.

proc sql;
create table er_ge5 as
select *,(select count(*) from er_patients
where PATID=a.PATID and APPT_DATE between intnx('year',a.APPT_DATE,-1,'s') and a.APPT_DATE)
as count
from er_patients as a
where calculated count ge 5;
quit;

I then sorted the new table (er_ge5) by PATID and appt_date and confirmed that count = 5 for the 1st event for each patient.

I've provided output for 1 patient. Everything looks fine until the row with count '16' - there are multiple rows with identical counts, but these are records for the same patient.

PATID	APPT_DATE	COUNT
4444444444	12/20/2012	5
4444444444	12/26/2012	6
4444444444	12/27/2012	7
4444444444	1/6/2013	8
4444444444	1/8/2013	9
4444444444	1/12/2013	10
4444444444	1/20/2013	11
4444444444	2/10/2013	12
4444444444	2/23/2013	13
4444444444	3/11/2013	14
4444444444	5/18/2013	15
4444444444	5/30/2013	16
4444444444	8/21/2013	16
4444444444	11/6/2013	16
4444444444	11/17/2013	17
4444444444	12/23/2013	15
4444444444	12/28/2013	14
4444444444	12/30/2013	15
4444444444	1/15/2014	13
4444444444	1/16/2014	14
4444444444	1/17/2014	15
4444444444	1/21/2014	15
4444444444	2/5/2014	16
4444444444	2/7/2014	17
4444444444	2/8/2014	18
4444444444	3/2/2014	17
4444444444	3/5/2014	18
4444444444	3/9/2014	19
4444444444	3/10/2014	20
4444444444	3/23/2014	20
4444444444	4/1/2014	21
4444444444	4/3/2014	22
4444444444	4/16/2014	23
4444444444	5/6/2014	24
4444444444	5/17/2014	25
4444444444	5/22/2014	25
4444444444	6/5/2014	25
4444444444	6/7/2014	26

I'm not sure why the count variable seems to be having problems. Should I be concerned?

Thanks!

Ksharp · Posted 07-05-2016 09:54 PM

"

Everything looks fine until the row with count '16' - there are multiple rows with identical counts, but these are records for the same patient.

"

I don't know what you mean.

Where is wrong ? Better post some data to demonstrate it .

Ksharp · Posted 07-05-2016 10:22 PM

Do you want unique COUNT for each ID ? That is easy by PROC SORT + NODUPKEY.

If your table is big ,try ARRAY or HashTable:

data have;
input PATID      APPT_DATE : mmddyy10.   SERVICE_TYPE $  PCMID $;
format APPT_DATE  mmddyy10.;
cards;
12345      11/19/2013         ER                      A1B2
12345      03/25/2014         ER                      S8U7
12345      05/15/2014         ER                      H5J8 
12345      08/14/2014         ER                      A1B2
12345      11/06/2014         ER                      D4D5
23456      04/15/2011         ER                      H5J8
34567      02/11/2012         ER                      K9L6
34567      04/15/2012         ER                      B5D4
34567      08/26/2012         ER                      X4P8
34567      01/14/2013         ER                      J8D6
34567      02/27/2013         ER                      G7H5
;
run;

data want;
 array x{99999} _temporary_;
 call missing(of x{*});
 
do i=1 by 1 until(last.PATID);
 set have;
 by PATID;
 x{i}=APPT_DATE;
end;

do until(last.PATID);
 set have;
 by PATID;
 n=0;
 do i=intnx('year',APPT_DATE,-1,'s') to APPT_DATE;
  if i in x then n+1;
 end;
 if n ge 5 then output;
end;

drop i;
run;

PGStats · Posted 05-18-2016 12:13 AM

Fairly efficient way to do this with an array:

data have;
input PATID APPT_DATE : mmddyy10. SERVICE_TYPE $  PCMID $;
format APPT_DATE  mmddyy10.;
datalines;
12345  11/19/2013   ER    A1B2
12345  03/25/2014   ER    S8U7
12345  05/15/2014   ER    H5J8 
12345  08/14/2014   ER    A1B2
12345  11/06/2014   ER    D4D5
23456  04/15/2011   ER    H5J8
34567  02/11/2012   ER    K9L6
34567  04/15/2012   ER    B5D4
34567  08/26/2012   ER    X4P8
34567  01/14/2013   ER    J8D6
34567  02/27/2013   ER    G7H5
;

data want;
array d{0:4} _temporary_;
call missing(of d{*});
do i = 1 by 1 until(last.patid);
    set have; by patid;
    d{mod(i,5)} = appt_date;
    if i >=5 then
        /* Note d{mod(i+1,5)} is the oldest date in array d */
        if intck("YEAR", d{mod(i+1,5)}, appt_date, "CONTINUOUS") < 1 then output;
    end;
drop i;
run;

/* Optional : remove multiple entries for same patient */
proc sort data=want nodupkey; by patid; run;

PG

SM1 · Posted 07-05-2016 08:26 PM

PG Stats,

Thank you so much for the suggested syntax.

This syntax runs much faster on my dataset then K Sharp's PROC SQL syntax, but there seems to be a problem.

I deduplicated my data set by PATID and APPT_DATE (so that each ER visit has a unique APPT_DATE). I then ran your suggested syntax (although I dropped the line to delete 'i'):

data er_patients_count;
array d{0:4} _temporary_;
call missing(of d{*});
do i = 1 by 1 until(last.PATID);
    set er_patients; by PATID;
    d{mod(i,5)} = APPT_DATE;
    if i >=5 then
        /* Note d{mod(i+1,5)} is the oldest date in array d */
        if intck("YEAR", d{mod(i+1,5)}, APPT_DATE, "CONTINUOUS") < 1 then output;
    end;
run;

The biggest concern is that the variable 'i' is not equal to 5 for each patient's 1st appt_date.

I sorted the new table by PATID and APPT_DATE, and created a new variable FIRST_ALL = FIRST.PATID. The syntax and frequency output for my table (N=2,086) follow.

proc freq data = er_patients_count;
where first_all = 1;
tables i;
run;

SAS Output

i	Frequency	Percent	Cumulative	Cumulative
Frequency	Percent
5	1152	55.23	1152	55.23
6	324	15.53	1476	70.76
7	205	9.83	1681	80.58
8	150	7.19	1831	87.78
9	126	6.04	1957	93.82
10	65	3.12	2022	96.93
11	31	1.49	2053	98.42
12	15	0.72	2068	99.14
13	10	0.48	2078	99.62
14	5	0.24	2083	99.86
15	3	0.14	2086	100

Here's sample output for 4 different patients:

i	PATID	APPT_DATE
8	4444444444	5/10/2013
9	4444444444	5/26/2013
10	4444444444	6/27/2013
11	4444444444	11/6/2013
12	4444444444	1/21/2014
13	4444444444	2/23/2014
15	4444444444	9/16/2014
16	4444444444	10/10/2014
17	4444444444	11/25/2014
18	4444444444	12/13/2014
19	4444444444	12/21/2014
20	4444444444	12/27/2014
5	5555555555	7/1/2014
6	5555555555	9/4/2014
10	6666666666	11/12/2014
9	7777777777	4/7/2014
10	7777777777	7/1/2014
11	7777777777	7/6/2014
12	7777777777	8/10/2014
13	7777777777	9/6/2014
14	7777777777	9/15/2014
15	7777777777	10/9/2014
16	7777777777	10/30/2014
17	7777777777	11/7/2014
18	7777777777	11/24/2014
19	7777777777	4/17/2015
20	7777777777	7/31/2015

Patient 4444444444 - 1st event i = 8, no i= 14

Patient 5555555555 - looks perfect

Patient 6666666666 - only 1 event, i = 10

Patient 7777777777 - 1st event i = 9, but everything else looks perfect

I'm not sure why there's a problem with counting the events.

Any suggestions?

Thanks so much!

PGStats · Posted 07-05-2016 09:57 PM

The output from the data step is the last visit of a sequence of 5 visits that occured within one year. Every set, even if they overlap, is output that way. If a patient has 7 visits within a year, you will get visits 5 (representing the sequence 1-5), 6 (2-6) and 7 (3-7).

Please describe what you need as output,

when a sequence is longer than 5 visits,
when there are many disjoint sequences for a patient
when a long sequence (> 1 year) contains many overlapping sub-sequences (< 1 year)

PG

SM1 · Posted 07-06-2016 09:09 PM

PG Stats,

I'm trying to achieve 2 objectives:

1. Identify all patients with 5+ ER visits during any rolling 12-month period between 1 OCT 2013 and 31 OCT 2015.

To accomplish this objective, I need an output variable to indicate that a patient met this basic criteria and should therefore be included in the sample. I'm not concerned with the number of ER visits or anything else.

2. Once I've identified the patients in my sample, I will extract all of their ER records for a larger period of time (1 OCT 2010 to 1 JUN 2016). Some patients have a number of ER visits across multiple years that meet the criteria of 5+ ER visits in any 12-month rolling period when the anchor date can slide from visit 1 to visit 2 to visit 3 and so on.

For example:

Visit 1 (appt_date = 15 JAN 2012)

Visit 2 (appt_date = 1 MAR 2012)

Visit 3 (appt_date = 1 JUN 2012)

Visit 4 (appt_date = 1 AUG 2012)

Visit 5 (appt_date = 1 OCT 2012)

Visit 6 (appt_date = 1 JAN 2013)

Visit 7 (appt_date = 1 APR 2013)

Visit 8 (appt_date = 1 JUL 2013)

Visit 9 (appt_date = 1 SEP 2013)

Visit 10 (appt_date = 1 DEC 2013)

Visit 11 (appt_date = 1 MAR 2014)

Visit 12 (appt_date = 1 MAY 2014)

For this patient, I need to identify 2 dates:

1. The date of the 1st ER visit so I can create a new variable "START_DATE" = 15 JAN 2012

2. The date of the final ER visit so I can create a new variable "END_DATE" = 1 MAY 2014

These 2 new variables will be used to define the patient's period of high use of ER services.

To accomplish this objective, I need an output variable that would flag the 1st ER visit and the final ER visit for the series of qualifying ER visits. Or output the 1st and the final ER records to a new table.

PGStats · Posted 07-07-2016 01:47 PM

So, given that your "ER records for a larger period of time" are in dataset largerPeriodAppt, you would be looking for something like this:


data fiveER;
array d{0:4} _temporary_;
call missing(of d{*});
do i = 1 by 1 until(last.patid);
    set have; by patid;
    d{mod(i,5)} = appt_date;
    if i >=5 then
        /* Note d{mod(i+1,5)} is the oldest date in array d */
        first_of5_Appt_date = d{mod(i+1,5)};
        if intck("YEAR", first_of5_Appt_date, appt_date, "CONTINUOUS") < 1 then output;
    end;
format first_of5_Appt_date yymmdd10.;
drop i;
run;

proc sql;

create table fiveERcriteriaPatients as
select 
    PATID,
    min(APPT_DATE) as START_DATE format=yymmdd10.,
    max(APPT_DATE) as END_DATE format=yymmdd10. 
from largerPeriodAppt
where PATID in (select PATID from fiveER)
group by PATID;

quit;

(untested)

PG

SM1 · Posted 07-07-2016 03:50 PM

PG Stats,

I ran your syntax as follows:

data fiveER;
retain patid appt_date first_of5_Appt_date i;
array d{0:4} _temporary_;
call missing(of d{*});
do i = 1 by 1 until(last.patid);
    set ER_patients;
   by patid;
    d{mod(i,5)} = appt_date;
    if i >=5 then
        /* Note d{mod(i+1,5)} is the oldest date in array d */
        first_of5_Appt_date = d{mod(i+1,5)};
        if intck("YEAR", first_of5_Appt_date, appt_date, "CONTINUOUS") < 1 then output;
    end;
format first_of5_Appt_date yymmdd10.;
run;

It looks like the syntax worked perfectly for patient 1. For each subsequent patient, it seems like the syntax failed to recognize last.PATID, so the value of APPT_DATE from the prior patient was retained.

Here's the output for patients 1, 2, 3, and 4:

PATID	APPT_DATE	FIRST_OF5_APPT_DATE	i
1111111111	11/6/2013	.	1
1111111111	11/17/2013	.	2
1111111111	12/23/2013	.	3
1111111111	12/28/2013	.	4
1111111111	12/30/2013	11/6/2013	5
1111111111	1/15/2014	11/17/2013	6
1111111111	1/16/2014	12/23/2013	7
1111111111	2/5/2014	12/28/2013	8
1111111111	2/8/2014	12/30/2013	9
1111111111	3/2/2014	1/15/2014	10
1111111111	3/10/2014	1/16/2014	11
1111111111	4/16/2014	2/5/2014	12
1111111111	5/6/2014	2/8/2014	13
1111111111	12/9/2013	2/8/2014	1
1111111111	12/15/2013	2/8/2014	2
1111111111	12/26/2013	2/8/2014	3
1111111111	5/23/2014	2/8/2014	4
2222222222	11/6/2013	2/8/2014	1
2222222222	1/21/2014	2/8/2014	2
2222222222	2/23/2014	2/8/2014	3
2222222222	8/11/2014	2/8/2014	4
2222222222	9/16/2014	11/6/2013	5
2222222222	10/10/2014	1/21/2014	6
2222222222	11/25/2014	2/23/2014	7
2222222222	12/13/2014	8/11/2014	8
2222222222	12/27/2014	9/16/2014	9
3333333333	12/27/2013	9/16/2014	1
3333333333	8/28/2014	9/16/2014	2
3333333333	9/22/2014	9/16/2014	3
3333333333	11/16/2014	9/16/2014	4
4444444444	10/13/2013	9/16/2014	1
4444444444	11/11/2013	9/16/2014	2
4444444444	11/13/2013	9/16/2014	3
4444444444	4/6/2014	9/16/2014	4

PGStats · Posted 07-07-2016 04:09 PM

I can't do any testing without data.

PG

SM1 · Posted 07-07-2016 05:30 PM

I've attached an Excel file with 134 records for approximately 20 patients. The only variables included are APPT_DATE and PATID.

PGStats · Posted 07-07-2016 06:01 PM

Ah! The code was missing a do block. I also added a sorting step, just in case.

proc sort data=have; by patid appt_date; run;

data fiveER;
array d{0:4} _temporary_;
call missing(of d{*});
do i = 1 by 1 until(last.patid);
    set have; by patid;
    d{mod(i,5)} = appt_date;
    if i >=5 then do;
        /* Note d{mod(i+1,5)} is the oldest date in array d */
        first_of5_Appt_date = d{mod(i+1,5)};
        if intck("YEAR", first_of5_Appt_date, appt_date, "CONTINUOUS") < 1 then output;
        end;
    end;
format first_of5_Appt_date appt_date yymmdd10.;
drop i;
run;

PG

How to identify patients with 5 or more ER visits across a rolling 12 month period?

Re: How to identify patients with 5 or more ER visits across a rolling 12 month period?

Re: How to identify patients with 5 or more ER visits across a rolling 12 month period?

Re: How to identify patients with 5 or more ER visits across a rolling 12 month period?

Re: How to identify patients with 5 or more ER visits across a rolling 12 month period?

Re: How to identify patients with 5 or more ER visits across a rolling 12 month period?

Re: How to identify patients with 5 or more ER visits across a rolling 12 month period?

Re: How to identify patients with 5 or more ER visits across a rolling 12 month period?

Re: How to identify patients with 5 or more ER visits across a rolling 12 month period?

Re: How to identify patients with 5 or more ER visits across a rolling 12 month period?

Re: How to identify patients with 5 or more ER visits across a rolling 12 month period?

Re: How to identify patients with 5 or more ER visits across a rolling 12 month period?

Re: How to identify patients with 5 or more ER visits across a rolling 12 month period?

Re: How to identify patients with 5 or more ER visits across a rolling 12 month period?

Re: How to identify patients with 5 or more ER visits across a rolling 12 month period?

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away