BookmarkSubscribeRSS Feed
SM1
Obsidian | Level 7 SM1
Obsidian | Level 7

Hi,

 

I'm working with health care utilization data for 2011 to 2015. I need to identify all patients with records for 5 or more ER visits during any 12 month period of the study.

 

The table I'm using has 1 observation per visit (repeated measures data) so a patient could have multiple observations. I generated this table by selecting only those observations where SERVICE_TYPE = 'ER' and kept the following variables:

 

PATID - patient ID

APPT_DATE - date of medical event (SAS date)

SERVICE_TYPE - type of medical care (all values = 'ER'; variable retained to check)

PCMID - unique medical provider ID

 

The table is sorted by PATSSN and APPT_DATE.

 

Sample data:

 

PATID      APPT_DATE    SERVICE_TYPE   PCMID

12345      11/19/2013         ER                      A1B2

12345      03/25/2014         ER                      S8U7

12345      05/15/2014         ER                      H5J8 

12345      08/14/2014         ER                      A1B2

12345      11/06/2014         ER                      D4D5

23456      04/15/2011         ER                      H5J8

34567      02/11/2012         ER                      K9L6

34567      04/15/2012         ER                      B5D4

34567      08/26/2012         ER                      X4P8

34567      01/14/2013         ER                      J8D6

34567      02/27/2013         ER                      G7H5

 

From this data, PATID = 12345 is the only patient that should be identified as meeting the criteria of having 5 or more ER visits within any 12 month period.

 

I've been trying to modify PROC SQL syntax from a colleague - - but since her criteria was any 2 visits within a rolling 12 month period, I can't figure out how to make it work.

 

Any suggestions for the best way to approach this problem?

 

Thanks!

 

 

14 REPLIES 14
Ksharp
Super User

Assuming your dataset is not big. Pick up the obs whose COUNT is greater than 5 .

 

data have;
input PATID      APPT_DATE : mmddyy10.   SERVICE_TYPE $  PCMID $;
format APPT_DATE  mmddyy10.;
cards;
12345      11/19/2013         ER                      A1B2
12345      03/25/2014         ER                      S8U7
12345      05/15/2014         ER                      H5J8 
12345      08/14/2014         ER                      A1B2
12345      11/06/2014         ER                      D4D5
23456      04/15/2011         ER                      H5J8
34567      02/11/2012         ER                      K9L6
34567      04/15/2012         ER                      B5D4
34567      08/26/2012         ER                      X4P8
34567      01/14/2013         ER                      J8D6
34567      02/27/2013         ER                      G7H5
;
run;
proc sql;
create table want as
 select *,(select count(*) from have 
    where PATID=a.PATID and APPT_DATE between intnx('year',a.APPT_DATE,-1,'s') and a.APPT_DATE)
     as count
  from have as a;
quit;
DeepakSwain
Pyrite | Level 9

Hi Ksharp,

I think addition of one more line to your fantastic code will give a better summarized picture:

 

proc sql;
create table want as
 select *,(select count(*) from have 
    where PATID=a.PATID and APPT_DATE between intnx('year',a.APPT_DATE,-1,'s') and a.APPT_DATE)
     as count
  from have as a
where calculated count ge 5;
quit;

Regards,

Swain
SM1
Obsidian | Level 7 SM1
Obsidian | Level 7

Ksharp,

 

Thank you for the reply . . . and I apologize for the very long delay in responding.

 

I'm working with a really large dataset so running your suggested syntax does take a long time.

 

I do have a question.

 

I ran the following syntax on the dataset after I de-duplicated by PATID and APPT_DATE. Some patients had multiple ER visits on the same day, but I wanted only those patients with 5+ ER visits on unique days.

 

proc sql;
create table er_ge5 as
 select *,(select count(*) from er_patients
    where PATID=a.PATID and APPT_DATE between intnx('year',a.APPT_DATE,-1,'s') and a.APPT_DATE)
     as count
  from er_patients as a
where calculated count ge 5;
quit;

 

I then sorted the new table (er_ge5) by PATID and appt_date and confirmed that count = 5 for the 1st event for each patient.

 

I've provided output for 1 patient. Everything looks fine until the row with count '16' - there are multiple rows with identical counts, but these are records for the same patient.

 

PATIDAPPT_DATECOUNT
444444444412/20/20125
444444444412/26/20126
444444444412/27/20127
44444444441/6/20138
44444444441/8/20139
44444444441/12/201310
44444444441/20/201311
44444444442/10/201312
44444444442/23/201313
44444444443/11/201314
44444444445/18/201315
44444444445/30/201316
44444444448/21/201316
444444444411/6/201316
444444444411/17/201317
444444444412/23/201315
444444444412/28/201314
444444444412/30/201315
44444444441/15/201413
44444444441/16/201414
44444444441/17/201415
44444444441/21/201415
44444444442/5/201416
44444444442/7/201417
44444444442/8/201418
44444444443/2/201417
44444444443/5/201418
44444444443/9/201419
44444444443/10/201420
44444444443/23/201420
44444444444/1/201421
44444444444/3/201422
44444444444/16/201423
44444444445/6/201424
44444444445/17/201425
44444444445/22/201425
44444444446/5/201425
44444444446/7/201426

 

I'm not sure why the count variable seems to be having problems. Should I be concerned?

 

Thanks!

 

 

 

Ksharp
Super User

"

Everything looks fine until the row with count '16' - there are multiple rows with identical counts, but these are records for the same patient.

"

I don't know what you mean.

Where is wrong ? Better post some data to demonstrate it .

 

Ksharp
Super User

Do you want unique COUNT for each ID ? That is easy by PROC SORT + NODUPKEY.

 

If your table is big ,try ARRAY or HashTable:

 

 

data have;
input PATID      APPT_DATE : mmddyy10.   SERVICE_TYPE $  PCMID $;
format APPT_DATE  mmddyy10.;
cards;
12345      11/19/2013         ER                      A1B2
12345      03/25/2014         ER                      S8U7
12345      05/15/2014         ER                      H5J8 
12345      08/14/2014         ER                      A1B2
12345      11/06/2014         ER                      D4D5
23456      04/15/2011         ER                      H5J8
34567      02/11/2012         ER                      K9L6
34567      04/15/2012         ER                      B5D4
34567      08/26/2012         ER                      X4P8
34567      01/14/2013         ER                      J8D6
34567      02/27/2013         ER                      G7H5
;
run;

data want;
 array x{99999} _temporary_;
 call missing(of x{*});
 
do i=1 by 1 until(last.PATID);
 set have;
 by PATID;
 x{i}=APPT_DATE;
end;

do until(last.PATID);
 set have;
 by PATID;
 n=0;
 do i=intnx('year',APPT_DATE,-1,'s') to APPT_DATE;
  if i in x then n+1;
 end;
 if n ge 5 then output;
end;

drop i;
run;
PGStats
Opal | Level 21

Fairly efficient way to do this with an array:

 

data have;
input PATID APPT_DATE : mmddyy10. SERVICE_TYPE $  PCMID $;
format APPT_DATE  mmddyy10.;
datalines;
12345  11/19/2013   ER    A1B2
12345  03/25/2014   ER    S8U7
12345  05/15/2014   ER    H5J8 
12345  08/14/2014   ER    A1B2
12345  11/06/2014   ER    D4D5
23456  04/15/2011   ER    H5J8
34567  02/11/2012   ER    K9L6
34567  04/15/2012   ER    B5D4
34567  08/26/2012   ER    X4P8
34567  01/14/2013   ER    J8D6
34567  02/27/2013   ER    G7H5
;

data want;
array d{0:4} _temporary_;
call missing(of d{*});
do i = 1 by 1 until(last.patid);
    set have; by patid;
    d{mod(i,5)} = appt_date;
    if i >=5 then
        /* Note d{mod(i+1,5)} is the oldest date in array d */
        if intck("YEAR", d{mod(i+1,5)}, appt_date, "CONTINUOUS") < 1 then output;
    end;
drop i;
run;

/* Optional : remove multiple entries for same patient */
proc sort data=want nodupkey; by patid; run;
PG
SM1
Obsidian | Level 7 SM1
Obsidian | Level 7

PG Stats,

 

Thank you so much for the suggested syntax.

 

This syntax runs much faster on my dataset then K Sharp's PROC SQL syntax, but there seems to be a problem.

 

I deduplicated my data set by PATID and APPT_DATE (so that each ER visit has a unique APPT_DATE). I then ran your suggested syntax (although I dropped the line to delete 'i'):

 

data er_patients_count;
array d{0:4} _temporary_;
call missing(of d{*});
do i = 1 by 1 until(last.PATID);
    set er_patients; by PATID;
    d{mod(i,5)} = APPT_DATE;
    if i >=5 then
        /* Note d{mod(i+1,5)} is the oldest date in array d */
        if intck("YEAR", d{mod(i+1,5)}, APPT_DATE, "CONTINUOUS") < 1 then output;
    end;
run;

 

The biggest concern is that the variable 'i' is not equal to 5 for each patient's 1st appt_date.

 

I sorted the new table by PATID and APPT_DATE, and created a new variable FIRST_ALL = FIRST.PATID. The syntax and frequency output for my table (N=2,086) follow.

 

proc freq data = er_patients_count;
    where first_all = 1;
    tables i;
run;

 

SAS Output

 

iFrequencyPercentCumulativeCumulative
FrequencyPercent
5115255.23115255.23
632415.53147670.76
72059.83168180.58
81507.19183187.78
91266.04195793.82
10653.12202296.93
11311.49205398.42
12150.72206899.14
13100.48207899.62
1450.24208399.86
1530.142086100
 
 

 

Here's sample output for 4 different patients:

 

iPATIDAPPT_DATE
844444444445/10/2013
944444444445/26/2013
1044444444446/27/2013
11444444444411/6/2013
1244444444441/21/2014
1344444444442/23/2014
1544444444449/16/2014
16444444444410/10/2014
17444444444411/25/2014
18444444444412/13/2014
19444444444412/21/2014
20444444444412/27/2014
555555555557/1/2014
655555555559/4/2014
10666666666611/12/2014
977777777774/7/2014
1077777777777/1/2014
1177777777777/6/2014
1277777777778/10/2014
1377777777779/6/2014
1477777777779/15/2014
15777777777710/9/2014
16777777777710/30/2014
17777777777711/7/2014
18777777777711/24/2014
1977777777774/17/2015
2077777777777/31/2015

 

Patient 4444444444 - 1st event i = 8, no i= 14

Patient 5555555555 - looks perfect

Patient 6666666666 - only 1 event, i = 10

Patient 7777777777 - 1st event i = 9, but everything else looks perfect

 

I'm not sure why there's a problem with counting the events.

 

Any suggestions?

 

Thanks so much!

PGStats
Opal | Level 21

The output from the data step is the last visit of a sequence of 5 visits that occured within one year. Every set, even if they overlap, is output that way. If a patient has 7 visits within a year, you will get visits 5 (representing the sequence 1-5), 6 (2-6) and 7 (3-7).

 

Please describe what you need as output,

  • when a sequence is longer than 5 visits,
  • when there are many disjoint sequences for a patient
  • when a long sequence (> 1 year) contains many overlapping sub-sequences (< 1 year)
PG
SM1
Obsidian | Level 7 SM1
Obsidian | Level 7

PG Stats,

 

I'm trying to achieve 2 objectives:

 

1. Identify all patients with 5+ ER visits during any rolling 12-month period between 1 OCT 2013 and 31 OCT 2015.

 

To accomplish this objective, I need an output variable to indicate that a patient met this basic criteria and should therefore be included in the sample. I'm not concerned with the number of ER visits or anything else.

 

2. Once I've identified the patients in my sample, I will extract all of their ER records for a larger period of time (1 OCT 2010 to 1 JUN 2016). Some patients have a number of ER visits across multiple years that meet the criteria of 5+ ER visits in any 12-month rolling period when the anchor date can slide from visit 1 to visit 2 to visit 3 and so on.

 

For example:

 

Visit 1 (appt_date = 15 JAN 2012)

Visit 2 (appt_date = 1 MAR 2012)

Visit 3 (appt_date = 1 JUN 2012)

Visit 4 (appt_date = 1 AUG 2012)

Visit 5 (appt_date = 1 OCT 2012)

Visit 6 (appt_date = 1 JAN 2013)

Visit 7 (appt_date = 1 APR 2013)

Visit 8 (appt_date = 1 JUL 2013)

Visit 9 (appt_date = 1 SEP 2013)

Visit 10 (appt_date = 1 DEC 2013)

Visit 11 (appt_date = 1 MAR 2014)

Visit 12 (appt_date = 1 MAY 2014)

 

For this patient, I need to identify 2 dates:

 

1. The date of the 1st ER visit  so I can create a new variable "START_DATE" = 15 JAN 2012

2. The date of the final ER visit so I can create a new variable "END_DATE" = 1 MAY 2014

 

These 2 new variables will be used to define the patient's period of high use of ER services.

 

To accomplish this objective, I need an output variable that would flag the 1st ER visit and the final ER visit for the series of qualifying ER visits. Or output the 1st and the final ER records to a new table.

 

 

 

PGStats
Opal | Level 21

So, given that your "ER records for a larger period of time" are in dataset largerPeriodAppt, you would be looking for something like this:

 

 


data fiveER;
array d{0:4} _temporary_;
call missing(of d{*});
do i = 1 by 1 until(last.patid);
    set have; by patid;
    d{mod(i,5)} = appt_date;
    if i >=5 then
        /* Note d{mod(i+1,5)} is the oldest date in array d */
        first_of5_Appt_date = d{mod(i+1,5)};
        if intck("YEAR", first_of5_Appt_date, appt_date, "CONTINUOUS") < 1 then output;
    end;
format first_of5_Appt_date yymmdd10.;
drop i;
run;

proc sql;

create table fiveERcriteriaPatients as
select 
    PATID,
    min(APPT_DATE) as START_DATE format=yymmdd10.,
    max(APPT_DATE) as END_DATE format=yymmdd10. 
from largerPeriodAppt
where PATID in (select PATID from fiveER)
group by PATID;

quit;

(untested)

 

PG
SM1
Obsidian | Level 7 SM1
Obsidian | Level 7

PG Stats,

 

I ran your syntax as follows:

 

data fiveER;
retain patid appt_date first_of5_Appt_date i;
array d{0:4} _temporary_;
call missing(of d{*});
do i = 1 by 1 until(last.patid);
    set ER_patients;
    by patid;
    d{mod(i,5)} = appt_date;
    if i >=5 then
        /* Note d{mod(i+1,5)} is the oldest date in array d */
        first_of5_Appt_date = d{mod(i+1,5)};
        if intck("YEAR", first_of5_Appt_date, appt_date, "CONTINUOUS") < 1 then output;
    end;
format first_of5_Appt_date yymmdd10.;
run;

 

It looks like the syntax worked perfectly for patient 1.  For each subsequent patient, it seems like the syntax failed to recognize last.PATID, so the value of APPT_DATE from the prior patient was retained.

 

Here's the output for patients 1, 2, 3, and 4:

 

PATIDAPPT_DATEFIRST_OF5_APPT_DATEi
111111111111/6/2013.1
111111111111/17/2013.2
111111111112/23/2013.3
111111111112/28/2013.4
111111111112/30/201311/6/20135
11111111111/15/201411/17/20136
11111111111/16/201412/23/20137
11111111112/5/201412/28/20138
11111111112/8/201412/30/20139
11111111113/2/20141/15/201410
11111111113/10/20141/16/201411
11111111114/16/20142/5/201412
11111111115/6/20142/8/201413
111111111112/9/20132/8/20141
111111111112/15/20132/8/20142
111111111112/26/20132/8/20143
11111111115/23/20142/8/20144
222222222211/6/20132/8/20141
22222222221/21/20142/8/20142
22222222222/23/20142/8/20143
22222222228/11/20142/8/20144
22222222229/16/201411/6/20135
222222222210/10/20141/21/20146
222222222211/25/20142/23/20147
222222222212/13/20148/11/20148
222222222212/27/20149/16/20149
333333333312/27/20139/16/20141
33333333338/28/20149/16/20142
33333333339/22/20149/16/20143
333333333311/16/20149/16/20144
444444444410/13/20139/16/20141
444444444411/11/20139/16/20142
444444444411/13/20139/16/20143
44444444444/6/20149/16/20144

 

 

 

 

PGStats
Opal | Level 21

I can't do any testing without data.

PG
SM1
Obsidian | Level 7 SM1
Obsidian | Level 7

I've attached an Excel file with 134 records for approximately 20 patients. The only variables included are APPT_DATE and PATID.

PGStats
Opal | Level 21

Ah! The code was missing a do block. I also added a sorting step, just in case.

 

proc sort data=have; by patid appt_date; run;

data fiveER;
array d{0:4} _temporary_;
call missing(of d{*});
do i = 1 by 1 until(last.patid);
    set have; by patid;
    d{mod(i,5)} = appt_date;
    if i >=5 then do;
        /* Note d{mod(i+1,5)} is the oldest date in array d */
        first_of5_Appt_date = d{mod(i+1,5)};
        if intck("YEAR", first_of5_Appt_date, appt_date, "CONTINUOUS") < 1 then output;
        end;
    end;
format first_of5_Appt_date appt_date yymmdd10.;
drop i;
run;
PG

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 3080 views
  • 2 likes
  • 4 in conversation