Re: Group by member ID and consecutive dates

Jse · Posted 01-22-2015 06:06 PM

Data Example:

MemberID	Admit_date	cost	Group
1111	1/1/10	3	Unavoidable
1111	1/1/10	2	Avoidable
1111	1/2/10	4	Unavoidable
2222	1/20/10	1	Avoidable

Would like to group by MemberID and consecutive admit dates in to order sum cost, and count the number of admit ‘episodes’ to the ER as well as the length of the ER episode span (unique count of admit days in that span).

Defining an ER 'episode' by all the claims with consecutive admit dates for a member id
Going from many claim lines to one line per ER ‘episode’.
Only have a single date variable to work with (no discharge date).

Deisred Output:

MemberID	Admit_date	Episode cost (sum of all claims in consecutive days)	Group	Episode count	Episode Span (unique count of admitdt for consecutive claims)
1111	1/1/10	9	Unavoidable	1	2
2222	1/20/10	1	Avoidable	1	1

I made a little headway with the information on this thread:

I am having trouble modifying to meet my needs. Thank you for your time. Any help is greatly appreciated.

morgalr · Posted 01-22-2015 07:00 PM

For needs like you describe I tend to use Proc SQL. The following will create a table, populate it, and give you the described result using Proc SQL.

proc sql;
create table myStuff(MemberID Char(4),
                                 Admit_Date Date,
                                 cost int,
                                 Group char(16));

insert into myStuff values('1111', '01-JAN-2010'd, 3, 'Unavoidable');
insert into myStuff values('1111', '01-JAN-2010'd, 2, 'Avoidable');
insert into myStuff values('1111', '02-JAN-2010'd, 4, 'Unavoidable');
insert into myStuff values('2222', '20-JAN-2010'd, 1, 'Avoidable');

select MemberID, Min(Admit_Date) as Admit_Date, sum(cost) as Episode_Cost, 1 as Episode_Count, Count(Distinct(Admit_Date)) as Episode_Span from MyStuff group by MemberID;
quit;

If you want to have your answer by MemberID and Admit_Date, then just add Admit_Date to the "group by" clause: "group by MemberID, Admit_Date".

Jse · Posted 01-22-2015 07:16 PM

Thank you for the response. I dont see how this is grouping by consecutive dates for each member ID. Members will have multiple spans throughout the set. Using min(admit_date) will only give me the earliest admit_date for each member.

Using proc sql, I would want to group by Member ID and consecutive admit dates to allow for multiple spans for each member (I think...).

Thank you for taking a look at my problem.

Jesse

Reeza · Posted 01-22-2015 07:37 PM

How did you calculate the group?

You probably need to provide more sample input and output data.

Collapsing data of this form isn't always a trivial process so you might not get a full solution on here.

Jse · Posted 01-22-2015 08:06 PM

Group is calculated by comparing all diagnoses on the claim line to a list of diagnoses related to avoidable ER visits. If the claim line only contains avoidable diagnoses then the claim is flagged as avoidable. If there are any other diagnoses (not in the avoidable diagnosis list) present on the claim then it is flagged unavoidable.There are multiple claims per ER 'episode'. If any of the claim lines in the episode span are flagged as unavoidable then the ER 'episode' is flagged as unavoidable.

I didn't mention this group flagging issue as I am still working on the roll up for ER episode per member. I will add more input and desired output data.

Yes, this is certainly not trivial for me, so I came to the experts for advice. Thank you for your response.

Jesse

morgalr · Posted 01-22-2015 08:23 PM

Jesse,

Thank you for further defining your requirements, you need to move back to a dataset ordered by MemberID and Admit_Date. Use a manual "output" that is triggered by your MemberID or your Admit_Date changing. Each iteration you have to "retain" your calculation values and only keep the calculated values for your output.

Are you familiar with using "output" with a dataset?

Les

morgalr · Posted 01-22-2015 07:53 PM

Jse,

You are correct, it does not group by Member ID and Date, but then neither does your example output data. That is why I add the comment:

If you want to have your answer by MemberID and Admit_Date, then just add Admit_Date to the "group by" clause: "group by MemberID, Admit_Date".

This will give:

select MemberID, Admit_Date, sum(cost) as Episode_Cost, 1 as Episode_Count, Count(Distinct(Admit_Date)) as Episode_Span from MyStuff group by MemberID, Admit_date;

And if you want to be assured sequential MemberID's and Admit_Date's you need an "order by" clause" which will give:

select MemberID, Admit_Date, sum(cost) as Episode_Cost, 1 as Episode_Count, Count(Distinct(Admit_Date)) as Episode_Span from MyStuff group by MemberID, Admit_date order by MemberID, Admit_date;

Which I believe is what you asked for, but did not show, in the first place.

I am still at a loss as to what you want to do with your group field: in your sample output it is taken from the first obs of the group by MemberID--in both cases shown, it has been ignored.

Haikuo · Posted 01-22-2015 08:05 PM

More business rules need to be revealed.

How do you select group "Unavoidable" for "1111' in stead of "avoidable", where they have identical memberid and admit_date?

Haikuo · Posted 01-22-2015 08:28 PM

if take the group(unavailable/available) out of the picture, then the following may get you started:

data have;

input MemberID Admit_date :mmddyy10. cost Group:$20.;

format admit_date mmddyy10.;

cards;

1111 1/1/10 3 Unavoidable

1111 1/1/10 2 Avoidable

1111 1/2/10 4 Unavoidable

1111 1/7/10 8 Unavoidable

2222 1/20/10 1 Avoidable

;

/*simple dif()+simple SQL */

data have_dif;

set have;

by memberid admit_date;

if first.memberid then

grp=1;

else grp+(dif(admit_date)>1);

run;

proc sql;

create table want_1 as

select distinct MemberID, Admit_date , sum(cost) as cost , range(admit_date)+1 as Episode_span, grp

from have_dif

group by MemberID, grp

having Admit_date = min(Admit_date)

;

quit;

seeing attempt of providing a pure SQL approach, frankly, SQL is not a sequential process, so there is a risk of overstretching if you try to implement SQL for something that it is NOT good at. As you can see from the following, it is kinda strenuous, not only for the coding, but also for the performance. So the following approach is not recommended and only has 'research' value.

/*Complex Proc SQl*/

proc sql;

create table want_2 as

select distinct MemberID, Admit_date , sum(cost) as cost , range(admit_date)+1 as Episode_span ,

1+(not exists(select * from have where a.memberid=memberid and admit_date ne a.admit_date and admit_date between a.admit_date-1 and a.admit_date+1)) as grp

from have a

group by MemberID, calculated grp

having Admit_date = min(Admit_date)

;

quit;

Ksharp · Posted 01-23-2015 03:06 AM

As you said : "Defining an ER 'episode' by all the claims with consecutive admit dates for a member id"

Every consecutive admit dates is a group, then Episode_count should always be 1. right ?

data have;
     input MemberID Admit_date :mmddyy10.   cost Group:$20.;
     format admit_date mmddyy10.;
     cards;
1111 1/1/10 3 Unavoidable
1111 1/1/10 2 Avoidable
1111 1/2/10 4 Unavoidable
1111 1/7/10 8 Unavoidable
2222 1/20/10 1 Avoidable
;
run;
data temp;
 set have;
 if MemberID ne lag(MemberID) or dif(Admit_date) gt 1 then n+1;
run;
proc summary data=temp;
by n;
var cost  Admit_date;
output out=summary(drop=_:) sum(cost)=Episode_cost range(Admit_date)=Episode_span;
format Admit_date best8.;
run;
data want;
 merge temp summary;
 by n;
 Episode_span=Episode_span+1;
 if first.n;
 drop n cost;
run;

Xia Keshan

Jse · Posted 01-23-2015 11:33 AM

Thank you all so much for the help. I was able to get it working with this bit of code from Reeza found on a previous thread:

Data Test_ER_2;

set avoid1;

by memid admitdt;

retain continuous_group 1;

prev_date=lag(admitdt);

if first.memid then do;

prev_date=.;

continuous_group=1;

end;

format prev_date date9.;

diff=intck('day',prev_date,admitdt);

if intck('day', prev_date, admitdt)>1 then continuous_group + 1;

run;

Assigned a continuous_group number for each span for each memberID. Was then able to aggregate using member ID and continuous_group. I am going to try it with the code provided above as well. Thank you everyone again for the help

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away