DATA Step, Macro, Functions and more

Group by member ID and consecutive dates

Reply
Occasional Contributor Jse
Occasional Contributor
Posts: 17

Group by member ID and consecutive dates

Data Example:

                                

MemberID

Admit_date

      cost

Group

1111

1/1/10

3

Unavoidable

1111

1/1/10

2

Avoidable

1111

1/2/10

4

Unavoidable

2222

1/20/10

1

Avoidable

Would like to group by MemberID and consecutive admit dates in to order sum cost, and count the number of admit ‘episodes’ to the ER as well as the length of the ER episode span (unique count of admit days in that span). 

  • Defining an ER  'episode' by all the claims with consecutive admit dates for a member id
  • Going from many claim lines to one line per ER ‘episode’. 
  • Only have a single date variable to work with (no discharge date). 

Deisred Output:

MemberID

Admit_date

Episode cost     (sum of all claims in consecutive days)

Group

Episode count

Episode Span (unique count of admitdt for consecutive claims)

1111

1/1/10

9

Unavoidable

1

2

2222

1/20/10

1

Avoidable

1

1

I made a little headway with the information on this thread:


I am having trouble modifying to meet my needs.  Thank you for your time.  Any help is greatly appreciated. 

Contributor
Posts: 26

Re: Group by member ID and consecutive dates

For needs like you describe I tend to use Proc SQL. The following will create a table, populate it, and give you the described result using Proc SQL.

proc sql;
create table myStuff(MemberID Char(4),
                                 Admit_Date Date,
                                 cost int,
                                 Group char(16));

  insert into myStuff values('1111', '01-JAN-2010'd, 3, 'Unavoidable');
  insert into myStuff values('1111', '01-JAN-2010'd, 2, 'Avoidable');
  insert into myStuff values('1111', '02-JAN-2010'd, 4, 'Unavoidable');
  insert into myStuff values('2222', '20-JAN-2010'd, 1, 'Avoidable');

  select MemberID, Min(Admit_Date) as Admit_Date, sum(cost) as Episode_Cost, 1 as Episode_Count, Count(Distinct(Admit_Date)) as Episode_Span from MyStuff group by MemberID;
quit;

If you want to have your answer by MemberID and Admit_Date, then just add Admit_Date to the "group by" clause: "group by MemberID, Admit_Date".

Occasional Contributor Jse
Occasional Contributor
Posts: 17

Re: Group by member ID and consecutive dates

Thank you for the response.  I dont see how this is grouping by consecutive dates for each member ID.  Members will have multiple spans throughout the set.    Using min(admit_date) will only give me the earliest admit_date for each member. 

Using proc sql, I would want to group by Member ID and consecutive admit dates  to allow for multiple spans for each member (I think...).

Thank you for taking a look at my problem.

Jesse

Super User
Posts: 17,963

Re: Group by member ID and consecutive dates

How did you calculate the group?

You probably need to provide more sample input and output data.

Collapsing data of this form isn't always a trivial process so you might not get a full solution on here.

Occasional Contributor Jse
Occasional Contributor
Posts: 17

Re: Group by member ID and consecutive dates

Group is calculated by comparing all diagnoses on the claim line to a list of diagnoses related to avoidable ER visits.  If the claim line only contains avoidable diagnoses then the claim is flagged as avoidable.  If there are any other diagnoses (not in the avoidable diagnosis list)  present on the claim then it is flagged unavoidable.There are multiple claims per ER 'episode'.  If any of the claim lines in the episode span are flagged as unavoidable then the ER 'episode' is flagged as unavoidable.

I didn't mention this group flagging issue as I am still working on the roll up for ER episode per member.  I will add more input and desired output data.

Yes, this is certainly not trivial for me, so I came to the experts for advice.  Thank you for your response.

Jesse

Contributor
Posts: 26

Re: Group by member ID and consecutive dates

Jesse,

Thank you for further defining your requirements, you need to move back to a dataset ordered by MemberID and Admit_Date. Use a manual "output" that is triggered by your MemberID or your Admit_Date changing. Each iteration you have to "retain" your calculation values and only keep the calculated values for your output.

Are you familiar with using "output" with a dataset?

Les

Contributor
Posts: 26

Re: Group by member ID and consecutive dates


Jse,

You are correct, it does not group by Member ID and Date, but then neither does your example output data. That is why I add the comment:

If you want to have your answer by MemberID and Admit_Date, then just add Admit_Date to the "group by" clause: "group by MemberID, Admit_Date".

This will give:

   select MemberID, Admit_Date, sum(cost) as Episode_Cost, 1 as Episode_Count, Count(Distinct(Admit_Date)) as Episode_Span from MyStuff group by MemberID, Admit_date;

And if you want to be assured sequential MemberID's and Admit_Date's you need an "order by" clause" which will give:

select MemberID, Admit_Date, sum(cost) as Episode_Cost, 1 as Episode_Count, Count(Distinct(Admit_Date)) as Episode_Span from MyStuff group by MemberID, Admit_date order by MemberID, Admit_date;

Which I believe is what you asked for, but did not show, in the first place.

I am still at a loss as to what you want to do with your group field: in your sample output it is taken from the first obs of the group by MemberID--in both cases shown, it has been ignored.

Respected Advisor
Posts: 3,124

Re: Group by member ID and consecutive dates

More business rules need to be revealed.

How do you select group "Unavoidable" for "1111' in stead of "avoidable", where they have identical memberid and admit_date?

Respected Advisor
Posts: 3,124

Re: Group by member ID and consecutive dates

if take the group(unavailable/available) out of the picture, then the following may get you started:

data have;

     input MemberID Admit_date :mmddyy10.   cost Group:$20.;

     format admit_date mmddyy10.;

     cards;

1111 1/1/10 3 Unavoidable

1111 1/1/10 2 Avoidable

1111 1/2/10 4 Unavoidable

1111 1/7/10 8 Unavoidable

2222 1/20/10 1 Avoidable

;

/*simple dif()+simple SQL */

data have_dif;

     set have;

     by memberid admit_date;

     if first.memberid then

           grp=1;

     else grp+(dif(admit_date)>1);

run;

proc sql;

     create table want_1 as

           select distinct MemberID, Admit_date , sum(cost) as cost , range(admit_date)+1 as Episode_span, grp

                from have_dif

                     group by MemberID,  grp

                           having Admit_date = min(Admit_date)

     ;

quit;

seeing attempt of providing a pure SQL approach, frankly, SQL is not a sequential process, so there is a risk of  overstretching if you try to implement SQL for something that it is NOT good at. As you can see from the following, it is kinda strenuous, not only for the coding, but also for the performance. So the following approach is not recommended and only has 'research' value.

/*Complex Proc SQl*/

proc sql;

     create table want_2 as

           select distinct MemberID, Admit_date , sum(cost) as cost , range(admit_date)+1 as Episode_span ,

                1+(not exists(select * from have where a.memberid=memberid and admit_date ne a.admit_date and admit_date between a.admit_date-1 and a.admit_date+1)) as grp

           from have a

                group by MemberID, calculated grp

                     having Admit_date = min(Admit_date)

     ;

quit;

Super User
Posts: 9,691

Re: Group by member ID and consecutive dates

As you said : "Defining an ER  'episode' by all the claims with consecutive admit dates for a member id"

Every consecutive admit dates is a group, then Episode_count should always be 1. right ?

data have;
     input MemberID Admit_date :mmddyy10.   cost Group:$20.;
     format admit_date mmddyy10.;
     cards;
1111 1/1/10 3 Unavoidable
1111 1/1/10 2 Avoidable
1111 1/2/10 4 Unavoidable
1111 1/7/10 8 Unavoidable
2222 1/20/10 1 Avoidable
;
run;
data temp;
 set have;
 if MemberID ne lag(MemberID) or dif(Admit_date) gt 1 then n+1;
run;
proc summary data=temp;
by n;
var cost  Admit_date;
output out=summary(drop=_:) sum(cost)=Episode_cost range(Admit_date)=Episode_span;
format Admit_date best8.;
run;
data want;
 merge temp summary;
 by n;
 Episode_span=Episode_span+1;
 if first.n;
 drop n cost;
run;

Xia Keshan

Occasional Contributor Jse
Occasional Contributor
Posts: 17

Re: Group by member ID and consecutive dates

Thank you all so much for the help.  I was able to get it working with this bit of code from Reeza found on a previous thread:

Data Test_ER_2;

set avoid1;

by memid admitdt;

retain continuous_group 1;

prev_date=lag(admitdt);

if first.memid then do;

  prev_date=.;

  continuous_group=1;

end;

format prev_date date9.;

diff=intck('day',prev_date,admitdt);

if intck('day', prev_date, admitdt)>1 then continuous_group + 1;

run;

Assigned a continuous_group number for each span for each memberID.  Was then able to aggregate using member ID and continuous_group.  I am going to try it with the code provided above as well.  Thank you everyone again for the help Smiley Happy

Ask a Question
Discussion stats
  • 10 replies
  • 772 views
  • 6 likes
  • 5 in conversation