Data Example:
MemberID | Admit_date | cost | Group |
1111 | 1/1/10 | 3 | Unavoidable |
1111 | 1/1/10 | 2 | Avoidable |
1111 | 1/2/10 | 4 | Unavoidable |
2222 | 1/20/10 | 1 | Avoidable |
Would like to group by MemberID and consecutive admit dates in to order sum cost, and count the number of admit ‘episodes’ to the ER as well as the length of the ER episode span (unique count of admit days in that span).
Deisred Output:
MemberID | Admit_date | Episode cost (sum of all claims in consecutive days) | Group | Episode count | Episode Span (unique count of admitdt for consecutive claims) |
1111 | 1/1/10 | 9 | Unavoidable | 1 | 2 |
2222 | 1/20/10 | 1 | Avoidable | 1 | 1 |
I made a little headway with the information on this thread:
I am having trouble modifying to meet my needs. Thank you for your time. Any help is greatly appreciated.
For needs like you describe I tend to use Proc SQL. The following will create a table, populate it, and give you the described result using Proc SQL.
proc sql;
create table myStuff(MemberID Char(4),
Admit_Date Date,
cost int,
Group char(16));
insert into myStuff values('1111', '01-JAN-2010'd, 3, 'Unavoidable');
insert into myStuff values('1111', '01-JAN-2010'd, 2, 'Avoidable');
insert into myStuff values('1111', '02-JAN-2010'd, 4, 'Unavoidable');
insert into myStuff values('2222', '20-JAN-2010'd, 1, 'Avoidable');
select MemberID, Min(Admit_Date) as Admit_Date, sum(cost) as Episode_Cost, 1 as Episode_Count, Count(Distinct(Admit_Date)) as Episode_Span from MyStuff group by MemberID;
quit;
If you want to have your answer by MemberID and Admit_Date, then just add Admit_Date to the "group by" clause: "group by MemberID, Admit_Date".
Thank you for the response. I dont see how this is grouping by consecutive dates for each member ID. Members will have multiple spans throughout the set. Using min(admit_date) will only give me the earliest admit_date for each member.
Using proc sql, I would want to group by Member ID and consecutive admit dates to allow for multiple spans for each member (I think...).
Thank you for taking a look at my problem.
Jesse
How did you calculate the group?
You probably need to provide more sample input and output data.
Collapsing data of this form isn't always a trivial process so you might not get a full solution on here.
Group is calculated by comparing all diagnoses on the claim line to a list of diagnoses related to avoidable ER visits. If the claim line only contains avoidable diagnoses then the claim is flagged as avoidable. If there are any other diagnoses (not in the avoidable diagnosis list) present on the claim then it is flagged unavoidable.There are multiple claims per ER 'episode'. If any of the claim lines in the episode span are flagged as unavoidable then the ER 'episode' is flagged as unavoidable.
I didn't mention this group flagging issue as I am still working on the roll up for ER episode per member. I will add more input and desired output data.
Yes, this is certainly not trivial for me, so I came to the experts for advice. Thank you for your response.
Jesse
Jesse,
Thank you for further defining your requirements, you need to move back to a dataset ordered by MemberID and Admit_Date. Use a manual "output" that is triggered by your MemberID or your Admit_Date changing. Each iteration you have to "retain" your calculation values and only keep the calculated values for your output.
Are you familiar with using "output" with a dataset?
Les
Jse,
You are correct, it does not group by Member ID and Date, but then neither does your example output data. That is why I add the comment:
If you want to have your answer by MemberID and Admit_Date, then just add Admit_Date to the "group by" clause: "group by MemberID, Admit_Date".
This will give:
select MemberID, Admit_Date, sum(cost) as Episode_Cost, 1 as Episode_Count, Count(Distinct(Admit_Date)) as Episode_Span from MyStuff group by MemberID, Admit_date;
And if you want to be assured sequential MemberID's and Admit_Date's you need an "order by" clause" which will give:
select MemberID, Admit_Date, sum(cost) as Episode_Cost, 1 as Episode_Count, Count(Distinct(Admit_Date)) as Episode_Span from MyStuff group by MemberID, Admit_date order by MemberID, Admit_date;
Which I believe is what you asked for, but did not show, in the first place.
I am still at a loss as to what you want to do with your group field: in your sample output it is taken from the first obs of the group by MemberID--in both cases shown, it has been ignored.
More business rules need to be revealed.
How do you select group "Unavoidable" for "1111' in stead of "avoidable", where they have identical memberid and admit_date?
if take the group(unavailable/available) out of the picture, then the following may get you started:
data have;
input MemberID Admit_date :mmddyy10. cost Group:$20.;
format admit_date mmddyy10.;
cards;
1111 1/1/10 3 Unavoidable
1111 1/1/10 2 Avoidable
1111 1/2/10 4 Unavoidable
1111 1/7/10 8 Unavoidable
2222 1/20/10 1 Avoidable
;
/*simple dif()+simple SQL */
data have_dif;
set have;
by memberid admit_date;
if first.memberid then
grp=1;
else grp+(dif(admit_date)>1);
run;
proc sql;
create table want_1 as
select distinct MemberID, Admit_date , sum(cost) as cost , range(admit_date)+1 as Episode_span, grp
from have_dif
group by MemberID, grp
having Admit_date = min(Admit_date)
;
quit;
seeing attempt of providing a pure SQL approach, frankly, SQL is not a sequential process, so there is a risk of overstretching if you try to implement SQL for something that it is NOT good at. As you can see from the following, it is kinda strenuous, not only for the coding, but also for the performance. So the following approach is not recommended and only has 'research' value.
/*Complex Proc SQl*/
proc sql;
create table want_2 as
select distinct MemberID, Admit_date , sum(cost) as cost , range(admit_date)+1 as Episode_span ,
1+(not exists(select * from have where a.memberid=memberid and admit_date ne a.admit_date and admit_date between a.admit_date-1 and a.admit_date+1)) as grp
from have a
group by MemberID, calculated grp
having Admit_date = min(Admit_date)
;
quit;
As you said : "Defining an ER 'episode' by all the claims with consecutive admit dates for a member id"
Every consecutive admit dates is a group, then Episode_count should always be 1. right ?
data have; input MemberID Admit_date :mmddyy10. cost Group:$20.; format admit_date mmddyy10.; cards; 1111 1/1/10 3 Unavoidable 1111 1/1/10 2 Avoidable 1111 1/2/10 4 Unavoidable 1111 1/7/10 8 Unavoidable 2222 1/20/10 1 Avoidable ; run; data temp; set have; if MemberID ne lag(MemberID) or dif(Admit_date) gt 1 then n+1; run; proc summary data=temp; by n; var cost Admit_date; output out=summary(drop=_:) sum(cost)=Episode_cost range(Admit_date)=Episode_span; format Admit_date best8.; run; data want; merge temp summary; by n; Episode_span=Episode_span+1; if first.n; drop n cost; run;
Xia Keshan
Thank you all so much for the help. I was able to get it working with this bit of code from Reeza found on a previous thread:
Data Test_ER_2;
set avoid1;
by memid admitdt;
retain continuous_group 1;
prev_date=lag(admitdt);
if first.memid then do;
prev_date=.;
continuous_group=1;
end;
format prev_date date9.;
diff=intck('day',prev_date,admitdt);
if intck('day', prev_date, admitdt)>1 then continuous_group + 1;
run;
Assigned a continuous_group number for each span for each memberID. Was then able to aggregate using member ID and continuous_group. I am going to try it with the code provided above as well. Thank you everyone again for the help
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.