So I am trying to summarize data into distinct hospital stay.
data sample;
input member $ admission $ discharge $ hospital $;
datalines;
Jane 20150101 20150305 A
Jane 20150101 20150103 A
John 20160101 20160105 B
John 20160101 20160101 A
Jane 20190101 20190507 C
;
For my research purpose, my definition of a hospital stay is based on member, admission date, and hospital. So I am not considering internal transfer as a separate hospital stay. For example, in the sample, Jane was admitted to hospital A on 20150101, but internally transferred on 20150103 to another department/care team, and discharged from A on 20150305. I consider this a single hospital stay.
I tried the following and achieved what I wanted:
proc sort data=sample out=sample_srt;
by member admission discharge hospital;
run;
data want;
set sample_srt;
by member hospital admission;
if last.admission;
run;
What I don't understand is that if I try the following instead, I get something different.
data want2;
set sample_srt;
by member admission hospital;
if last.admission;
run;
I thought that the order of by group variables doesn't matter, but clearly I was wrong. I'd really appreciate if someone can explain why the order of by variables matters.
-AH
The order of variables in the BY statement creates a hierarchy. So BY A B C will create a different order than BY C B A.
In particular, keeping existing orders within a by group is NOT guaranteed if that order is not enforced by adding additional columns to the BY as needed.
In your case, I would sort
by member hospital admission discharge;
and then, in the following data step, use
by member hospital admission;
if last.admission;
so that you get the last discharge date for stays that share the same admission date.
Hi @aaronh The stay in a hospital would require a sort by member hospital admission discharge;
Whether a patient ever admitted to a hospital regardless of how many times or the tenure at a particular hospital would require a sort by member admission discharge;
The Order of BY groups DOES matter.
It's hierarchical, so your groups that are member/admission/hospital will be different than member/admission/hospital.
Here's a quick example of an explanation. Hospitals each have their own ID systems so that IDs that are 4 in Hospital 1 are not the same as the ID=4 in Hospital 2. To account for this you would need to group your variables by Hospital ID Admission.
In theory your example below should result in the same answers as you shouldn't have the same admission date to different hospitals but it clearly happens and I wonder if it's because my example above (ID different at different hospitals) isn't correct.
Here's an illustrated example from the documentation that may help.
@aaronh wrote:
So I am trying to summarize data into distinct hospital stay.
data sample; input member $ admission $ discharge $ hospital $; datalines; Jane 20150101 20150305 A Jane 20150101 20150103 A John 20160101 20160105 B John 20160101 20160101 A Jane 20190101 20190507 C ;
For my research purpose, my definition of a hospital stay is based on member, admission date, and hospital. So I am not considering internal transfer as a separate hospital stay. For example, in the sample, Jane was admitted to hospital A on 20150101, but internally transferred on 20150103 to another department/care team, and discharged from A on 20150305. I consider this a single hospital stay.
I tried the following and achieved what I wanted:
proc sort data=sample out=sample_srt; by member admission discharge hospital; run; data want; set sample_srt; by member hospital admission; if last.admission; run;
What I don't understand is that if I try the following instead, I get something different.
data want2; set sample_srt; by member admission hospital; if last.admission; run;
I thought that the order of by group variables doesn't matter, but clearly I was wrong. I'd really appreciate if someone can explain why the order of by variables matters.
-AH
The problem could be the same person is admitted to two different hospitals on the same day.
@smantha wrote:
The problem could be the same person is admitted to two different hospitals on the same day.
To handle this situation, you would need datetimes instead of dates, so you could determine the order of admissions within a day.
thanks Smantha for pointing that out. Yes, it happens because a patient could be admitted to hospital A, but it is determined that hospital A does not have the care level needed by the patient, so the patient is transferred immediately to a hospital with higher level of care.
data sample;
input member $ admission $ discharge $ hospital $;
admission_dt=input(admission, yymmdd8.);
discharge_dt=input(discharge, yymmdd8.);
format admission_dt discharge_dt yymmdd10.;
datalines;
Jane 20150101 20150305 A
Jane 20150101 20150103 A
John 20160101 20160105 B
John 20160101 20160101 A
Jane 20190101 20190507 C
;
proc summary data=sample nway missing noprint;
class Member hospital;
var admission_dt discharge_dt;
output min(admission_dt) = max(discharge_dt)= out= want;
run;
It is due to the way the data is sorted and location of variables , like in the below example sorting by and variables order of
member hospital admission will give 4 records due to hospital values distinguish the date 20160106 whereas in second example sorting by and variable order member admission hospital distinguish by admission date and we get 3 records where hospital is ignored as we are considering the admission date.
The order of variables in the BY statement creates a hierarchy. So BY A B C will create a different order than BY C B A.
In particular, keeping existing orders within a by group is NOT guaranteed if that order is not enforced by adding additional columns to the BY as needed.
In your case, I would sort
by member hospital admission discharge;
and then, in the following data step, use
by member hospital admission;
if last.admission;
so that you get the last discharge date for stays that share the same admission date.
Thank you all for your quick and informative responses to my question! And thank you @Reeza @Jagadishkatam @Kurt_Bremser for pointing out that the by group variables form a hierarchy.
I ran the following to visualize of how the by variables behave differently under different hierarchy.
data visualize1;
set sample_srt;
by member hospital admission;
if first.admission then frst_adm = 1;
else frst_adm = 0;
if last.admission then last_adm = 1;
else last_adm = 0;
if first.hospital then frst_pos = 1;
else frst_pos = 0;
if last.hospital then last_pos = 1;
else last_pos = 0;
run;
data visualize2;
set sample_srt;
by member admission hospital;
if first.admission then frst_adm = 1;
else frst_adm = 0;
if last.admission then last_adm = 1;
else last_adm = 0;
if first.hospital then frst_pos = 1;
else frst_pos = 0;
if last.hospital then last_pos = 1;
else last_pos = 0;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.