BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
huongc2
Calcite | Level 5

Hi, I am having one problem that I could not figure out how to do this. 

 

Below is an example of the data I have:

data have;

input patient $ admitted_date :mmddyy10. discharge_date :mmddyy10. disposition $50.;format admitted_date discharge_date yymmdd10.;

datalines;

A 3/2/2020  3/15/2020 transferred to other hospital

A 3/15/2020 4/2/2020  home

B 5/3/2020  6/27/2020 transferred to other hospital

B 6/27/2020 6/29/2020 home

C 8/1/2020  8/25/2020 died

D 5/15/2020 6/1/2020  transferred to other hospital

D 6/1/2020  7/2/2020  home

A 6/2/2020  6/25/2020 died

B 8/1/2020  8/2/2020  transferred to other hospital

B 8/2/2020  8/15/2020 transferred to other hospital

B 8/15/2020 8/25/2020 died;

 

Patient got to the hospital and maybe they got transferred to other hospitals because of the low bed capacity. I only want to capture them as 1 admission to the hospital (if they got transferred). However, they could get admitted again after that, their admission_count will be 2 (or even 3,4,5 times if they had to go to the hospitals again) . From the data above, I want something like this:

 

Patientadmitted datedischarge_datedispositionMin(admitted_date)Max(discharge_date)admission_countfinal_disposition
A3/2/20203/15/2020transferred to other hospital3/2/20204/2/20201home
A3/15/20204/2/2020home3/2/20204/2/20201home
B5/3/20206/27/2020transferred to other hospital5/3/20206/29/20201home
B6/27/20206/29/2020home5/3/20206/29/20201home
C8/1/20208/25/2020died8/1/20208/25/20201died
D5/15/20206/1/2020transferred to other hospital5/15/20207/2/20201home
D6/1/20207/2/2020home5/15/20207/2/20201home
A6/2/20206/25/2020died6/2/20206/25/20202died
B8/1/20208/2/2020transferred to other hospital8/1/20208/25/20202died
B8/2/20208/15/2020transferred to other hospital8/1/20208/25/20202died
B8/15/20208/25/2020died8/1/20208/25/20202died

 

Please help! I don't know how to do this 😞

 

Thank you so much!

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

OK, so we see the notoriously bad performance of subselects in action; replace the SQL with these data steps:

data sum;
set counts;
by patient admission_count;
retain min_admitted_date;
format min_admitted_date yymmdd10.;
if first.admission_count
then min_admitted_date = admitted_date;
if last.admission_count;
keep
  patient
  admission_count
  min_admitted_date
  discharge_date
  disposition
;
rename
  discharge_date=max_discharged_date
  disposition=final_disposition
;
run;

data want;
merge
  counts
  sum
;
by patient admission_count;
run;

View solution in original post

7 REPLIES 7
Kurt_Bremser
Super User

You will need two steps for this, first to create the admission counts, and then to create the summary variables and merge them back to the original data:

Source data:

data have;
input patient $ admitted_date :mmddyy10. discharge_date :mmddyy10. disposition $50.;
format admitted_date discharge_date yymmdd10.;
datalines;
A 3/2/2020  3/15/2020 transferred to other hospital
A 3/15/2020 4/2/2020  home
B 5/3/2020  6/27/2020 transferred to other hospital
B 6/27/2020 6/29/2020 home
C 8/1/2020  8/25/2020 died
D 5/15/2020 6/1/2020  transferred to other hospital
D 6/1/2020  7/2/2020  home
A 6/2/2020  6/25/2020 died
B 8/1/2020  8/2/2020  transferred to other hospital
B 8/2/2020  8/15/2020 transferred to other hospital
B 8/15/2020 8/25/2020 died
;

Please post your example data in this way in the future, as it enables others to exactly recreate a dataset with a simple copy/paste and submit. See it not only as a basic courtesy, but also as a means to speed up getting a valid working answer.

Sort and create the admission_count:

proc sort data=have;
by patient admitted_date;
run;

data counts;
set have;
by patient;
l_dis = lag(discharge_date);
if first.patient
then admission_count = 1;
else if l_dis lt admitted_date then admission_count + 1;
drop l_dis;
run;

Now, in SQL, create the max and min dates, and the final disposition, and merge back to the source data:

proc sql;
create table want as
  select
    patient,
    admitted_date,
    discharge_date,
    disposition,
    min(admitted_date) format yymmdd10. as min_admitted_date,
    max(discharge_date) format yymmdd10. as max_discharged_date,
    admission_count,
    (
      select disposition
      from counts t2
      where t1.patient = t2.patient and t1.admission_count = t2.admission_count
      group by t2.patient, t2.admission_count
      having t2.admitted_date = max(t2.admitted_date)
    ) as final_disposition
  from counts t1
  group by patient, admission_count
;
quit;

If your dataset is large, the sub-select might become inefficient; in that case, it might be necessary to do some trickery in the data step to create final_disposition, or move everything there (code might not be as easy to read).

huongc2
Calcite | Level 5

Good morning! Thank you soooo much for helping me on this. I edited my post as suggested.

 

Regarding the codes, the first step works so well on my data! But as you mentioned at the end, I might have to do some tricks to get the second step works since my data is large. (40,000 obs).

 

I wonder after creating the counts data, maybe I could group by patient and admission_count, then do first.admitted_date and last.discharge_Date to capture their min and max? do you think it would work? Do you have any other suggestions in this case?

 

Again, thank you so much for helping on this. I just learned a new function "lag" from you. I am so happy that it works so well (and fast).

huongc2
Calcite | Level 5

Hi,

 

Below is my log. I tried with a small subset first:

 

NOTE: The query requires remerging summary statistics back with the original data.
NOTE: Table WORK.MIN_MAX created, with 3367 rows and 8 columns.

quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 15:38.36
cpu time 15:43.67

Kurt_Bremser
Super User

OK, so we see the notoriously bad performance of subselects in action; replace the SQL with these data steps:

data sum;
set counts;
by patient admission_count;
retain min_admitted_date;
format min_admitted_date yymmdd10.;
if first.admission_count
then min_admitted_date = admitted_date;
if last.admission_count;
keep
  patient
  admission_count
  min_admitted_date
  discharge_date
  disposition
;
rename
  discharge_date=max_discharged_date
  disposition=final_disposition
;
run;

data want;
merge
  counts
  sum
;
by patient admission_count;
run;
huongc2
Calcite | Level 5

OMG, it works perfectly and soooo fast.

Thank you so much!!!!!!

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1067 views
  • 0 likes
  • 2 in conversation