Is it possible to calculate median in Proc report?
I have a dataset that has a client ID, years, service type and length of stay (los).
What I want to do is calculate using proc report (or maybe tabulate??) the median length of stay by service type for each year.
Is that possible or do I have to do this in proc means? or is there another way of doing this?
Asking because I am trying to use proc report to have columns that count the number of '1's under each year and then have another column that will calculate the median los by service type for each year. You will notice in my sample below that no two years have a '1' in the same row.
data have;
input ID $ cy2014 cy2015 cy2016 cy2017 cy2018 cy2019 service_type $ los;
datalines;
1 0 0 0 0 1 0 Outpatient 313
2 0 1 0 0 0 0 Hospital 14
3 0 0 0 0 0 1 Outpatient 59
4 0 0 0 0 1 0 Outpatient 104
5 0 0 0 0 0 1 Outpatient 57
6 0 0 0 0 0 1 Outpatient 34
7 0 0 0 0 1 0 ER 6
8 0 0 1 0 0 0 ER 4
9 0 0 1 0 0 0 HomeCare 332
10 1 0 0 0 0 0 Outpatient 25
11 0 0 0 0 1 0 Clinic 624
12 0 0 1 0 0 0 Outpatient 41
13 1 0 0 0 0 0 ER 4
14 0 1 0 0 0 0 Outpatient 95
15 0 0 0 1 0 0 ER 7
16 0 0 0 1 0 0 ER 29
17 1 0 0 0 0 0 Outpatient 258
18 0 0 0 1 0 0 Outpatient 36
19 0 0 0 0 0 1 HomeCare 38
20 0 0 0 0 1 0 Outpatient 362
21 0 0 0 0 1 0 Outpatient 107
22 1 0 0 0 0 0 Outpatient 252
23 0 0 0 0 1 0 Hospital 14
24 0 0 0 0 1 0 Outpatient 15
25 1 0 0 0 0 0 HomeCare 76
26 0 1 0 0 0 0 HomeCare 21
27 0 0 0 0 0 1 ER 3
28 0 1 0 0 0 0 Clinic 261
29 1 0 0 0 0 0 Outpatient 187
30 0 0 0 0 0 1 Outpatient 75
;
This is the code that I am using to count the '1's for each year by each service type. How can I add to this to a code to calculate the median los by each service type and each year?
proc report data=have ; column service_type cy2014 cy2015 cy2016 cy2017 cy2018 cy2019; define service_type / group width=50 'Program Type'; define cy2014 / sum '2014'; define cy2015 / sum '2015'; define cy2016 / sum '2016'; define cy2017 / sum '2017'; define cy2018 / sum '2018'; define cy2019 / sum '2019(YTD)'; run;
Thanks in Advance!
Hi:
PROC REPORT could do it, but it will be far easier to do it in the DATA step where you read the data using ARRAY processing. I think you want something like this:
Which I accomplished with this:
data fakedata;
length service_type $10;
input ID $ cy2014 cy2015 cy2016 cy2017 cy2018 cy2019 service_type $ los;
** use array processing to create length of stay variables;
** for each year. IF "CYxxxx" variable is 1 then STAYxxxx variable;
** is assigned the LOS value for that observation;
array arrcy cy2014-cy2019;
array arrlos stay2014-stay2019;
do i = 1 to dim(arrcy);
if arrcy(i) = 1 then arrlos(i) = los;
else if arrcy(i) = 0 then arrlos(i) = 0;
end;
datalines;
1 0 0 0 0 1 0 Outpatient 313
2 0 1 0 0 0 0 Hospital 14
3 0 0 0 0 0 1 Outpatient 59
4 0 0 0 0 1 0 Outpatient 104
5 0 0 0 0 0 1 Outpatient 57
6 0 0 0 0 0 1 Outpatient 34
7 0 0 0 0 1 0 ER 6
8 0 0 1 0 0 0 ER 4
9 0 0 1 0 0 0 HomeCare 332
10 1 0 0 0 0 0 Outpatient 25
11 0 0 0 0 1 0 Clinic 624
12 0 0 1 0 0 0 Outpatient 41
13 1 0 0 0 0 0 ER 4
14 0 1 0 0 0 0 Outpatient 95
15 0 0 0 1 0 0 ER 7
16 0 0 0 1 0 0 ER 29
17 1 0 0 0 0 0 Outpatient 258
18 0 0 0 1 0 0 Outpatient 36
19 0 0 0 0 0 1 HomeCare 38
20 0 0 0 0 1 0 Outpatient 362
21 0 0 0 0 1 0 Outpatient 107
22 1 0 0 0 0 0 Outpatient 252
23 0 0 0 0 1 0 Hospital 14
24 0 0 0 0 1 0 Outpatient 15
25 1 0 0 0 0 0 HomeCare 76
26 0 1 0 0 0 0 HomeCare 21
27 0 0 0 0 0 1 ER 3
28 0 1 0 0 0 0 Clinic 261
29 1 0 0 0 0 0 Outpatient 187
30 0 0 0 0 0 1 Outpatient 75
;
run;
title; footnote;
proc report data=fakedata ;
column service_type
('2014' cy2014 stay2014) ('2015' cy2015 stay2015)
('2016' cy2016 stay2016) ('2017' cy2017 stay2017)
('2018' cy2018 stay2018) ('2019(YTD)' cy2019 stay2019) ;
define service_type / group 'Program Type';
define cy2014 / sum "CY";
define stay2014 / sum "LOS";
define cy2015 / sum "CY";
define stay2015 / sum "LOS";
define cy2016 / sum "CY";
define stay2016 / sum "LOS";
define cy2017 / sum "CY";
define stay2017 / sum "LOS";
define cy2018 / sum "CY";
define stay2018 / sum "LOS";
define cy2019 / sum "CY";
define stay2019 / sum "LOS";
rbreak after / summarize;
run;
Hope this helps,
cynthia
According to this, you can compute medians in PROC REPORT
Just replace SUM with MEDIAN in your code.
Hi,
Yes LOS (length of stay) is only showing by service type. But if you note no two years have a 1 indicator in the same row. so if there is a 1 in cy2014 you will note that none of the other years have a 1. Is it possible that in proc report a statement such as "if cy2014=1 then calculate median by service type and label the median as los for cy2014" and then "if cy2015=1 then calculate median by service type and label the median as los for cy2015" and so far?
I am trying to get to the bottom table.
LOS is already calculate I just want to try to calculate the median LOS by each service type by the year. So something like the below as the result.
Program Type | 2014 | Median LOS 2014 | 2015 | Median LOS2015 | 2016 | Median LOS 2016 | 2017 | Median LOS 2017 | 2018 | Median LOS 2018 | 2019 | Median LOS 2019 |
Clinic | 0 | 1 | 261 | 0 | 0 | 1 | 624 | 0 | ||||
ER | 1 | 4 | 0 | 1 | 4 | 2 | 18 | 1 | 6 | 1 | 3 | |
HomeCare | 1 | 76 | 1 | 21 | 1 | 332 | 0 | 0 | 1 | 38 | ||
Hospital | 0 | 1 | 14 | 0 | 0 | 1 | 14 | 0 | ||||
Outpatie | 4 | 219.5 | 1 | 95 | 1 | 41 | 1 | 36 | 5 | 107 | 4 | 58 |
Hi:
PROC REPORT could do it, but it will be far easier to do it in the DATA step where you read the data using ARRAY processing. I think you want something like this:
Which I accomplished with this:
data fakedata;
length service_type $10;
input ID $ cy2014 cy2015 cy2016 cy2017 cy2018 cy2019 service_type $ los;
** use array processing to create length of stay variables;
** for each year. IF "CYxxxx" variable is 1 then STAYxxxx variable;
** is assigned the LOS value for that observation;
array arrcy cy2014-cy2019;
array arrlos stay2014-stay2019;
do i = 1 to dim(arrcy);
if arrcy(i) = 1 then arrlos(i) = los;
else if arrcy(i) = 0 then arrlos(i) = 0;
end;
datalines;
1 0 0 0 0 1 0 Outpatient 313
2 0 1 0 0 0 0 Hospital 14
3 0 0 0 0 0 1 Outpatient 59
4 0 0 0 0 1 0 Outpatient 104
5 0 0 0 0 0 1 Outpatient 57
6 0 0 0 0 0 1 Outpatient 34
7 0 0 0 0 1 0 ER 6
8 0 0 1 0 0 0 ER 4
9 0 0 1 0 0 0 HomeCare 332
10 1 0 0 0 0 0 Outpatient 25
11 0 0 0 0 1 0 Clinic 624
12 0 0 1 0 0 0 Outpatient 41
13 1 0 0 0 0 0 ER 4
14 0 1 0 0 0 0 Outpatient 95
15 0 0 0 1 0 0 ER 7
16 0 0 0 1 0 0 ER 29
17 1 0 0 0 0 0 Outpatient 258
18 0 0 0 1 0 0 Outpatient 36
19 0 0 0 0 0 1 HomeCare 38
20 0 0 0 0 1 0 Outpatient 362
21 0 0 0 0 1 0 Outpatient 107
22 1 0 0 0 0 0 Outpatient 252
23 0 0 0 0 1 0 Hospital 14
24 0 0 0 0 1 0 Outpatient 15
25 1 0 0 0 0 0 HomeCare 76
26 0 1 0 0 0 0 HomeCare 21
27 0 0 0 0 0 1 ER 3
28 0 1 0 0 0 0 Clinic 261
29 1 0 0 0 0 0 Outpatient 187
30 0 0 0 0 0 1 Outpatient 75
;
run;
title; footnote;
proc report data=fakedata ;
column service_type
('2014' cy2014 stay2014) ('2015' cy2015 stay2015)
('2016' cy2016 stay2016) ('2017' cy2017 stay2017)
('2018' cy2018 stay2018) ('2019(YTD)' cy2019 stay2019) ;
define service_type / group 'Program Type';
define cy2014 / sum "CY";
define stay2014 / sum "LOS";
define cy2015 / sum "CY";
define stay2015 / sum "LOS";
define cy2016 / sum "CY";
define stay2016 / sum "LOS";
define cy2017 / sum "CY";
define stay2017 / sum "LOS";
define cy2018 / sum "CY";
define stay2018 / sum "LOS";
define cy2019 / sum "CY";
define stay2019 / sum "LOS";
rbreak after / summarize;
run;
Hope this helps,
cynthia
Ah! I see what you did there! Thank you!!!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.