Is it possible to calculate median in Proc report?
I have a dataset that has a client ID, years, service type and length of stay (los).
What I want to do is calculate using proc report (or maybe tabulate??) the median length of stay by service type for each year.
Is that possible or do I have to do this in proc means? or is there another way of doing this?
Asking because I am trying to use proc report to have columns that count the number of '1's under each year and then have another column that will calculate the median los by service type for each year. You will notice in my sample below that no two years have a '1' in the same row.
data have;
input ID $ cy2014 cy2015 cy2016 cy2017 cy2018 cy2019 service_type $ los;
datalines;
1	0	0	0	0	1	0	Outpatient	313
2	0	1	0	0	0	0	Hospital	14
3	0	0	0	0	0	1	Outpatient	59
4	0	0	0	0	1	0	Outpatient	104
5	0	0	0	0	0	1	Outpatient	57
6	0	0	0	0	0	1	Outpatient	34
7	0	0	0	0	1	0	ER	6
8	0	0	1	0	0	0	ER	4
9	0	0	1	0	0	0	HomeCare	332
10	1	0	0	0	0	0	Outpatient	25
11	0	0	0	0	1	0	Clinic	624
12	0	0	1	0	0	0	Outpatient	41
13	1	0	0	0	0	0	ER	4
14	0	1	0	0	0	0	Outpatient	95
15	0	0	0	1	0	0	ER	7
16	0	0	0	1	0	0	ER	29
17	1	0	0	0	0	0	Outpatient	258
18	0	0	0	1	0	0	Outpatient	36
19	0	0	0	0	0	1	HomeCare	38
20	0	0	0	0	1	0	Outpatient	362
21	0	0	0	0	1	0	Outpatient	107
22	1	0	0	0	0	0	Outpatient	252
23	0	0	0	0	1	0	Hospital	14
24	0	0	0	0	1	0	Outpatient	15
25	1	0	0	0	0	0	HomeCare	76
26	0	1	0	0	0	0	HomeCare	21
27	0	0	0	0	0	1	ER	3
28	0	1	0	0	0	0	Clinic	261
29	1	0	0	0	0	0	Outpatient	187
30	0	0	0	0	0	1	Outpatient	75
;This is the code that I am using to count the '1's for each year by each service type. How can I add to this to a code to calculate the median los by each service type and each year?
proc report data=have ; column service_type cy2014 cy2015 cy2016 cy2017 cy2018 cy2019; define service_type / group width=50 'Program Type'; define cy2014 / sum '2014'; define cy2015 / sum '2015'; define cy2016 / sum '2016'; define cy2017 / sum '2017'; define cy2018 / sum '2018'; define cy2019 / sum '2019(YTD)'; run;
Thanks in Advance!
Hi:
PROC REPORT could do it, but it will be far easier to do it in the DATA step where you read the data using ARRAY processing. I think you want something like this:
Which I accomplished with this:
data fakedata;
  length service_type $10;
  input ID $ cy2014 cy2015 cy2016 cy2017 cy2018 cy2019 service_type $ los;
  
  ** use array processing to create length of stay variables;
  ** for each year. IF "CYxxxx" variable is 1 then STAYxxxx variable;
  ** is assigned the LOS value for that observation;
  array arrcy cy2014-cy2019;
  array arrlos stay2014-stay2019;
  do i = 1 to dim(arrcy);
     if arrcy(i) = 1 then arrlos(i) = los;
	 else if arrcy(i) = 0 then arrlos(i) = 0;
  end;
datalines;
1	0	0	0	0	1	0	Outpatient	313
2	0	1	0	0	0	0	Hospital	14
3	0	0	0	0	0	1	Outpatient	59
4	0	0	0	0	1	0	Outpatient	104
5	0	0	0	0	0	1	Outpatient	57
6	0	0	0	0	0	1	Outpatient	34
7	0	0	0	0	1	0	ER	6
8	0	0	1	0	0	0	ER	4
9	0	0	1	0	0	0	HomeCare	332
10	1	0	0	0	0	0	Outpatient	25
11	0	0	0	0	1	0	Clinic	624
12	0	0	1	0	0	0	Outpatient	41
13	1	0	0	0	0	0	ER	4
14	0	1	0	0	0	0	Outpatient	95
15	0	0	0	1	0	0	ER	7
16	0	0	0	1	0	0	ER	29
17	1	0	0	0	0	0	Outpatient	258
18	0	0	0	1	0	0	Outpatient	36
19	0	0	0	0	0	1	HomeCare	38
20	0	0	0	0	1	0	Outpatient	362
21	0	0	0	0	1	0	Outpatient	107
22	1	0	0	0	0	0	Outpatient	252
23	0	0	0	0	1	0	Hospital	14
24	0	0	0	0	1	0	Outpatient	15
25	1	0	0	0	0	0	HomeCare	76
26	0	1	0	0	0	0	HomeCare	21
27	0	0	0	0	0	1	ER	3
28	0	1	0	0	0	0	Clinic	261
29	1	0	0	0	0	0	Outpatient	187
30	0	0	0	0	0	1	Outpatient	75
;
run;
  
title; footnote;
proc report data=fakedata ;
column service_type  
       ('2014' cy2014 stay2014) ('2015' cy2015 stay2015) 
       ('2016' cy2016 stay2016) ('2017' cy2017 stay2017) 
       ('2018' cy2018 stay2018) ('2019(YTD)' cy2019 stay2019) ;
define service_type / group  'Program Type';
define cy2014 / sum "CY";
define stay2014 / sum "LOS";
define cy2015 / sum "CY";
define stay2015 / sum "LOS";
define cy2016 / sum "CY";
define stay2016 / sum "LOS";
define cy2017 / sum  "CY";
define stay2017 / sum  "LOS";
define cy2018 / sum "CY";
define stay2018 / sum "LOS";
define cy2019 / sum "CY";
define stay2019 / sum "LOS";
 
rbreak after / summarize;
run;
Hope this helps,
cynthia
According to this, you can compute medians in PROC REPORT
Just replace SUM with MEDIAN in your code.
Hi,
Yes LOS (length of stay) is only showing by service type. But if you note no two years have a 1 indicator in the same row. so if there is a 1 in cy2014 you will note that none of the other years have a 1. Is it possible that in proc report a statement such as "if cy2014=1 then calculate median by service type and label the median as los for cy2014" and then "if cy2015=1 then calculate median by service type and label the median as los for cy2015" and so far?
I am trying to get to the bottom table.
LOS is already calculate I just want to try to calculate the median LOS by each service type by the year. So something like the below as the result.
| Program Type | 2014 | Median LOS 2014 | 2015 | Median LOS2015 | 2016 | Median LOS 2016 | 2017 | Median LOS 2017 | 2018 | Median LOS 2018 | 2019 | Median LOS 2019 | 
| Clinic | 0 | 1 | 261 | 0 | 0 | 1 | 624 | 0 | ||||
| ER | 1 | 4 | 0 | 1 | 4 | 2 | 18 | 1 | 6 | 1 | 3 | |
| HomeCare | 1 | 76 | 1 | 21 | 1 | 332 | 0 | 0 | 1 | 38 | ||
| Hospital | 0 | 1 | 14 | 0 | 0 | 1 | 14 | 0 | ||||
| Outpatie | 4 | 219.5 | 1 | 95 | 1 | 41 | 1 | 36 | 5 | 107 | 4 | 58 | 
Hi:
PROC REPORT could do it, but it will be far easier to do it in the DATA step where you read the data using ARRAY processing. I think you want something like this:
Which I accomplished with this:
data fakedata;
  length service_type $10;
  input ID $ cy2014 cy2015 cy2016 cy2017 cy2018 cy2019 service_type $ los;
  
  ** use array processing to create length of stay variables;
  ** for each year. IF "CYxxxx" variable is 1 then STAYxxxx variable;
  ** is assigned the LOS value for that observation;
  array arrcy cy2014-cy2019;
  array arrlos stay2014-stay2019;
  do i = 1 to dim(arrcy);
     if arrcy(i) = 1 then arrlos(i) = los;
	 else if arrcy(i) = 0 then arrlos(i) = 0;
  end;
datalines;
1	0	0	0	0	1	0	Outpatient	313
2	0	1	0	0	0	0	Hospital	14
3	0	0	0	0	0	1	Outpatient	59
4	0	0	0	0	1	0	Outpatient	104
5	0	0	0	0	0	1	Outpatient	57
6	0	0	0	0	0	1	Outpatient	34
7	0	0	0	0	1	0	ER	6
8	0	0	1	0	0	0	ER	4
9	0	0	1	0	0	0	HomeCare	332
10	1	0	0	0	0	0	Outpatient	25
11	0	0	0	0	1	0	Clinic	624
12	0	0	1	0	0	0	Outpatient	41
13	1	0	0	0	0	0	ER	4
14	0	1	0	0	0	0	Outpatient	95
15	0	0	0	1	0	0	ER	7
16	0	0	0	1	0	0	ER	29
17	1	0	0	0	0	0	Outpatient	258
18	0	0	0	1	0	0	Outpatient	36
19	0	0	0	0	0	1	HomeCare	38
20	0	0	0	0	1	0	Outpatient	362
21	0	0	0	0	1	0	Outpatient	107
22	1	0	0	0	0	0	Outpatient	252
23	0	0	0	0	1	0	Hospital	14
24	0	0	0	0	1	0	Outpatient	15
25	1	0	0	0	0	0	HomeCare	76
26	0	1	0	0	0	0	HomeCare	21
27	0	0	0	0	0	1	ER	3
28	0	1	0	0	0	0	Clinic	261
29	1	0	0	0	0	0	Outpatient	187
30	0	0	0	0	0	1	Outpatient	75
;
run;
  
title; footnote;
proc report data=fakedata ;
column service_type  
       ('2014' cy2014 stay2014) ('2015' cy2015 stay2015) 
       ('2016' cy2016 stay2016) ('2017' cy2017 stay2017) 
       ('2018' cy2018 stay2018) ('2019(YTD)' cy2019 stay2019) ;
define service_type / group  'Program Type';
define cy2014 / sum "CY";
define stay2014 / sum "LOS";
define cy2015 / sum "CY";
define stay2015 / sum "LOS";
define cy2016 / sum "CY";
define stay2016 / sum "LOS";
define cy2017 / sum  "CY";
define stay2017 / sum  "LOS";
define cy2018 / sum "CY";
define stay2018 / sum "LOS";
define cy2019 / sum "CY";
define stay2019 / sum "LOS";
 
rbreak after / summarize;
run;
Hope this helps,
cynthia
Ah! I see what you did there! Thank you!!!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.