I have a proc tabulate that gives me the sum I want but I would like to add a percent column for each clinic. I only want the percentage of clients for gender and Hispanic/non-Hispanic, for each clinic. To clarify, the denominator for these percentages is the total number of clients for that clinic. I would also really like to have the dollar and percent formats of the bottom example if it's possible. I have the formats defined in another datastep but the preloadfmt is not using those defined formats. I have tried several different things and am unable to get the results I want.
This is my code followed by my result.
proc tabulate data=Total2 format=8. missing ;
class Clinic blankrow /preloadfmt order=data;
var Total__Days Total_count Total_Stays Total_cost avg__Days avg_cost min_age max_age median_age M_count F_count yes_count No_count ;
keylabel n=' ' sum=' ' mean=' ' std=' ' pctn=' ' pctsum=' ' ;
tables Total_count='Clients'
blankrow=' '*[s=[foreground=white]]
M_count="Males"
F_Count="Females"
blankrow=''*[s=[foreground=white]]
yes_count='Hispanic'
No_count='Non-Hispanic'
blankrow=''*[s=[foreground=white]]
min_age='Min Age' max_age='Max Age' median_age='Median Age'
blankrow=''*[s=[foreground=white]]
Total__Days='Total Days'
Total_Stays='Numberof Stays'
blankrow=' '*[s=[foreground=white]]
Total_cost='Cost'
avg__Days='Average Days'
avg_cost='Average Cost Per Client'
, Clinic="" all='Totals' / rts=50 row=float misstext=' ' ;
run;
This is what I get
This is what I want:
/*
You are creating a very complicated report,
that is not suited for PROC REPORT or PROC TABULATE.
I would use PROC SQL for it.
*/
Data Have;
input Clinic$ Gender$ ID$ Distinct_Count Cost Hispanic$ SFY$ Clinic_Type$ Stays Days Age;
datalines;
A M 1 1 457 NO 2023 1 1 62 16
A M 2 1 2739 NO 2023 1 1 340 17
A M 3 1 3652 NO 2023 1 1 7 12
A M 4 1 5479 YE 2023 1 1 164 15
A M 5 1 3196 NO 2023 1 1 37 14
A M 6 1 6392 NO 2023 1 1 253 13
A M 7 1 9588 NO 2023 1 1 51 17
A M 8 1 8218 NO 2023 1 1 140 16
A M 9 1 10501 NO 2023 1 1 116 15
B F 10 1 10957 NO 2023 1 1 115 12
B M 11 1 1826 NO 2023 1 1 156 13
B F 12 1 10501 NO 2023 1 1 299 14
B F 13 1 5935 NO 2023 1 1 19 15
B F 14 1 8675 NO 2023 1 1 86 17
B M 15 1 3196 NO 2023 1 1 179 16
B F 16 1 6848 YE 2023 1 1 251 17
B M 17 1 9131 NO 2023 1 1 112 17
B F 18 1 11871 YE 2023 1 1 85 17
B F 19 1 457 NO 2023 1 1 153 17
B F 20 1 3196 NO 2023 1 1 37 17
;
run;
proc format;
value $gender
'M'=' Male'
'F'='Female'
;
value $eth
'YE'=' Hispanic'
'NO'='Non-Hispanic'
;
run;
proc sql;
create table report as
select 1 as id1,'GENDER' as a length=40,Clinic,
1 as id2,put(Gender,$gender.) as b length=40,
1 as id3,count(*) as c format=comma20.,
1 as id4,count(*)/(select count(*) from have where Clinic=a.Clinic) as d format=percent8.2,
1 as id5,sum(cost) as e format=dollar20.,
1 as id6,sum(cost)/(select sum(cost) from have where Clinic=a.Clinic) as f format=percent8.2
from have as a
group by Clinic,Gender
union all
select 2 as id1,'ETHNICITY' as a,Clinic,
2 as id2,put(Hispanic,$eth.) as b,
2 as id3,count(*) as c,
2 as id4,count(*)/(select count(*) from have where Clinic=a.Clinic) as d format=percent8.2,
2 as id5,sum(cost) as e format=dollar20.,
2 as id6,sum(cost)/(select sum(cost) from have where Clinic=a.Clinic) as f format=percent8.2
from have as a
group by Clinic,Hispanic
union all
select 3 as id1,'AGE' as a,Clinic,
3 as id2,' Min Age' as b,
3 as id3,min(age) as c,
3 as id4,.,
3 as id5,.,
3 as id6,.
from have
group by Clinic
union all
select 3 as id1,'AGE' as a,Clinic,
3 as id2,' Max Age' as b,
3 as id3,max(age) as c,
3 as id4,.,
3 as id5,.,
3 as id6,.
from have
group by Clinic
union all
select 3 as id1,'AGE' as a,Clinic,
3 as id2,'Avg Age' as b,
3 as id3,mean(age) as c,
3 as id4,.,
3 as id5,.,
3 as id6,.
from have
group by Clinic
union all
select 6 as id1,'09'x as a,Clinic,
6 as id2,' Avg cost per client' as b,
6 as id3,. as c,
6 as id4,.,
6 as id5,mean(cost),
6 as id6,.
from have
group by Clinic
union all
select 6 as id1,'09'x as a,Clinic,
6 as id2,' Avg Days per client' as b,
6 as id3,mean(days) as c,
6 as id4,.,
6 as id5,.,
6 as id6,.
from have
group by Clinic
union all
select 6 as id1,'09'x as a,Clinic,
6 as id2,' Total Stays' as b,
6 as id3,sum(stays) as c,
6 as id4,.,
6 as id5,.,
6 as id6,.
from have
group by Clinic
union all
select 6 as id1,'09'x as a,Clinic,
6 as id2,' Total Days' as b,
6 as id3,sum(days) as c,
6 as id4,.,
6 as id5,.,
6 as id6,.
from have
group by Clinic
union all
select 6 as id1,'09'x as a,Clinic,
6 as id2,'Total' as b,
6 as id3,count(*) as c,
6 as id4,.,
6 as id5,.,
6 as id6,.
from have
group by Clinic
;quit;
options missing=' ';
proc report data=report nowd;
column id1 a b clinic,(c d e f);
define id1/group noprint;
define a/group noprint ' ';
define b/group ' ';
define c/analysis 'Clients';
define d/analysis '% of Clients';
define e/analysis 'Cost';
define f/analysis '% of Cost';
define clinic/across '';
compute before a/style={just=l color=black};
if id1=1 then len=0;
else len=10;
x=' ';
line x $varying10. len;
line a $40.;
endcomp;
run;
I STRONGLY suggest that you provide example data. Preferably in the form of working data step code.
The way you have your code with apparently separate values for Male and Female (and indeed each row) means that the tools Proc Tabulate uses to do percentages are likely not going to work at all.
Typically I would expect to see a variable like SEX with values of Male and Female, then a percentage of the count for the variable SEX can be done.
You have included the option PRELOADFMT. So we would need to know the definition of the format (Proc Format code) and which variable(s) are using which format(s).
You have also cutout the Totals column and don't show what percentages should show there.
The way tabulate builds tables when you cross dimensions then the dimension result has to be valid for all the rows in a single table. So your choice is going to involve single table or choice of report procedure.
The ODS destination may also have an impact in creating desired appearance.
After reading the responses I have decided that I need to use proc report as proc tabulate will not give me all the statistics I want. I hope you don't mind, I am changing a few things from my original post on this. Please let me know if you suggest starting a new discussion rather than continuing in this discussion. My current Proc Report code is creating columns where I want rows. I think I might need to summarize or reconstruct my data in order to get the results I want but I'm not sure how to go about that. I also wanted to add in percent of clients (percent of total clients for that particular clinic) and percent of cost columns (percent of total cost for that particular clinic). Below is my sample data, proc report and a new example of what I would like my results to look like.
Data Have;
input
Clinic$ Gender$ ID$ Distinct_Count Cost Hispanic$ SFY$ Clinic_Type$ Stays Days Age;
datalines;
A M 1 1 457 NO 2023 1 1 62 16
A M 2 1 2739 NO 2023 1 1 340 17
A M 3 1 3652 NO 2023 1 1 7 12
A M 4 1 5479 YE 2023 1 1 164 15
A M 5 1 3196 NO 2023 1 1 37 14
A M 6 1 6392 NO 2023 1 1 253 13
A M 7 1 9588 NO 2023 1 1 51 17
A M 8 1 8218 NO 2023 1 1 140 16
A M 9 1 10501 NO 2023 1 1 116 15
B F 10 1 10957 NO 2023 1 1 115 12
B M 11 1 1826 NO 2023 1 1 156 13
B F 12 1 10501 NO 2023 1 1 299 14
B F 13 1 5935 NO 2023 1 1 19 15
B F 14 1 8675 NO 2023 1 1 86 17
B M 15 1 3196 NO 2023 1 1 179 16
B F 16 1 6848 YE 2023 1 1 251 17
B M 17 1 9131 NO 2023 1 1 112 17
B F 18 1 11871 YE 2023 1 1 85 17
B F 19 1 457 NO 2023 1 1 153 17
B F 20 1 3196 NO 2023 1 1 37 17
;
run;
proc report data=have;
columns Age Gender Hispanic Clinic, (Distinct_Count Cost) ;
define Cost / analysis sum "Expendeture" format=dollar12.;
define Distinct_Count / analysis sum "Clients" format=comma10.;
define Gender / group "Gender" style=header;
define Hispanic / group "Ethnic" style=header;
define Clinic / across "Table" ;
define Age / min max mean style=header;
run;
/*
You are creating a very complicated report,
that is not suited for PROC REPORT or PROC TABULATE.
I would use PROC SQL for it.
*/
Data Have;
input Clinic$ Gender$ ID$ Distinct_Count Cost Hispanic$ SFY$ Clinic_Type$ Stays Days Age;
datalines;
A M 1 1 457 NO 2023 1 1 62 16
A M 2 1 2739 NO 2023 1 1 340 17
A M 3 1 3652 NO 2023 1 1 7 12
A M 4 1 5479 YE 2023 1 1 164 15
A M 5 1 3196 NO 2023 1 1 37 14
A M 6 1 6392 NO 2023 1 1 253 13
A M 7 1 9588 NO 2023 1 1 51 17
A M 8 1 8218 NO 2023 1 1 140 16
A M 9 1 10501 NO 2023 1 1 116 15
B F 10 1 10957 NO 2023 1 1 115 12
B M 11 1 1826 NO 2023 1 1 156 13
B F 12 1 10501 NO 2023 1 1 299 14
B F 13 1 5935 NO 2023 1 1 19 15
B F 14 1 8675 NO 2023 1 1 86 17
B M 15 1 3196 NO 2023 1 1 179 16
B F 16 1 6848 YE 2023 1 1 251 17
B M 17 1 9131 NO 2023 1 1 112 17
B F 18 1 11871 YE 2023 1 1 85 17
B F 19 1 457 NO 2023 1 1 153 17
B F 20 1 3196 NO 2023 1 1 37 17
;
run;
proc format;
value $gender
'M'=' Male'
'F'='Female'
;
value $eth
'YE'=' Hispanic'
'NO'='Non-Hispanic'
;
run;
proc sql;
create table report as
select 1 as id1,'GENDER' as a length=40,Clinic,
1 as id2,put(Gender,$gender.) as b length=40,
1 as id3,count(*) as c format=comma20.,
1 as id4,count(*)/(select count(*) from have where Clinic=a.Clinic) as d format=percent8.2,
1 as id5,sum(cost) as e format=dollar20.,
1 as id6,sum(cost)/(select sum(cost) from have where Clinic=a.Clinic) as f format=percent8.2
from have as a
group by Clinic,Gender
union all
select 2 as id1,'ETHNICITY' as a,Clinic,
2 as id2,put(Hispanic,$eth.) as b,
2 as id3,count(*) as c,
2 as id4,count(*)/(select count(*) from have where Clinic=a.Clinic) as d format=percent8.2,
2 as id5,sum(cost) as e format=dollar20.,
2 as id6,sum(cost)/(select sum(cost) from have where Clinic=a.Clinic) as f format=percent8.2
from have as a
group by Clinic,Hispanic
union all
select 3 as id1,'AGE' as a,Clinic,
3 as id2,' Min Age' as b,
3 as id3,min(age) as c,
3 as id4,.,
3 as id5,.,
3 as id6,.
from have
group by Clinic
union all
select 3 as id1,'AGE' as a,Clinic,
3 as id2,' Max Age' as b,
3 as id3,max(age) as c,
3 as id4,.,
3 as id5,.,
3 as id6,.
from have
group by Clinic
union all
select 3 as id1,'AGE' as a,Clinic,
3 as id2,'Avg Age' as b,
3 as id3,mean(age) as c,
3 as id4,.,
3 as id5,.,
3 as id6,.
from have
group by Clinic
union all
select 6 as id1,'09'x as a,Clinic,
6 as id2,' Avg cost per client' as b,
6 as id3,. as c,
6 as id4,.,
6 as id5,mean(cost),
6 as id6,.
from have
group by Clinic
union all
select 6 as id1,'09'x as a,Clinic,
6 as id2,' Avg Days per client' as b,
6 as id3,mean(days) as c,
6 as id4,.,
6 as id5,.,
6 as id6,.
from have
group by Clinic
union all
select 6 as id1,'09'x as a,Clinic,
6 as id2,' Total Stays' as b,
6 as id3,sum(stays) as c,
6 as id4,.,
6 as id5,.,
6 as id6,.
from have
group by Clinic
union all
select 6 as id1,'09'x as a,Clinic,
6 as id2,' Total Days' as b,
6 as id3,sum(days) as c,
6 as id4,.,
6 as id5,.,
6 as id6,.
from have
group by Clinic
union all
select 6 as id1,'09'x as a,Clinic,
6 as id2,'Total' as b,
6 as id3,count(*) as c,
6 as id4,.,
6 as id5,.,
6 as id6,.
from have
group by Clinic
;quit;
options missing=' ';
proc report data=report nowd;
column id1 a b clinic,(c d e f);
define id1/group noprint;
define a/group noprint ' ';
define b/group ' ';
define c/analysis 'Clients';
define d/analysis '% of Clients';
define e/analysis 'Cost';
define f/analysis '% of Cost';
define clinic/across '';
compute before a/style={just=l color=black};
if id1=1 then len=0;
else len=10;
x=' ';
line x $varying10. len;
line a $40.;
endcomp;
run;
@Whitlea wrote:
Thank you! Wow this looks great... and as you said, complicated! Some of these clients have multiple stays I have some duplicates when I apply this to my actual data. I attempted to replace the count(*) with a count (distinct) but that did not fix the duplicates. How would I go about replacing the counts of rows to counting the distinct IDs or counting my distinct_count variable?
Thank you!!
Ideally de-dup your data before you feed it to @Ksharp 's code. If that's not possible then I suggest you provide representative sample data with such duplicates and then show us the issue and what you would need instead.
Different people call different things "duplicates" which is another reason why providing such representative sample data will help to avoid misunderstandings.
Here is a sample that includes duplicates with ID's 8 and 12 being the duplicate clients. A client can have multiple stays in the same or different clinic. I want distinct count of clients for each clinic. So If a client has more than one stay at the same clinic, I only want them counted once for that clinic. But I would like the Cost to include the cost of all stays regardless of duplicates and I would like the total spans to include all stays including duplicates. I was able to achieve unduplicated clients with adding a where span=1 statement but unfortunately that filtered the cost down to those single stays which is not what I wanted.
Data Have;
input
Clinic$ Gender$ ID$ Distinct_Count Cost Hispanic$ SFY$ Clinic_Type$ Stays Days Age;
datalines;
A M 1 1 457 NO 2023 1 1 62 16
A M 2 1 2739 NO 2023 1 1 340 17
A M 3 1 3652 NO 2023 1 1 7 12
A M 4 1 5479 YE 2023 1 1 164 15
A M 5 1 3196 NO 2023 1 1 37 14
A M 6 1 6392 NO 2023 1 1 253 13
A M 7 1 9588 NO 2023 1 1 51 17
A M 8 1 821 NO 2023 1 1 40 15
A M 8 0 105 NO 2023 1 2 116 15
B F 10 1 10957 NO 2023 1 1 115 12
B M 11 1 1826 NO 2023 1 1 156 13
B F 12 1 10501 NO 2023 1 1 299 14
B F 12 0 5935 NO 2023 1 2 19 14
B F 12 0 8675 NO 2023 1 3 86 14
B M 15 1 3196 NO 2023 1 1 179 16
B F 16 1 6848 YE 2023 1 1 251 17
B M 17 1 9131 NO 2023 1 1 112 17
B F 18 1 11871 YE 2023 1 1 85 17
B F 19 1 457 NO 2023 1 1 153 17
B F 20 1 3196 NO 2023 1 1 37 17
;
run;
But you did not post the output you want yet ?
So you just want to count distinct ID ? and keep duplicated COST STAYS ?
And I don't see any variable named "span=1".
Data Have;
input
Clinic$ Gender$ ID$ Distinct_Count Cost Hispanic$ SFY$ Clinic_Type$ Stays Days Age;
datalines;
A M 1 1 457 NO 2023 1 1 62 16
A M 2 1 2739 NO 2023 1 1 340 17
A M 3 1 3652 NO 2023 1 1 7 12
A M 4 1 5479 YE 2023 1 1 164 15
A M 5 1 3196 NO 2023 1 1 37 14
A M 6 1 6392 NO 2023 1 1 253 13
A M 7 1 9588 NO 2023 1 1 51 17
A M 8 1 821 NO 2023 1 1 40 15
A M 8 0 105 NO 2023 1 2 116 15
B F 10 1 10957 NO 2023 1 1 115 12
B M 11 1 1826 NO 2023 1 1 156 13
B F 12 1 10501 NO 2023 1 1 299 14
B F 12 0 5935 NO 2023 1 2 19 14
B F 12 0 8675 NO 2023 1 3 86 14
B M 15 1 3196 NO 2023 1 1 179 16
B F 16 1 6848 YE 2023 1 1 251 17
B M 17 1 9131 NO 2023 1 1 112 17
B F 18 1 11871 YE 2023 1 1 85 17
B F 19 1 457 NO 2023 1 1 153 17
B F 20 1 3196 NO 2023 1 1 37 17
;
run;
proc format;
value $gender
'M'=' Male'
'F'='Female'
;
value $eth
'YE'=' Hispanic'
'NO'='Non-Hispanic'
;
run;
proc sql;
create table report as
select 1 as id1,'GENDER' as a length=40,Clinic,
1 as id2,put(Gender,$gender.) as b length=40,
1 as id3,count(distinct id) as c format=comma20.,
1 as id4,count(distinct id)/(select count(distinct id) from have where Clinic=a.Clinic) as d format=percent8.2,
1 as id5,sum(cost) as e format=dollar20.,
1 as id6,sum(cost)/(select sum(cost) from have where Clinic=a.Clinic) as f format=percent8.2
from have as a
group by Clinic,Gender
union all
select 2 as id1,'ETHNICITY' as a,Clinic,
2 as id2,put(Hispanic,$eth.) as b,
2 as id3,count(distinct id) as c,
2 as id4,count(distinct id)/(select count(distinct id) from have where Clinic=a.Clinic) as d format=percent8.2,
2 as id5,sum(cost) as e format=dollar20.,
2 as id6,sum(cost)/(select sum(cost) from have where Clinic=a.Clinic) as f format=percent8.2
from have as a
group by Clinic,Hispanic
union all
select 3 as id1,'AGE' as a,Clinic,
3 as id2,' Min Age' as b,
3 as id3,min(age) as c,
3 as id4,.,
3 as id5,.,
3 as id6,.
from have
group by Clinic
union all
select 3 as id1,'AGE' as a,Clinic,
3 as id2,' Max Age' as b,
3 as id3,max(age) as c,
3 as id4,.,
3 as id5,.,
3 as id6,.
from have
group by Clinic
union all
select 3 as id1,'AGE' as a,Clinic,
3 as id2,'Avg Age' as b,
3 as id3,mean(age) as c,
3 as id4,.,
3 as id5,.,
3 as id6,.
from (select distinct Clinic,id,age from have )
group by Clinic
union all
select 6 as id1,'09'x as a,Clinic,
6 as id2,' Avg cost per client' as b,
6 as id3,. as c,
6 as id4,.,
6 as id5,mean(cost),
6 as id6,.
from (select Clinic,id,sum(cost) as cost from have group by Clinic,id)
group by Clinic
union all
select 6 as id1,'09'x as a,Clinic,
6 as id2,' Avg Days per client' as b,
6 as id3,mean(days) as c,
6 as id4,.,
6 as id5,.,
6 as id6,.
from (select Clinic,id,sum(days) as days from have group by Clinic,id)
group by Clinic
union all
select 6 as id1,'09'x as a,Clinic,
6 as id2,' Total Stays' as b,
6 as id3,sum(stays) as c,
6 as id4,.,
6 as id5,.,
6 as id6,.
from have
group by Clinic
union all
select 6 as id1,'09'x as a,Clinic,
6 as id2,' Total Days' as b,
6 as id3,sum(days) as c,
6 as id4,.,
6 as id5,.,
6 as id6,.
from have
group by Clinic
union all
select 6 as id1,'09'x as a,Clinic,
6 as id2,'Total' as b,
6 as id3,count(distinct id) as c,
6 as id4,.,
6 as id5,.,
6 as id6,.
from have
group by Clinic
;quit;
options missing=' ';
proc report data=report nowd;
column id1 a b clinic,(c d e f);
define id1/group noprint;
define a/group noprint ' ';
define b/group ' ';
define c/analysis 'Clients';
define d/analysis '% of Clients';
define e/analysis 'Cost';
define f/analysis '% of Cost';
define clinic/across '';
compute before a/style={just=l color=black};
if id1=1 then len=0;
else len=10;
x=' ';
line x $varying10. len;
line a $40.;
endcomp;
run;
Of course.you can.
proc sql;
create table report as
select 1 as id1,'GENDER' as a length=40,Clinic,
1 as id2,put(Gender,$gender.) as b length=40,
1 as id3,(select count(distinct id) from have where Clinic=a.Clinic and Gender=a.Gender and cost>0) as c format=comma20.,
1 as id4,calculated c/(select count(distinct id) from have where Clinic=a.Clinic and cost>0) as d format=percent8.2,
1 as id5,sum(cost) as e format=dollar20.,
1 as id6,sum(cost)/(select sum(cost) from have where Clinic=a.Clinic) as f format=percent8.2
from have as a
group by Clinic,Gender
union all
select 2 as id1,'ETHNICITY' as a,Clinic,
2 as id2,put(Hispanic,$eth.) as b,
2 as id3,(select count(distinct id) from have where Clinic=a.Clinic and Hispanic=a.Hispanic and cost>0) as c,
2 as id4,calculated c/(select count(distinct id) from have where Clinic=a.Clinic and cost>0) as d format=percent8.2,
2 as id5,sum(cost) as e format=dollar20.,
2 as id6,sum(cost)/(select sum(cost) from have where Clinic=a.Clinic) as f format=percent8.2
from have as a
group by Clinic,Hispanic
;
quit;
You effectively want to convert two columns into four columns. In PROC TABULATE this means you have to have a single expression for each new column (or possible a single expression, crossed by CLINIC, for both new columns.
The problem is that tabulate will apply that expression to every row - but that's not what you want. You want it for only the first 3 non-empty rows.
I think you have to examine PROC REPORT. Those with experience with proc report can better assist in developing the needed code.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.