Hi everyone,
I explain what I want to do using the simplified table below. I would like to total the number of patients for each age group across all sites for each category of region. Taking table B from table A as an example. Any guidance on how to do it would be greatly appreciated. Thank you.
A
A: | |||
SiteID | Age_group | Region | Num_Patients |
Site01 | 18_25 | NA | 500 |
Site01 | 18_25 | MW | 703 |
Site01 | 18_25 | Miss | 500 |
Site01 | 18_25 | NE | 507 |
Site01 | 18_25 | Other | 500 |
Site01 | 18_25 | S | 1239 |
Site01 | 18_25 | W | 541 |
Site01 | 26_44 | NA | 502 |
Site01 | 26_44 | MW | 1087 |
Site01 | 26_44 | Miss | 500 |
Site01 | 26_44 | NE | 604 |
Site01 | 26_44 | Other | 500 |
Site01 | 26_44 | S | 2763 |
Site01 | 26_44 | W | 685 |
Site02 | 18_25 | NA | 533 |
Site02 | 18_25 | MW | 4278 |
Site02 | 18_25 | Miss | 500 |
Site02 | 18_25 | NE | 1964 |
Site02 | 18_25 | Other | 500 |
Site02 | 18_25 | S | 7340 |
Site02 | 18_25 | W | 3242 |
Site02 | 26_44 | NA | 566 |
Site02 | 26_44 | MW | 5920 |
Site02 | 26_44 | Miss | 500 |
Site02 | 26_44 | NE | 2512 |
Site02 | 26_44 | Other | 501 |
Site02 | 26_44 | S | 11502 |
Site02 | 26_44 | W | 5065 |
B
B: | |||
SiteID | Age_group | Region | Num_Patients |
ALL | 18_25 | NA | 1033 |
ALL | 18_25 | MW | 4981 |
ALL | 18_25 | Miss | 1000 |
ALL | 18_25 | NE | 2471 |
ALL | 18_25 | Other | 1000 |
ALL | 18_25 | S | 8579 |
ALL | 18_25 | W | 3783 |
ALL | 26_44 | NA | 1068 |
ALL | 26_44 | MW | 7007 |
ALL | 26_44 | Miss | 1000 |
ALL | 26_44 | NE | 3116 |
ALL | 26_44 | Other | 1001 |
ALL | 26_44 | S | 14265 |
ALL | 26_44 | W | 5750 |
data have;
infile datalines dlm=',';
input SiteID $ Age_group $ Region $ Num_Patients;
datalines;
Site01,18_25,NA,500
Site01,18_25,MW,703
Site01,18_25,Miss,500
Site01,18_25,NE,507
Site01,18_25,Other,500
Site01,18_25,S,1239
Site01,18_25,W,541
Site01,26_44,NA,502
Site01,26_44,MW,1087
Site01,26_44,Miss,500
Site01,26_44,NE,604
Site01,26_44,Other,500
Site01,26_44,S,2763
Site01,26_44,W,685
Site02,18_25,NA,533
Site02,18_25,MW,4278
Site02,18_25,Miss,500
Site02,18_25,NE,1964
Site02,18_25,Other,500
Site02,18_25,S,7340
Site02,18_25,W,3242
Site02,26_44,NA,566
Site02,26_44,MW,5920
Site02,26_44,Miss,500
Site02,26_44,NE,2512
Site02,26_44,Other,501
Site02,26_44,S,11502
Site02,26_44,W,5065
;
run;
proc sql;
create table want as
select "ALL" as siteid
,age_group
,region
,sum(Num_Patients) as Num_Patients
from have
group by age_group,region
order by age_group;
quit;
Can you try this?
Try PROC SUMMARY, PROC FREQ, PROC REPORT and PROC TABULATE. All of those should be able to make this report.
An example of a report with the desired elements though different appearance using Proc Tabulate:
proc tabulate data=have out=tabout; class SiteID Age_group Region ; var Num_Patients; table (All='All Sites' SiteId)*age_group*region, Num_patients*sum='' ; run;
Tabulate doesn't repeat row or column heading that are groups of related values.
The ALL instruction is the way to get a group total across other values of the variable. I tend to use something a bit more than just "All" for labels so I can tell all of what...
Note that proc tabulate can produce multiple tables in one procedure call such as:
proc tabulate data=have ; class SiteID Age_group Region ; var Num_Patients; table (All='All Sites' SiteId)*age_group*region, Num_patients*sum='' ; table (All='All Ages' age_group )*SiteId*region, Num_patients*sum='' ; run;
which is one reason I'm a bit more verbose with the ALL label.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.