I'm working in SAS 9.4, and the following code is exactly what I'm trying to do, with the exception of the rate calculation because proc tabulate doesn't support mathematical operations (i.e., it runs without the inclusion of rate=num_deaths/pop*100000=''*f=8.1 which is not kosher code). My input dataset contains all combinations of demographic categories and their associated value for num_deaths and population. I need columns for years, under which would each have num_deaths, population, colpctn, and rate (num_deaths/pop*100000). The rows would contain each level of the demographic variables age, sex, and race_eth.
proc tabulate data=death_summary;
class year sex race_eth age;
var num_deaths pop rate;
table sex race_eth='Race/Ethnicity' age,
year=''*(num_deaths='Count'*sum=''*f=8.
num_deaths='%'*colpctsum=''*f=8.1
pop='Population'*sum=''*f=8.
rate=num_deaths/pop*100000=''*f=8.1) / misstext='0' nocellmerge;
run;
I have tried proc report, but I'm failing at nesting columns, and depending on how I've structured it, it seems to prefer to summarize observations rather than num_deaths and population. I cannot calculate rate in death_summary, because I need the calculation to be dynamic after the proc tabulate groupings. I also intend to, once this runs, to add a "where" statement to filter by state, so that I can have a macro that runs it for each of the 50 US states.
Does anyone have any suggestions on how I could calculate rates while maintaining the groupings both by column years and row demographics?
data death_summary;
input year:$4. state_name:$21. SEX:$6. race_eth:$50. age:$2. num_deaths:32. pop:32.
cause1:32. cause2:32. cause3:32. cause4:32. cause5:32. cause6:32. cause7:32. cause8:32.;
datalines;
2019 PA FEMALE A 01 2 10 2 0 0 0 0 0 0 0
2019 PA FEMALE A 02 4 9 4 0 0 0 0 0 0 0
2019 PA FEMALE A 03 0 11 0 0 0 0 0 0 0 0
2019 PA FEMALE A 04 2 6 2 0 0 0 0 0 0 0
2019 PA FEMALE A 05 0 7 0 0 0 0 0 0 0 0
2019 PA FEMALE B 01 5 5 4 0 0 0 0 0 0 0
2019 PA FEMALE B 02 11 8 0 0 11 0 0 0 0 0
2019 PA FEMALE B 03 6 12 10 0 0 0 0 6 0 0
2019 PA FEMALE B 04 2 33 1 0 0 0 0 1 0 0
2019 PA FEMALE B 05 0 5 0 0 0 0 0 0 0 0
2019 PA FEMALE W 01 2 45 2 0 0 0 0 0 0 0
2019 PA FEMALE W 02 4 6 0 0 0 0 3 0 0 0
2019 PA FEMALE W 03 6 9 0 0 0 0 6 0 0 0
2019 PA FEMALE W 04 7 12 0 0 0 0 0 7 0 0
2019 PA FEMALE W 05 1 22 1 0 0 0 0 0 0 0
2019 PA MALE A 01 6 4 0 0 0 0 6 0 0 0
2019 PA MALE A 02 10 14 6 0 0 5 0 0 0 0
2019 PA MALE A 03 7 11 0 0 0 0 7 0 0 0
2019 PA MALE A 04 4 5 0 0 0 3 0 0 0 1
2019 PA MALE A 05 0 6 0 0 0 0 0 0 0 0
2019 PA MALE B 01 10 5 4 0 0 0 0 0 0 0
2019 PA MALE B 02 16 9 0 0 1 10 0 0 0 0
2019 PA MALE B 03 12 12 7 0 0 0 0 6 0 8
2019 PA MALE B 04 1 16 0 0 0 0 0 1 0 0
2019 PA MALE B 05 3 8 0 0 0 3 0 0 0 0
2019 PA MALE W 01 0 7 0 0 0 0 0 0 0 0
2019 PA MALE W 02 5 12 0 0 0 0 1 0 0 0
2019 PA MALE W 03 0 5 0 0 0 0 0 0 0 0
2019 PA MALE W 04 0 44 0 10 0 0 0 0 0 0
2019 PA MALE W 05 9 23 9 0 0 0 0 0 0 0
;run;
proc sql;
create table report as
select 1 as id1,'SEX' as row1 length=80,1 as id2,sex as row2 length=80,year,
sum(num_deaths) as Count,
100*calculated Count/(select sum(num_deaths) from death_summary where sex is not missing and year=a.year) as per label='%' format=8.1,
sum(pop) as Population,
100*calculated Count/calculated Population as rate label='rate(%)' format=8.1
from death_summary as a
where sex is not missing and year is not missing
group by sex,year
union all
select 2,'Race/Ethnicity',2,race_eth,year,
sum(num_deaths) as Count,
100*calculated Count/(select sum(num_deaths) from death_summary where race_eth is not missing and year=a.year) as per label='%' format=8.1,
sum(pop) as Population,
100*calculated Count/calculated Population as rate label='rate(%)' format=8.1
from death_summary as a
where race_eth is not missing and year is not missing
group by race_eth,year
union all
select 3,'Age',3,age,year,
sum(num_deaths) as Count,
100*calculated Count/(select sum(num_deaths) from death_summary where age is not missing and year=a.year) as per label='%' format=8.1,
sum(pop) as Population,
100*calculated Count/calculated Population as rate label='rate(%)' format=8.1
from death_summary as a
where age is not missing and year is not missing
group by age,year
;
quit;
proc report data=report nowd;
columns id1 row1 id2 row2 year,(count per population rate);
define id1/group noprint;
define row1/group noprint;
define id2/group noprint;
define row2/group '';
define year/across '';
define count/analysis sum;
define per/analysis sum;
define population/analysis sum;
define rate/analysis sum;
compute before row1/style={just=l};
line row1 $80.;
endcomp;
run;
Show us a portion of your data. (If the data is confidential, change it so it is fake data).
Please provide the data as working SAS data step code (instructions and examples). Do not provide the data as screen captures or as Excel.
data death_summary;
input year:$4. state_name:$21. SEX:$6. race_eth:$50. age:$2. num_deaths:32. pop:32.
cause1:32. cause2:32. cause3:32. cause4:32. cause5:32. cause6:32. cause7:32. cause8:32.;
datalines;
2019 PA FEMALE A 01 2 10 2 0 0 0 0 0 0 0
2019 PA FEMALE A 02 4 9 4 0 0 0 0 0 0 0
2019 PA FEMALE A 03 0 11 0 0 0 0 0 0 0 0
2019 PA FEMALE A 04 2 6 2 0 0 0 0 0 0 0
2019 PA FEMALE A 05 0 7 0 0 0 0 0 0 0 0
2019 PA FEMALE B 01 5 5 4 0 0 0 0 0 0 0
2019 PA FEMALE B 02 11 8 0 0 11 0 0 0 0 0
2019 PA FEMALE B 03 6 12 10 0 0 0 0 6 0 0
2019 PA FEMALE B 04 2 33 1 0 0 0 0 1 0 0
2019 PA FEMALE B 05 0 5 0 0 0 0 0 0 0 0
2019 PA FEMALE W 01 2 45 2 0 0 0 0 0 0 0
2019 PA FEMALE W 02 4 6 0 0 0 0 3 0 0 0
2019 PA FEMALE W 03 6 9 0 0 0 0 6 0 0 0
2019 PA FEMALE W 04 7 12 0 0 0 0 0 7 0 0
2019 PA FEMALE W 05 1 22 1 0 0 0 0 0 0 0
2019 PA MALE A 01 6 4 0 0 0 0 6 0 0 0
2019 PA MALE A 02 10 14 6 0 0 5 0 0 0 0
2019 PA MALE A 03 7 11 0 0 0 0 7 0 0 0
2019 PA MALE A 04 4 5 0 0 0 3 0 0 0 1
2019 PA MALE A 05 0 6 0 0 0 0 0 0 0 0
2019 PA MALE B 01 10 5 4 0 0 0 0 0 0 0
2019 PA MALE B 02 16 9 0 0 1 10 0 0 0 0
2019 PA MALE B 03 12 12 7 0 0 0 0 6 0 8
2019 PA MALE B 04 1 16 0 0 0 0 0 1 0 0
2019 PA MALE B 05 3 8 0 0 0 3 0 0 0 0
2019 PA MALE W 01 0 7 0 0 0 0 0 0 0 0
2019 PA MALE W 02 5 12 0 0 0 0 1 0 0 0
2019 PA MALE W 03 0 5 0 0 0 0 0 0 0 0
2019 PA MALE W 04 0 44 0 10 0 0 0 0 0 0
2019 PA MALE W 05 9 23 9 0 0 0 0 0 0 0
;run;
proc tabulate data=death_summary;
class year sex race_eth age;
var num_deaths pop;
table sex race_eth='Race/Ethnicity' age,
year=''*(num_deaths='Count'*sum=''*f=8.
num_deaths='%'*colpctsum=''*f=8.1
pop='Population'*sum=''*f=8.)
/ misstext='0' nocellmerge;
run;
There are many more levels of states/age categories/race_eth, but this sample hopefully gives a reasonable impression. The proc tabulate is exactly what I'm going for, but I'd love there to be a column [num_deaths/pop*100000]. Thanks so much.
You could calculate the rate in a DATA step or SQL step, then in tabulate use something like:
rate='rate'*mean=''*f=8.1
So if rate has the same value for every record within a year, then you can report the value with mean or min or max.
That said, if you can provide small example dataset (as a CARDS data step) and the desired output, it will be easier for people to help.
You could try a compute block in Proc Report.
If you want actual code then please provide SAS code that creates sample data and show us the desired output using this sample data.
Below what Copilot suggested when asked to convert your Proc Tabulate to Proc Report (doesn't mean that's correct code but it's a start)
proc report data=death_summary nowd;
column sex race_eth age year num_deaths pct_deaths pop rate;
define sex / group;
define race_eth / group 'Race/Ethnicity';
define age / group;
define year / across '';
define num_deaths / sum 'Count' format=8.;
define pct_deaths / computed '%' format=8.1;
define pop / sum 'Population' format=8.;
define rate / computed format=8.1;
compute pct_deaths;
pct_deaths = num_deaths.sum / pop.sum * 100;
endcomp;
compute rate;
rate = num_deaths.sum / pop.sum * 100000;
endcomp;
rbreak after / summarize;
run;
Thanks - I did some version of this, but it's counting rows and the number of rows is irrelevant to my dataset. I posted a sample dataset that hopefully gives a better idea. I'll keep tweaking proc report, thanks again.
I'm considering using proc sql, but maybe just because I'm more accustomed with sql syntax. I did post some sample data in another reply, thanks for your note.
data death_summary;
input year:$4. state_name:$21. SEX:$6. race_eth:$50. age:$2. num_deaths:32. pop:32.
cause1:32. cause2:32. cause3:32. cause4:32. cause5:32. cause6:32. cause7:32. cause8:32.;
datalines;
2019 PA FEMALE A 01 2 10 2 0 0 0 0 0 0 0
2019 PA FEMALE A 02 4 9 4 0 0 0 0 0 0 0
2019 PA FEMALE A 03 0 11 0 0 0 0 0 0 0 0
2019 PA FEMALE A 04 2 6 2 0 0 0 0 0 0 0
2019 PA FEMALE A 05 0 7 0 0 0 0 0 0 0 0
2019 PA FEMALE B 01 5 5 4 0 0 0 0 0 0 0
2019 PA FEMALE B 02 11 8 0 0 11 0 0 0 0 0
2019 PA FEMALE B 03 6 12 10 0 0 0 0 6 0 0
2019 PA FEMALE B 04 2 33 1 0 0 0 0 1 0 0
2019 PA FEMALE B 05 0 5 0 0 0 0 0 0 0 0
2019 PA FEMALE W 01 2 45 2 0 0 0 0 0 0 0
2019 PA FEMALE W 02 4 6 0 0 0 0 3 0 0 0
2019 PA FEMALE W 03 6 9 0 0 0 0 6 0 0 0
2019 PA FEMALE W 04 7 12 0 0 0 0 0 7 0 0
2019 PA FEMALE W 05 1 22 1 0 0 0 0 0 0 0
2019 PA MALE A 01 6 4 0 0 0 0 6 0 0 0
2019 PA MALE A 02 10 14 6 0 0 5 0 0 0 0
2019 PA MALE A 03 7 11 0 0 0 0 7 0 0 0
2019 PA MALE A 04 4 5 0 0 0 3 0 0 0 1
2019 PA MALE A 05 0 6 0 0 0 0 0 0 0 0
2019 PA MALE B 01 10 5 4 0 0 0 0 0 0 0
2019 PA MALE B 02 16 9 0 0 1 10 0 0 0 0
2019 PA MALE B 03 12 12 7 0 0 0 0 6 0 8
2019 PA MALE B 04 1 16 0 0 0 0 0 1 0 0
2019 PA MALE B 05 3 8 0 0 0 3 0 0 0 0
2019 PA MALE W 01 0 7 0 0 0 0 0 0 0 0
2019 PA MALE W 02 5 12 0 0 0 0 1 0 0 0
2019 PA MALE W 03 0 5 0 0 0 0 0 0 0 0
2019 PA MALE W 04 0 44 0 10 0 0 0 0 0 0
2019 PA MALE W 05 9 23 9 0 0 0 0 0 0 0
;run;
proc sql;
create table report as
select 1 as id1,'SEX' as row1 length=80,1 as id2,sex as row2 length=80,year,
sum(num_deaths) as Count,
100*calculated Count/(select sum(num_deaths) from death_summary where sex is not missing and year=a.year) as per label='%' format=8.1,
sum(pop) as Population,
100*calculated Count/calculated Population as rate label='rate(%)' format=8.1
from death_summary as a
where sex is not missing and year is not missing
group by sex,year
union all
select 2,'Race/Ethnicity',2,race_eth,year,
sum(num_deaths) as Count,
100*calculated Count/(select sum(num_deaths) from death_summary where race_eth is not missing and year=a.year) as per label='%' format=8.1,
sum(pop) as Population,
100*calculated Count/calculated Population as rate label='rate(%)' format=8.1
from death_summary as a
where race_eth is not missing and year is not missing
group by race_eth,year
union all
select 3,'Age',3,age,year,
sum(num_deaths) as Count,
100*calculated Count/(select sum(num_deaths) from death_summary where age is not missing and year=a.year) as per label='%' format=8.1,
sum(pop) as Population,
100*calculated Count/calculated Population as rate label='rate(%)' format=8.1
from death_summary as a
where age is not missing and year is not missing
group by age,year
;
quit;
proc report data=report nowd;
columns id1 row1 id2 row2 year,(count per population rate);
define id1/group noprint;
define row1/group noprint;
define id2/group noprint;
define row2/group '';
define year/across '';
define count/analysis sum;
define per/analysis sum;
define population/analysis sum;
define rate/analysis sum;
compute before row1/style={just=l};
line row1 $80.;
endcomp;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.