I'm trying to practice with Proc Report, using the SAShelp class data here.
I would like to get counts and proportions (%) of the 'short' people, for subgroups by sex and age. This is the desired output:
Sex | Age | subgroup_total | count_short | prop_short |
F | 11 | 1 | 1 | 100.0 |
12 | 2 | 2 | 100.0 | |
13 | 2 | 1 | 50.0 | |
14 | 2 | 0 | 0.0 | |
15 | 2 | 0 | 0.0 | |
M | 11 | 1 | 1 | 100.0 |
12 | 3 | 2 | 66.7 | |
13 | 1 | 0 | 0.0 | |
14 | 2 | 0 | 0.0 | |
15 | 2 | 0 | 0.0 | |
16 | 1 | 0 | 0.0 |
Here's what I tried, I thought something like this might work for the count, but it doesn't seem to:
DATA Class_height;
SET sashelp.class;
IF Height < 60 THEN Ht_category = 'short';
ELSE IF Height => 60 AND Height < 65 THEN Ht_category = 'med';
ELSE Ht_category = 'tall';
RUN;
PROC REPORT DATA = Class_height;
COLUMN Sex Age Weight=n Ht_category=n1 Prop_short;
DEFINE Sex / group;
DEFINE Age / group;
DEFINE n / n 'subgroup_total';
DEFINE n1 / computed 'count_short';
COMPUTE n1;
IF Ht_category = 'short' THEN count+1;
n1 = count;
ENDCOMP;
DEFINE Prop_short / computed 'prop_short' format=4.1;
COMPUTE Prop_short;
Prop_short = n1/n*100;
ENDCOMP;
RUN;
Thanks
When I use Proc Report and want interesting values, especially percents, I usually do some calculation prior to the report so that I know what denominators are used. Getting column summaries to use in other summaries generally takes me too much time.
An example that creates a 1/0 coded value for characteristic of interest and then uses SUM to get the number of 1's and MEAN as percent of 1's, using a percent format to display
DATA Class_height; SET sashelp.class; short = (height < 60); RUN; PROC REPORT DATA = Class_height; COLUMN Sex Age Weight=n short short=shortpct; DEFINE Sex / group; DEFINE Age / group; DEFINE n / n 'subgroup_total'; define short / sum 'Short Count'; define shortpct /format=percent8.1 mean 'Percent short'; RUN;
When I use Proc Report and want interesting values, especially percents, I usually do some calculation prior to the report so that I know what denominators are used. Getting column summaries to use in other summaries generally takes me too much time.
An example that creates a 1/0 coded value for characteristic of interest and then uses SUM to get the number of 1's and MEAN as percent of 1's, using a percent format to display
DATA Class_height; SET sashelp.class; short = (height < 60); RUN; PROC REPORT DATA = Class_height; COLUMN Sex Age Weight=n short short=shortpct; DEFINE Sex / group; DEFINE Age / group; DEFINE n / n 'subgroup_total'; define short / sum 'Short Count'; define shortpct /format=percent8.1 mean 'Percent short'; RUN;
@ballardw wrote:
1/0 coded value for characteristic of interest and then uses SUM to get the number of 1's and MEAN as percent of 1's, using a percent format to display
Thank you, this is excellent. Having a column for each category as a binary outcome (0/1) is a much better approach. I don't why in my head I was overcomplicating XD. I updated my code accordingly so it now also shows the 3 categories (short, medium, tall). Hopefully it will help others who come across this in the future.
DATA Class_height; SET sashelp.class; short = (height < 60);
medium = (60 <= height < 65); tall = (height => 65); RUN; PROC REPORT DATA = Class_height; COLUMN Sex Age Weight=n short short=shortpct medium medium=mediumpct tall tall=tallpct ; DEFINE Sex / group; DEFINE Age / group; DEFINE n / n 'Subgroup Total'; DEFINE short / sum 'Short Count'; DEFINE shortpct / format=percent8.1 mean 'Short %'; DEFINE medium / sum 'Medium Count'; DEFINE mediumpct / format=percent8.1 mean 'Medium %'; DEFINE tall / sum 'Tall Count'; DEFINE tallpct / format=percent8.1 mean 'Tall %'; RUN;
Sex | Age | Subgroup Total | Short Count | Short % | Medium Count | Medium % | Tall Count | Tall % |
---|---|---|---|---|---|---|---|---|
F | 11 | 1 | 1 | 100.0% | 0 | 0.0% | 0 | 0.0% |
12 | 2 | 2 | 100.0% | 0 | 0.0% | 0 | 0.0% | |
13 | 2 | 1 | 50.0% | 0 | 0.0% | 1 | 50.0% | |
14 | 2 | 0 | 0.0% | 2 | 100.0% | 0 | 0.0% | |
15 | 2 | 0 | 0.0% | 1 | 50.0% | 1 | 50.0% | |
M | 11 | 1 | 1 | 100.0% | 0 | 0.0% | 0 | 0.0% |
12 | 3 | 2 | 66.7% | 1 | 33.3% | 0 | 0.0% | |
13 | 1 | 0 | 0.0% | 1 | 100.0% | 0 | 0.0% | |
14 | 2 | 0 | 0.0% | 1 | 50.0% | 1 | 50.0% | |
15 | 2 | 0 | 0.0% | 0 | 0.0% | 2 | 100.0% | |
16 | 1 | 0 | 0.0% | 0 | 0.0% | 1 | 100.0% |
And another way to get a similar report with a different procedure:
proc format; value ht low - < 60='Short' 60 - < 65='Medium' 65 - high ='Tall'; run; proc tabulate data=sashelp.class; class sex age height; format height ht.; tables sex*age, n='Group count' height*(n='count' rowpctn='percent') /misstext=' ' ; run;
Tabulate may have a few more options for denominators with row, col, page and report. and use of n (categorical or class variables) or Sum (of Var variables not used). Note that the use a format makes the code pretty flexible, change the format definition used and change the report categories. What Tabulate will not due generally is allow use of results of other statistics to be used in calculation (no COMPUTE BLOCK), character variables may only be class and on the row or column heading and different places for things like labels and style overrides.
Tabulate is, at least in my opinion, easier to deal with nested column headings. It also will create multiple tables with one pass of the procedure:
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.