Hello,
I am unable to create subtotal for each group. I am getting this error.
ERROR: You can only BREAK on GROUPing and ORDERing variables.
When I adjusted by code and include group or order in the define, i still have the error.
This is my output. I want subtotal under each group (Sex, Age category, Race/Ethnicity)
this is my code
proc report data=cases8
style(report)=[background=white]
style(header)=[background=#919CB2 foreground=black borderstyle=hidden];
col Cat2 Y2020 Y2021 Y2022 Y2023 Y2024 Y3000;
define cat2/order display 'Variable';
define Y2020/analysis sum 'Year 2020';
define Y2021/analysis sum 'Year 2021';
define Y2022/analysis sum 'Year 2022';
define Y2023/analysis sum 'Year 2023';
define Y2024/analysis sum 'Year 2024';
define Y3000/analysis sum 'Total';
/*Make row headings bold and more readable*/
break after cat2/summarize;
rbreak after /summarize;
compute after;
cat2='SubTotal';
endcomp;
compute cat2;
if strip(cat2) in('Sex' 'Age category' 'Race/Ethnicity') then do;
call define(_row_, "style", "style=[backgroundcolor=#DFE5E5 fontweight=bold]");
end;
endcomp;
title "Total number of encounters";
run;
Please share some sample data.
Aren't you missing a variable? Should there be two grouping variables. One with values like SEX and AGE and second with values like MALE and <20.
@Tom I am not missing any variable. I have grouped all the variable into one called cat. Here is my sample dataset
cat2 | Y2020 | Y2021 | Y2022 | Y2023 | Y2024 | Y3000 |
Sex | ||||||
Male | 320905 | 259692 | 177215 | 63338 | 104565 | 925715 |
Female | 158352 | 138476 | 96722 | 34446 | 35600 | 463596 |
Unknown | 1 | 4 | 5 | |||
Age category | ||||||
<20 | 18708 | 11688 | 6201 | 2227 | 4973 | 43797 |
20-24 | 135068 | 113050 | 71511 | 25253 | 34164 | 379046 |
25-29 | 111183 | 91723 | 62431 | 22946 | 31903 | 320186 |
30-34 | 78006 | 64702 | 46754 | 17256 | 24983 | 231701 |
35-39 | 70952 | 59787 | 43850 | 15343 | 22775 | 212707 |
>=40 | 65339 | 57210 | 43191 | 14759 | 21371 | 201870 |
Missing | 1 | 8 | 9 | |||
Race/Ethnicity | ||||||
White, non-Hispanic | 252063 | 204076 | 137564 | 45679 | 70975 | 710357 |
Black, non-Hispanic | 95670 | 84966 | 56493 | 19983 | 23201 | 280313 |
Hispanic | 79838 | 67523 | 48285 | 18923 | 27194 | 241763 |
Other | 51686 | 41603 | 31596 | 13199 | 18799 | 156883 |
You can easily fix that issue by making a new dataset from your existing one.
First let's convert your listing back into an actual dataset so we have something to program with.
data have;
infile cards dsd dlm='|' truncover;
input cat2 :$20. Y2020-Y2024 Y3000 ;
cards;
Sex| | | | | |
Male|320905|259692|177215|63338|104565|925715
Female|158352|138476|96722|34446|35600|463596
Unknown| |1| |4|5
Age category| | | | |
<20|18708|11688|6201|2227|4973|43797
20-24|135068|113050|71511|25253|34164|379046
25-29|111183|91723|62431|22946|31903|320186
30-34|78006|64702|46754|17256|24983|231701
35-39|70952|59787|43850|15343|22775|212707
>=40|65339|57210|43191|14759|21371|201870
Missing|1|8| | | |9
Race/Ethnicity| | | | |
White, non-Hispanic|252063|204076|137564|45679|70975|710357
Black, non-Hispanic|95670|84966|56493|19983|23201|280313
Hispanic|79838|67523|48285|18923|27194|241763
Other|51686|41603|31596|13199|18799|156883
;
Now just read in the data and create the missing variable(s).
data want;
length group 8 cat $20;
retain group cat;
set have;
if n(of y:)=0 then do;
group+1;
cat=cat2;
delete;
end;
run;
Now it is simple to add total rows. I will leave the style setting to you.
title "Total number of encounters";
proc report data=want
style(report)=[background=white]
style(header)=[background=#919CB2 foreground=black borderstyle=hidden]
;
col group cat Cat2 Y2020 Y2021 Y2022 Y2023 Y2024 Y3000;
define group / group noprint;
define cat / group noprint;
define cat2/group 'Variable';
define Y2020/analysis sum 'Year 2020';
define Y2021/analysis sum 'Year 2021';
define Y2022/analysis sum 'Year 2022';
define Y2023/analysis sum 'Year 2023';
define Y2024/analysis sum 'Year 2024';
define Y3000/analysis sum 'Total';
compute before cat ;
line @1 cat $20.;
endcomp;
break after cat/summarize;
rbreak after /summarize;
compute after cat ;
cat2='SubTotal';
endcomp;
compute after ;
cat2='Total';
endcomp;
run;
title;
Just so it fits on the screen I ran it with WHERE statement to limit it to groups 1 and 3.
You probably don't want the TOTAL row for this data, but just in case you need to make a different report where the groups can be combined I included it.
If you want to preserve the original order it might help to add another variable when converting the data back into a useful form. Let's call it ITEM.
data want;
length group 8 cat $20 item 8;
retain group cat;
set have;
if n(of y:)=0 then do;
group+1;
cat=cat2;
item=0;
delete;
end;
item+1;
run;
And include ITEM in PROC REPORT's COLUMN and DEFINE statements.
column group cat item Cat2 Y2020 Y2021 Y2022 Y2023 Y2024 Y3000;
define group--item / group noprint;
Hi @Tom , I think you are so brilliant at find out the major issue in answering this question, which is to add a column to be used as the group variable in proc report step, and you solved the issue very effectively and promptly, I cannot find another way better than use the n(of y:)=0; statement, so I write my version based on your ideas and this code.
However, what I would like to tell you is that, this major issue(create a group variable to be used in the proc report step and meanwhile consider the order to be displayed in the final report) perhaps is not THAT substantial and need THAT much thinking and THAT many steps/syntax to solve it (because I see in your code a lot of statements were used to deal with this issue and resulted redundant coding). Super good logic/math thinking and super good at/or remember a lot of syntax somehow becomes disadvantage here I guess😀. And what you wrote in this part (a big portion of this part) maybe not necessary (see screen capture):
As for the order of the group variable in the proc report step, the simplest way to solve it is use the order=; option. Here use order=data; makes the data to be displayed in the original order. While writing the proc report; step, I referred to SAS base 9.4 guide and see the order=; options in the 1st (or 2nd example) and I used that option (when I write proc report I always need to refer to the examples in the guide, among SAS base procs, proc report and proc tabulate probably are the steps I am least familiar with and practiced least somehow😀).
I suspect that your Error message is caused by using both ORDER and DISPLAY for the CAT2 variable. Many procedures when seeing two, or sometimes more, settings for the same single expected value will use the LAST one encountered in the code. I am guessing that you have summarized the data when creating what appear to be year total variables and hence the DISPLAY of the CAT2 looks correct and the ORDER is ignored.
With what you show for the result, since no example data has been provided and have to make many guesses about what the actual raw data may have looked like, I would probable go with a Proc Tabulate approach.
Instead of the step(s) I expect you used to create the CAT2 variable, which I am assuming you had to make to force anything resembling a stacking of different variables into single column for the rows Tabulate would allow use of the raw data.
Plus if the data originally only had one observation per person with year as a value then you don't have to do what ever summary step you included to force the multiple column variables (note: an ACROSS variable on a YEAR variable might be more desirable in Proc Report anyway).
Here is an example of a raw data set with random creations of one observation per person represented and a Proc Tabulate approach. Note, if your actually have MISSING values for a variable you need to add the option MISSING to the Class statement for any grouping variable or the observation is excluded from the report. The order of variables that would appear as the row label would also tend to make the MISSING appear first. I generally create or modify variables so the order of appearance is under my control and replace Missing values with a code value and have a format to assign meaningful text.
I have provided one format as an example of how to include desired text. The PRELOADFMT option may not be needed but sometimes helps in the control of order of values in the resulting table.
data example; do year=2020 to 2024; do i= 1 to rand('integer',10000); /* make random number of records per year*/ /* rand table creates numeric values according to a table of probabilites to randomly assign a sex, age and race for each observation */ sex= rand('table',.6, .399, .001); age= rand('table',.08, .21, .21, .17, .1695, .16, 0.0005); Race= rand('table',.6, .2, .15, .5); output; end; end; run; proc format; value sex 1='Male' 2='Female' 3='Unknown' ; run; proc tabulate data=example; class year; class sex /preloadfmt; class age race; format sex sex. ; table sex all='Sex total' age all='Age Total' race all='Race total' , (year=' ' all='All years')*n='' /printmiss misstext=' '; ; run;
The PRINTMISS with PRELOADFMT means that the row labels will appear even if there are no values for any of the columns, which might happen with your 'Unknown' sex.
The use of the YEAR variable to create the columns allows the ALL table option to create the total columns. Which means when the next Year of data is added the code for the table doesn't need to change (i.e. inserting a Year2025 then Year2026, Year2027 etc).
Unless your data is read odd I would expect the total row for each of the Sex, Age and Race variables to be the same for a given year. If not then you really need to provide an example data set of that in the form of a data step code. I believe this have been mentioned in more than one of your threads.
Try to remove DISPLAY option:
define cat2/order display 'Variable'; ----> define cat2/order 'Variable'; or define cat2/group 'Variable';
Hi @CathyVI , it's looks to me that you do not have raw data, and the table (which you call it your sample dataset) you work on is in fact a report/result table in excel that has been imported to SAS as a SAS dataset, what you want to do is add the subtotal and total. My first suggestion is, calculate the subtotal and total in excel (which is much easier). If one does need to work on this in SAS, here is what I tried (with a mock dataset), I know the display/layout does not match your request, but I think my code can make the steps simpler and easier, especially if you want only the code for calculate the subtotal and total. BTW, my code is base on @Tom 's code and ideas, I just did minor changes, and also simplified the code a bit according to @Ksharp 's suggestion.
data mockdata1;
input cat $ y1 y2 y3 y4 y5 yt;
datalines;
sex . . . . . .
m 1 2 3 4 5 15
f 1 2 3 4 5 15
unknown . . 1 . 4 5
age . . . . . .
age1 1 2 3 4 5 15
age2 1 2 3 4 5 15
age3 1 2 3 4 5 15
age4 1 2 3 4 5 15
age5 1 2 3 4 5 15
age6 1 2 3 4 5 15
missing 1 8 . . . 9
race . . . . . .
race1 1 2 3 4 5 15
race2 1 2 3 4 5 15
race3 1 2 3 4 5 15
other 1 2 3 4 5 15
;
run;
proc print data=mockdata1;run;
data mockdata2;
retain cat1;
set mockdata1;
if n(of y:)=0 then cat1=cat;
run;
proc print data=mockdata2;run;
proc report data=mockdata2;
column cat1 cat y1 y2 y3 y4 y5 yt;
define cat1/group order noprint;
define cat/'Variable';
define y1--yt/analysis sum;
rbreak after/summarize
style=[font_weight=bold];
break after cat1/summarize
style=[font_style=italic];
run;
Sorry, I did several minor changes in the code while posting and editing my last thread and made a mistake in the "order=data" option (I somehow mistakenly deleted the "=data" part and the order became alphabetic, not the original order in the dataset). So the final-final version of the code is as follows (the output and results are the same with my last thread though).
data mockdata1;
input cat $ y1 y2 y3 y4 y5 yt;
datalines;
sex . . . . . .
m 1 2 3 4 5 15
f 1 2 3 4 5 15
unknown . . 1 . 4 5
age . . . . . .
age1 1 2 3 4 5 15
age2 1 2 3 4 5 15
age3 1 2 3 4 5 15
age4 1 2 3 4 5 15
age5 1 2 3 4 5 15
age6 1 2 3 4 5 15
missing 1 8 . . . 9
race . . . . . .
race1 1 2 3 4 5 15
race2 1 2 3 4 5 15
race3 1 2 3 4 5 15
other 1 2 3 4 5 15
;
run;
proc print data=mockdata1;run;
data mockdata2;
retain cat1;
set mockdata1;
if n(of y:)=0 then cat1=cat;
run;
proc print data=mockdata2;run;
proc report data=mockdata2;
column cat1 cat y1 y2 y3 y4 y5 yt;
define cat1/group order=data noprint;
define cat/'Variable';
define y1--yt/analysis sum;
rbreak after/summarize
style=[font_weight=bold];
break after cat1/summarize
style=[font_style=italic];
run;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.