
Proc report role of summarize after break variable

Contributor gsk
Posts: 25

When I use BREAK AFTER some variable, or RBREAK AFTER, with SUMMARIZE, PROC REPORT sometimes prints sums, sometimes means, and sometimes neither. In the attached output of the code below, the first highlighted value, 25.2, seems to be the mean, but the other values don't. What are these numbers that PROC REPORT shows as a summary?

The SAS documentation just says that the SUMMARIZE option writes a summary line. What kind of summary line is that, though? There can be many kinds, including means, medians, and quartiles.

proc report nowd data=cars;
  where upcase(country) in ('GERMANY','JAPAN','USA') and
        upcase(type) in ('SUV','SEDAN','HATCHBACK');
  columns type country citympg hwympg ('Std Dev' citympg=citistd hwympg=hwystd);
  define country / group 'Country of Origin';
  define type / group 'Type of Car';
  define citympg / analysis mean 'City MPG' format=5.1;
  define hwympg / analysis mean 'Highway MPG' format=5.1;
  define citistd / analysis std 'City MPG Std' format=5.1;
  define hwystd / analysis std 'Highway MPG std' format=5.1;
  break after type / summarize suppress DOL ul style=[BACKGROUNDCOLOR=ltgray];
  /* break after cylinders/ summarize suppress ol ul; */
  rbreak after / summarize ol ul;
run;

(Attachment: proc report.JPG, the output of the code above)


SAS Super FREQ
Posts: 9,365

Re: Proc report role of summarize after break variable


Hi:
The statistic that is produced on a BREAK or RBREAK summary line depends on the statistic that you list in your DEFINE statement for the variable.

For example, in your code, you have asked for MEAN for 2 of your variables and STD for the other 2 numeric variables. If you want SUM instead of MEAN or STD, you have to change the statistic listed. Without the name of a statistic, a numeric analysis variable defaults to the SUM statistic. As soon as you specify a statistic, you get THAT statistic on the break line.

Cynthia (see the example below)

 

proc report data=sashelp.cars
  style(summary) = Header;
  where make in ('Honda' 'Toyota' 'BMW' 'Cadillac');
  column type make wheelbase msrp mpg_highway mpg_city enginesize
         invoice horsepower weight length;
  define type / group;
  define make / group;
  define wheelbase / n 'N WheelBase';
  define msrp / mean 'Mean MSRP';
  define mpg_highway / sum 'SUM MPG HIGHWAY';
  define mpg_city / std 'STD MPG City';
  define enginesize / css 'CSS EngineSize';
  define invoice / median 'Median Invoice';
  define horsepower / stderr 'StdErr HorsePower'; 
  define weight / min 'Min Weight';
  define length / max 'Max Length';
  break after type / summarize;
  rbreak after / summarize;
run;
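The example above names a statistic for every column. Here is a minimal sketch of the other half of the rule, again on SASHELP.CARS rather than your data (the alias MSRP_MEAN is a made-up name): MSRP is listed twice in the COLUMN statement, once with no statistic, so it defaults to SUM, and once through the alias with MEAN. The summary lines should then show a sum in the first MSRP column and a mean in the second.

```sas
proc report data=sashelp.cars;
  where make in ('Honda' 'Toyota');
  /* MSRP appears twice: once as itself, once via the hypothetical alias MSRP_MEAN */
  column make type msrp msrp=msrp_mean;
  define make / group;
  define type / group;
  /* No statistic named: a numeric analysis variable defaults to SUM */
  define msrp / 'MSRP (default SUM)' format=dollar12.;
  /* Statistic named: the break lines show the MEAN for this column */
  define msrp_mean / mean 'MSRP (MEAN)' format=dollar12.;
  break after make / summarize;
  rbreak after / summarize;
run;
```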
Contributor gsk
Posts: 25

Re: Proc report role of summarize after break variable

Posted in reply to Cynthia_sas

Thank you for the reply!


But for Hatchback and Highway MPG, for instance, 32.5 is not the average of 31.7, 32.2, and 36. The same goes for SUV and City MPG, SUV and Highway MPG, etc.

Also, what does RBREAK indicate? It doesn't show the average of all the cars for City MPG either.

SAS Super FREQ
Posts: 9,365

Re: Proc report role of summarize after break variable


Hi:

  I did not use your data, but with SASHELP.CARS, when I double-check the numbers from PROC REPORT against the same statistics from PROC MEANS, I get the same values. For this data (and for all the other test cases I use), PROC MEANS and PROC REPORT agree. Do remember that PROC REPORT shows formatted numbers, which is why some rounding occurs:

(Attachment: same_numbers_report_mean.png)

  Try the revised code below and you should get the same results in PROC MEANS for all the BREAK lines.

 

  An RBREAK statement produces the summary line at the bottom (RBREAK AFTER) or top (RBREAK BEFORE) of the report. It summarizes the entire report, so if you are getting the SUM statistic, it is what you might call a Grand Total. The only statistic you get on the RBREAK line (in my case, at the bottom of the report) is the statistic requested for each variable in its DEFINE statement.
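To see where each RBREAK line lands, here is a minimal sketch (on SASHELP.CARS, not your data) that requests the grand summary at both the top and the bottom of the report. Both summary lines should show the same overall mean, computed from every observation that passes the WHERE clause.

```sas
proc report data=sashelp.cars;
  where make in ('Honda' 'Toyota');
  column type msrp;
  define type / group;
  define msrp / mean 'Mean MSRP' format=dollar10.;
  /* RBREAK BEFORE: grand summary line at the top of the report */
  rbreak before / summarize;
  /* RBREAK AFTER: grand summary line at the bottom of the report */
  rbreak after / summarize;
run;
```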

 

  Since you did not provide any data, it is impossible to verify whether PROC REPORT is generating incorrect numbers. However, remember that if you have grouped the items by your Country variable, so that each report row represents the AVERAGE for a country, PROC REPORT is not averaging the averages. It is taking the grand mean: the sum of ALL the values divided by the TOTAL count of non-missing observations. This will usually be different from the average of the averages, which is why you have to compare against a procedure like PROC MEANS. The best thing for you to do is either run a verification like mine with PROC MEANS, or open a track with Tech Support, send them ALL your data and ALL your code, and see if they can explain why PROC REPORT is not generating the results you expect.
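To see the grand mean versus the average of the averages on a small scale, here is a sketch with made-up data (the data set HATCH and its values are hypothetical): one German model, three Japanese models averaging 32.2, and one US model. The country rows show 31.70, 32.20, and 36.00, but the RBREAK line shows the grand mean 164.3/5 = 32.86, not the average of the three row means, 99.9/3 = 33.30.

```sas
/* Hypothetical data: unequal numbers of models per country */
data hatch;
  input country :$10. hwympg;
  datalines;
Germany 31.7
Japan 30
Japan 34
Japan 32.6
USA 36
;

proc report data=hatch;
  column country n hwympg;
  define country / group 'Country';
  /* N shows how many models sit behind each mean */
  define n / 'Models';
  define hwympg / mean 'Highway MPG' format=5.2;
  /* Summary line: sum of all 5 values / 5, not the mean of the 3 row means */
  rbreak after / summarize;
run;
```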

 

Cynthia

 

** Revised code that proves PROC REPORT is generating the same statistics as PROC MEANS;

proc report data=sashelp.cars
  style(summary) = Header;
  where make in ('Honda' 'Toyota' 'BMW' 'Cadillac');
  column type make wheelbase msrp mpg_highway mpg_city enginesize
         invoice horsepower weight length;
  define type / group;
  define make / group;
  define wheelbase / n 'N WheelBase';
  define msrp / mean 'Mean MSRP';
  define mpg_highway / sum 'SUM MPG HIGHWAY';
  define mpg_city / std 'STD MPG City';
  define enginesize / css 'CSS EngineSize';
  define invoice / median 'Median Invoice';
  define horsepower / stderr 'StdErr HorsePower'; 
  define weight / min 'Min Weight';
  define length / max 'Max Length';
  break after type / summarize;
  rbreak after / summarize;
run;

** get same numbers with PROC MEANS as PROC REPORT;
proc means data=sashelp.cars n;
   where make in ('Honda' 'Toyota' 'BMW' 'Cadillac');
   class type ;
  var wheelbase ; 
run;

proc means data=sashelp.cars mean;
   where make in ('Honda' 'Toyota' 'BMW' 'Cadillac');
   class type ;
  var msrp ; 
run;

proc means data=sashelp.cars sum;
   where make in ('Honda' 'Toyota' 'BMW' 'Cadillac');
   class type ;
  var mpg_highway ; 
run;

proc means data=sashelp.cars std;
   where make in ('Honda' 'Toyota' 'BMW' 'Cadillac');
   class type ;
  var mpg_city ; 
run;

proc means data=sashelp.cars css;
   where make in ('Honda' 'Toyota' 'BMW' 'Cadillac');
   class type ;
  var enginesize ; 
run;

proc means data=sashelp.cars median;
   where make in ('Honda' 'Toyota' 'BMW' 'Cadillac');
   class type ;
  var invoice ; 
run;

proc means data=sashelp.cars stderr;
   where make in ('Honda' 'Toyota' 'BMW' 'Cadillac');
   class type ;
  var horsepower ; 
run;

proc means data=sashelp.cars min;
   where make in ('Honda' 'Toyota' 'BMW' 'Cadillac');
   class type ;
  var weight ; 
run;

proc means data=sashelp.cars max;
   where make in ('Honda' 'Toyota' 'BMW' 'Cadillac');
   class type ;
  var length ; 
run;
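If you would rather check every BREAK line and the RBREAK line in one table, one possible alternative (a sketch, not part of the original reply; the data set name CHECK and the output variable names are made up) is a single PROC MEANS step whose OUTPUT statement names a different statistic for each variable. Rows with _TYPE_=1 correspond to the BREAK AFTER TYPE lines, and the row with _TYPE_=0 corresponds to the RBREAK line.

```sas
proc means data=sashelp.cars noprint;
  where make in ('Honda' 'Toyota' 'BMW' 'Cadillac');
  class type;
  /* One statistic per variable, matching the DEFINE statements in the report */
  output out=check
         n(wheelbase)       = n_wheelbase
         mean(msrp)         = mean_msrp
         sum(mpg_highway)   = sum_mpg_highway
         std(mpg_city)      = std_mpg_city
         css(enginesize)    = css_enginesize
         median(invoice)    = median_invoice
         stderr(horsepower) = stderr_horsepower
         min(weight)        = min_weight
         max(length)        = max_length;
run;

/* _TYPE_=0: overall (RBREAK line); _TYPE_=1: one row per TYPE (BREAK lines) */
proc print data=check noobs;
run;
```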

 

 

Solution
04-09-2018 03:34 PM
Super User
Posts: 13,508

Re: Proc report role of summarize after break variable


@gsk wrote:

Thank you for the reply!


But for Hatchback and Highway MPG, for instance, 32.5 is not the average of 31.7, 32.2, and 36. The same goes for SUV and City MPG, SUV and Highway MPG, etc.

Also, what does RBREAK indicate? It doesn't show the average of all the cars for City MPG either.


You need to look at how many models fall into each category. Suppose the country with the 32.2 average has 6 models but the others have only one each. Then the overall mean is different, because the divisor is different.
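In symbols (notation added here for illustration): if group g has n_g non-missing values with mean \bar{x}_g, the MEAN on a summary line is the count-weighted grand mean on the left, which equals the simple average of the G group means on the right only when every n_g is the same.

```latex
\bar{x}_{\text{summary}} = \frac{\sum_{g=1}^{G} n_g \,\bar{x}_g}{\sum_{g=1}^{G} n_g}
\qquad\text{vs.}\qquad
\frac{1}{G}\sum_{g=1}^{G} \bar{x}_g
```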

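data example;
   /* Simple average of the three means: almost certainly incorrect unless each country has exactly one model */
   simplemean = mean(31.7, 32.2, 36);
   /* Mean over all models, assuming the 32.2 country has 6 models */
   meanwithcounts = mean(31.7, 32.2, 32.2, 32.2, 32.2, 32.2, 32.2, 36);
   put simplemean= meanwithcounts=;   /* 33.3 and 32.6125 */
run;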
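Your CARS data set is not available here, so this sketch applies that advice to SASHELP.CARS instead, with ORIGIN standing in for your COUNTRY variable and MPG_CITY/MPG_HIGHWAY standing in for CITYMPG/HWYMPG (an assumption about what those variables hold). Adding the N statistic as a column shows how many models sit behind each mean, which is what makes a summary line differ from a simple average of the rows above it.

```sas
proc report data=sashelp.cars;
  where type in ('SUV' 'Sedan' 'Wagon');
  column type origin n mpg_city mpg_highway;
  define type / group 'Type of Car';
  /* ORIGIN stands in for COUNTRY in the original code */
  define origin / group 'Origin';
  /* N = number of observations (models) in each row's group */
  define n / 'Number of Models';
  define mpg_city / mean 'City MPG' format=5.1;
  define mpg_highway / mean 'Highway MPG' format=5.1;
  break after type / summarize;
  rbreak after / summarize;
run;
```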