BookmarkSubscribeRSS Feed
jteres
Obsidian | Level 7

I was able to create multi-level summaries in PROC REPORT using the following code.

 

With the UNDERLINE in the BREAK AFTER, is there a way to have it extend only for some columns, rather than across the entire report?

 

For example, could it cut across from Origin through Cumpct, but then not interrupt Gtotpct -- Gcumpct?

 

Also, I figured this out by adapting the code in this excellent paper: http://www2.sas.com/proceedings/sugi27/p120-27.pdf

 

but it's still not clear to me why I need to reset the cumulative totals when the _BREAK_ value changes even though I am using COMPUTE BEFORE.

 

 Any tips are appreciated!

 

 

proc report data = sashelp.cars nowd out = car_report split = '\' ;

     column    origin         /* crosstab var 1 */

               make           /* crosstab var 2 */

               Horsepower     /* REPORT needs a numeric variable to use N-- this was sort of arbitrary */

                              /* it's essentially count within group */

               pct            /* percent within group */

               cumfreq        /* cumulative count within group */

               cumpct         /* cumulative percent within group */

               gtotpct        /* global percent */

               gcumfreq       /* global cumulative frequency */

               gcumpct ;      /* global cumulative percent */

 

     ** define all the vars ;

     ** thanks, PROC REPORT ;

     ** ORIGIN and MAKE are the variables we're grouping by ;

     define    origin         / group ;

     define    make           / group ;

     ** Horsepower is really just used for the N statistic. to do that, it must be declared an ANALYSIS column ;

     define    Horsepower     / analysis     format = comma12. n 'Subgroup\Frequency' ;

     ** the rest of the columns are computed ;

 

     ** to be calculated within the groups ;

     ** this is the percent column ;

     define    pct            / computed     format = percent9.2 'Subgroup\Percent' ;

     ** this is the cumulative frequency column ;

     define    cumfreq        / computed     format = comma12.0  'Subgroup\Cumulative\Frequency' width = 10 ;

     ** this is the cumulative percent column ;

     define    cumpct         / computed     format = percent9.2 'Subgroup\Cumulative\Percent' width = 10 ;

 

     ** these are for global/grand totals ;

     ** this column is the percent across all observations/combinations ;

     define    gtotpct        / computed     format = percent9.2 'Percent' ;

     ** this column is the cumulative frequency across all combinations ;

     define    gcumfreq       / computed     format = comma12.   'Cumulative\Frequency' width = 10 ;

     ** this column is the cumulative percent across all combinations ;

     define    gcumpct        / computed     format = percent9.2 'Cumulative\Percent' width = 10 ;

 

     ** PROC REPORT requires that the columns be listed in order on the COLUMN statement ;

     ** it makes sense to keep them in order in the DEFINE block ;

     ** for these COMPUTE blocks, I'm ordering them in more of a logical order for processing ;

 

     ** before anything happens, initialize gtotal as the sum of Horsepower.n and gcumtot to 0 ;

     compute before ;

          gtotal + Horsepower.n ;

          gcumtot = 0 ;

     endcomp ;

 

     ** before each new origin value, re-initialize total and cumtot ;

     compute before origin ;

          total + Horsepower.n ;

          cumtot = 0 ;

     endcomp ;

 

     ** pct is the count (Horsepower.n) over the total, very straight forward ;

     compute pct ;

          pct = Horsepower.n / total ;

     endcomp ;

 

     ** the cumulative frequency is the running total ;

     ** reset it at the breaks ;    

     compute cumfreq ;

          cumtot + Horsepower.n  ;

          cumfreq = cumtot ;

 

          ** the relationship between these resets and the COMPUTE BEFORE isn't totally clear to me ;

          ** but it doesn't work otherwise ::JJT::08-24-18:: ;

          if _break_ = "Origin" THEN cumfreq = . ;

          if _break_ = "Origin" THEN total = 0 ;

    endcomp ;

 

     ** compute the percent of all combination (uses gtotal) ;

     compute gtotpct ;

          gtotpct = Horsepower.n / gtotal ;

     endcomp ;

 

     ** create running cumulative percent ;

     compute cumpct ;

          cumpct = cumfreq / total ;

     endcomp ;

 

     ** create global cumulative frequency ;

     ** it's similar to the "local" counts but... ;

     ** in order to get this to work, i added the IF condition ;

     ** it makes more sense if you look at the output report data, but some totals are computed BEFORE ;

     ** and obviously I didn't want that to happen because it was inflating my counts ;

     compute gcumfreq ;

          if not(missing(make)) then gcumtot + Horsepower.n ;

          gcumfreq = gcumtot ;

 

          ** yeah, again, this is a little bit of magic to me. ;

          ** i might post this on SAS Communities and see if Cynthia Zender will explain it to me ;

          IF _break_ = "_RBREAK_" THEN gcumfreq = . ;

          if _break_ = "_RBREAK_" THEN gtotal = 0 ;

     endcomp ;

 

     ** again, this is a relatively straightforward calculation ;

     compute gcumpct ;

          gcumpct = gcumfreq / gtotal ;

     endcomp ;

 

     ** this is required ;

     ** you have to choose SOMETHING for proc report to do here ;

     ** you can opt to have it SUPPRESS any subgroup summaries ;

     ** i kept the underline (UL) because it only shows up in the listing output and makes it a little more readable ;

     ** one drawback is the goes across the entire report field, which can probably be fixed but that's icing ;

     break after Origin / ul ;

run ;

 

 

 

5 REPLIES 5
ChrisNZ
Tourmaline | Level 20

> With the UNDERLINE in the BREAK AFTER, is there a way to have it extend only for some columns, rather than across the entire report?

Are you using ODS Listing? No you can't as far as I know.

 

>  why I need to reset the cumulative totals when the _BREAK_ value changes even though I am using COMPUTE BEFORE.

Removing these 2 lines does not change the report.

jteres
Obsidian | Level 7
Thanks Chris. Removing those lines had no effect.
jteres
Obsidian | Level 7

I'm revisiting this...

 

As it turns out, removing the 2 lines referenced -do- change the report, and "breaks" the functionality I was working towards. So taking out these lines "breaks" the report:

 

if _break_ = "Origin" THEN cumfreq = . ;

if _break_ = "Origin" THEN total = 0 ;

 

These two lines are extraneous, and can be removed without causing any issues:

 

if _break_ = "_RBREAK_" then gcumfreq = . ;

if _break_ = "_RBREAK_" then gtotal = 0 ;

 

Also, I noticed that I get several instances of "division by zero" notes:

 

NOTE: Division by zero detected at line 1 column 21.

NOTE: Division by zero detected at line 1 column 19.

NOTE: Division by zero detected at line 1 column 25.

 

I looked up that issue, and in one post here it seemed like it was due to a referencing the wrong variable in the code.

 

Looking at the output data set generated from this report, it's clear that there are some rows when the totals aren't populated yet, and so there would be division by 0

 

When I add an IF condition to my calculations that use division, the NOTES go away, ie:

 

compute pct ;

     if not(missing(make)) then pct = Horsepower.n / total ;

endcomp ;

 

I thought maybe I could use a COMPUTE AFTER block but haven't figure out a way to make that work. At this risk of setting up a "Doctor, it hurts when I do this" "Then don't do that" exchange, I'm trying to figure out if there's a "better" or at least SAS-ier way of doing that.

 

And I still don't understand the counter reset code. Looking at the output dataset, it's clear there's a difference:

(with the manual resets)

Obs Origin Make Horsepower     pct cumfreq   cumpct  gtotpct gcumfreq  gcumpct  _BREAK_

18  Europe      123              .       .        .  0.28738      158  0.36916  Origin

19  Europe Audi  19        0.15447      19  0.15447  0.04439      177  0.41355

20  Europe BMW   20        0.16260      39  0.31707  0.04673      197  0.46028

 

(and without)

Obs Origin Make Horsepower     pct cumfreq   cumpct  gtotpct gcumfreq  gcumpct  _BREAK_

18  Europe      123              .     439  2.77848  0.28738      158  0.36916  Origin

19  Europe Audi 19        0.067616      19  0.06762  0.04439      177  0.41355

20  Europe BMW  20        0.071174      39  0.13879  0.04673      197  0.46028

 

so I'm still at a bit of a loss because I haven't seen anything that explains this level of detail. The denominator of "cumfreq" in the first section is 123, but in the second it's 280. Without resetting the counters, I don't get the correct value in this code.

ChrisNZ
Tourmaline | Level 20

1. The 2 lines to remove were the ones using _BREAK_, as stated.

2. Use the divide function to avoid the division by zero 

 

jteres
Obsidian | Level 7

As it turns out, none of the logical resets are necessary.

 

I am indebted to David D. Chapman as I was building on the code he presented in his paper on using PROC REPORT to generate cumulative, but the issue was arising from this code:

 

 ** before anything happens, initialize gtotal as the sum of Horsepower.n and gcumtot to 0 ;
     compute before ;
          gtotal + Horsepower.n ;
          gcumtot = 0 ;
     endcomp ;


** before each new origin value, re-initialize total and cumtot ;
     compute before origin ;
          total + Horsepower.n ;
          cumtot = 0 ;
     endcomp ;

Those sum statements aren't actually necessary. When using COMPUTE BEFORE, PROC REPORT can "look ahead" and calculate a group-specific total.

 

By replacing that code with this:

 ** before anything happens, initialize gtotal as the sum of Horsepower.n and gcumtot to 0 ;
     compute before ;
          gtotal = Horsepower.n ;
          gcumtot = 0 ;
     endcomp ;


** before each new origin value, re-initialize total and cumtot ;
     compute before origin ;
          total = Horsepower.n ;
          cumtot = 0 ;
     endcomp ;

the group totals are calculated and accessible to each row within the block for calculating percentages.

 

I did find that in order to properly accumulate the various percentages and frequencies that I had to sum only when the value of the innermost group variable was non-missing (i.e., I had to skip lines that were group totals etc). 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 2439 views
  • 0 likes
  • 2 in conversation