Re: Summary stats at the bottom of excel columns

Rick79 · Posted 04-20-2012 02:55 PM

Hi all--

I need to produce a spreadsheet that produces summary stats for a big group of variables. So I need the Mean, Median, Max and Min with a line separating them from the rest of the data for each variable. I know how to use excelxp tagset and proc report to format the excel columns but I don’t know how to write the code to produce the additional summary stats at the bottom of the columns. Below is the example I’m looking for. There are more than three variables in the dataset.

Your assistance is much appreciated!

Agency	Q1	Qa1	Q2
Agency 1	98%	98%	96%
Agency 2	100%	99%	100%
Agency 3	94%	91%	93%
Agency 4	95%	95%	95%
Agency 5	96%	94%	96%
Agency 6	99%	98%	100%

Mean	97%	96%	97%
Median	97%	97%	96%
Max	100%	99%	100%
Min	94%	91%	93%

Cynthia_sas · Posted 04-20-2012 11:00 PM

Hi:

What you want to do is not specific to TAGSETS.EXCELXP -- the technique you need/want is a PROC REPORT technique to produce extra summary rows. In order to write out multiple summary rows, such as you show, use multiple BREAK statements, as shown on pages 7 and 8 of this paper (requires that you set up some "dummy" or extra variables for break processing):

http://support.sas.com/rnd/papers/sgf07/sgf2007-report.pdf

cynthia

data_null__ · Posted 04-21-2012 09:04 AM

For me using then COMPUTE statement in PROC REPORT quickly reaches a point diminishing returns. You will need a COMPUTE with CALL define if you add a statistic like N.

data test;

input Agency &$10. (Q1 Qa1 Q2)(:percent.);

cards;

Agency 1 98% 98% 96%

Agency 2 100% 99% 100%

Agency 3 94% 91% 93%

Agency 4 95% 95% 95%

Agency 5 96% 94% 96%

Agency 6 99% 98% 100%

;;;;

run;

proc summary data=test;

var q:;

output out=stats;

output out=median median=;

run;

data display;

set test(in=in1)

stats(where=(_stat_ in('MEAN')))

median

stats(where=(_stat_ in('MIN' 'MAX')))

;

if in1 then type=1; else type=2;

agency = coalesceC(agency,propcase(_stat_),'Median');

output;

call missing(agency);

format q: percent.;

drop _:;

run;

proc print;

run;

Proc report nowd list headline;

columns type agency q:;

define type / order noprint;

define agency / display left;

define q: / display right;

break after type / skip;

run;

Cynthia_sas · Posted 04-21-2012 10:16 AM

Hi:

For me, the return on using PROC REPORT is that I can get the extra lines (see screen shot) using the technique shown in the posted paper link, without making any extra passes through the data. The screenshot was produced with 1 pass thru the data -- no proc summary, no data step needed.

cynthia

data test;

input Agency &$10. (Q1 Qa1 Q2)(:percent.);

x1 = '1';

x2 = '2';

x3 = '3';

x4 = '4';

return;

datalines4;

Agency 1 98% 98% 96%

Agency 2 100% 99% 100%

Agency 3 94% 91% 93%

Agency 4 95% 95% 95%

Agency 5 96% 94% 96%

Agency 6 99% 98% 100%

;;;;

run;

ods listing close;

ods tagsets.excelxp file='c:\temp\extralines.xml' style=sasweb;

Proc report nowd list headline;

columns x1 x2 x3 x4 agency

q1 q1=q1median q1=q1max q1=q1min

qa1 qa1=qa1median qa1=qa1max qa1=qa1min

q2 q2=q2median q2=q2max q2=q2min;

define x1 /order noprint;

define x2 /order noprint;

define x3 / order noprint;

define x4 / order noprint;

define agency / order;

define q1 / mean f=percent8.2 'Q1';

define q1median / median 'Median' noprint;

define q1max / max 'Max' noprint;

define q1min / min 'Min' noprint;

define qa1 / mean f=percent8.2 'Qa1';

define qa1median / median 'Median' noprint;

define qa1max / max 'Max' noprint;

define qa1min / min 'Min' noprint;

define q2 / mean f=percent8.2 'Q2';

define q2median / median 'Median' noprint;

define q2max / max 'Max' noprint;

define q2min / min 'Min' noprint;

compute after x4;

agency = 'Mean';

endcomp;

compute after x3;

agency = 'Median';

q1.mean = q1median;

qa1.mean = qa1median;

q2.mean = q2median;

endcomp;

Compute after x2;

agency = 'Max';

q1.mean = q1max;

qa1.mean = qa1max;

q2.mean = q2max;

endcomp;

Compute after x1;

agency = 'Min';

q1.mean = q1min;

qa1.mean = qa1min;

q2.mean = q2min;

endcomp;

break after x4 / summarize;

break after x3 / summarize;

break after x2 / summarize;

break after x1 / summarize;

run;

ods tagsets.excelxp close;

data_null__ · Posted 04-21-2012 10:47 AM

I think you have made my case for diminishing returns. Remember the OP mentioned lots of Qs.

I was also thinking since the destination is EXCEL that perhaps "we" should just write a formula into the cells for the statistics.

MikeZdeb · Posted 04-21-2012 12:34 PM

hi ... sort of the data _null_ solution using PRINT (also not variable-specific) ...

* compute statistics

proc summary data=test;

var q: ;

output out=stats mean= median= min= max= / autoname;

run;

* rearrange statistics;

data stats;

set stats;

array q(*) q: ;

do j=1 to dim(q);

agency = scan(vname(q(j)),2,'_');

var = scan(vname(q(j)),1,'_');

x = q(j);

output;

end;

keep agency stat var x;

run;

proc transpose data=stats out=tstats (drop=_name_);

by agency notsorted;

id var;

var x;

run;

* add that blank line between original data and statistics;

data dummy;

set test (obs=1);

call missing (of _all_);

run;

proc append base=test data=dummy;

run;

proc append base=test data=tstats force;

run;

* "elemental" ... PRINT;

options missing=' ';

ods listing close;

ods results off;

ods html file='z:\agencies.html' style=barrettsblue;

proc print data=test noobs;

format q: percent.;

run;

ods html close;

ods results;

ods listing;

options missing='.';

Ksharp · Posted 04-23-2012 03:39 AM

I like Mike's and NULL's code if you have lots of variables needed to process.

Here is my code almost the same with Mike , Null .

data test;
   input Agency &$10. (Q1 Qa1 Q2)(:percent.);
   cards;
Agency 1   98% 98% 96%
Agency 2  100% 99% 100%
Agency 3   94% 91% 93%
Agency 4  95% 95% 95%
Agency 5   96% 94% 96%
Agency 6  99% 98% 100%
;;;;
   run;
proc summary data=test;
var q: ;
output out=mean (drop=_:) mean= ;
run;
proc summary data=test;
var q: ;
output out=median(drop=_:) median= ;
run;
proc summary data=test;
var q: ;
output out=min(drop=_:) min= ;
run;
proc summary data=test;
var q: ;
output out=max(drop=_:) max= ;
run;
data want;
 set test mean median min max indsname=dsn;
 if missing(agency) then agency=scan(dsn,-1,'.');
run;
ods listing close;
ods html file='c:\agencies.html' style=sasweb;
proc print data=want noobs;
run;
ods html close;
ods listing

Ksharp

Summary stats at the bottom of excel columns