Hi all--
I need to produce a spreadsheet that produces summary stats for a big group of variables. So I need the Mean, Median, Max and Min with a line separating them from the rest of the data for each variable. I know how to use excelxp tagset and proc report to format the excel columns but I don’t know how to write the code to produce the additional summary stats at the bottom of the columns. Below is the example I’m looking for. There are more than three variables in the dataset.
Your assistance is much appreciated!
Agency | Q1 | Qa1 | Q2 |
Agency 1 | 98% | 98% | 96% |
Agency 2 | 100% | 99% | 100% |
Agency 3 | 94% | 91% | 93% |
Agency 4 | 95% | 95% | 95% |
Agency 5 | 96% | 94% | 96% |
Agency 6 | 99% | 98% | 100% |
Mean | 97% | 96% | 97% |
Median | 97% | 97% | 96% |
Max | 100% | 99% | 100% |
Min | 94% | 91% | 93% |
Hi:
What you want to do is not specific to TAGSETS.EXCELXP -- the technique you need/want is a PROC REPORT technique to produce extra summary rows. In order to write out multiple summary rows, such as you show, use multiple BREAK statements, as shown on pages 7 and 8 of this paper (requires that you set up some "dummy" or extra variables for break processing):
http://support.sas.com/rnd/papers/sgf07/sgf2007-report.pdf
cynthia
For me using then COMPUTE statement in PROC REPORT quickly reaches a point diminishing returns. You will need a COMPUTE with CALL define if you add a statistic like N.
data test;
input Agency &$10. (Q1 Qa1 Q2)(:percent.);
cards;
Agency 1 98% 98% 96%
Agency 2 100% 99% 100%
Agency 3 94% 91% 93%
Agency 4 95% 95% 95%
Agency 5 96% 94% 96%
Agency 6 99% 98% 100%
;;;;
run;
proc summary data=test;
var q:;
output out=stats;
output out=median median=;
run;
data display;
set test(in=in1)
stats(where=(_stat_ in('MEAN')))
median
stats(where=(_stat_ in('MIN' 'MAX')))
;
if in1 then type=1; else type=2;
agency = coalesceC(agency,propcase(_stat_),'Median');
output;
call missing(agency);
format q: percent.;
drop _:;
run;
proc print;
run;
Proc report nowd list headline;
columns type agency q:;
define type / order noprint;
define agency / display left;
define q: / display right;
break after type / skip;
run;
Hi:
For me, the return on using PROC REPORT is that I can get the extra lines (see screen shot) using the technique shown in the posted paper link, without making any extra passes through the data. The screenshot was produced with 1 pass thru the data -- no proc summary, no data step needed.
cynthia
data test;
input Agency &$10. (Q1 Qa1 Q2)(:percent.);
x1 = '1';
x2 = '2';
x3 = '3';
x4 = '4';
return;
datalines4;
Agency 1 98% 98% 96%
Agency 2 100% 99% 100%
Agency 3 94% 91% 93%
Agency 4 95% 95% 95%
Agency 5 96% 94% 96%
Agency 6 99% 98% 100%
;;;;
run;
ods listing close;
ods tagsets.excelxp file='c:\temp\extralines.xml' style=sasweb;
Proc report nowd list headline;
columns x1 x2 x3 x4 agency
q1 q1=q1median q1=q1max q1=q1min
qa1 qa1=qa1median qa1=qa1max qa1=qa1min
q2 q2=q2median q2=q2max q2=q2min;
define x1 /order noprint;
define x2 /order noprint;
define x3 / order noprint;
define x4 / order noprint;
define agency / order;
define q1 / mean f=percent8.2 'Q1';
define q1median / median 'Median' noprint;
define q1max / max 'Max' noprint;
define q1min / min 'Min' noprint;
define qa1 / mean f=percent8.2 'Qa1';
define qa1median / median 'Median' noprint;
define qa1max / max 'Max' noprint;
define qa1min / min 'Min' noprint;
define q2 / mean f=percent8.2 'Q2';
define q2median / median 'Median' noprint;
define q2max / max 'Max' noprint;
define q2min / min 'Min' noprint;
compute after x4;
agency = 'Mean';
endcomp;
compute after x3;
agency = 'Median';
q1.mean = q1median;
qa1.mean = qa1median;
q2.mean = q2median;
endcomp;
Compute after x2;
agency = 'Max';
q1.mean = q1max;
qa1.mean = qa1max;
q2.mean = q2max;
endcomp;
Compute after x1;
agency = 'Min';
q1.mean = q1min;
qa1.mean = qa1min;
q2.mean = q2min;
endcomp;
break after x4 / summarize;
break after x3 / summarize;
break after x2 / summarize;
break after x1 / summarize;
run;
ods tagsets.excelxp close;
I think you have made my case for diminishing returns. Remember the OP mentioned lots of Qs.
I was also thinking since the destination is EXCEL that perhaps "we" should just write a formula into the cells for the statistics.
hi ... sort of the data _null_ solution using PRINT (also not variable-specific) ...
* compute statistics
proc summary data=test;
var q: ;
output out=stats mean= median= min= max= / autoname;
run;
* rearrange statistics;
data stats;
set stats;
array q(*) q: ;
do j=1 to dim(q);
agency = scan(vname(q(j)),2,'_');
var = scan(vname(q(j)),1,'_');
x = q(j);
output;
end;
keep agency stat var x;
run;
proc transpose data=stats out=tstats (drop=_name_);
by agency notsorted;
id var;
var x;
run;
* add that blank line between original data and statistics;
data dummy;
set test (obs=1);
call missing (of _all_);
run;
proc append base=test data=dummy;
run;
proc append base=test data=tstats force;
run;
* "elemental" ... PRINT;
options missing=' ';
ods listing close;
ods results off;
ods html file='z:\agencies.html' style=barrettsblue;
proc print data=test noobs;
format q: percent.;
run;
ods html close;
ods results;
ods listing;
options missing='.';
I like Mike's and NULL's code if you have lots of variables needed to process.
Here is my code almost the same with Mike , Null .
data test; input Agency &$10. (Q1 Qa1 Q2)(:percent.); cards; Agency 1 98% 98% 96% Agency 2 100% 99% 100% Agency 3 94% 91% 93% Agency 4 95% 95% 95% Agency 5 96% 94% 96% Agency 6 99% 98% 100% ;;;; run; proc summary data=test; var q: ; output out=mean (drop=_:) mean= ; run; proc summary data=test; var q: ; output out=median(drop=_:) median= ; run; proc summary data=test; var q: ; output out=min(drop=_:) min= ; run; proc summary data=test; var q: ; output out=max(drop=_:) max= ; run; data want; set test mean median min max indsname=dsn; if missing(agency) then agency=scan(dsn,-1,'.'); run; ods listing close; ods html file='c:\agencies.html' style=sasweb; proc print data=want noobs; run; ods html close; ods listing
Ksharp
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.