I have 100 variables in my dataset. I need to compute N, NMISS, MIN, MAX, MEAN, and MEDIAN for each variable.
When I tried PROC MEANS/PROC SUMMARY with an OUTPUT statement, it gave only one record in the output data set (and I can't tell which variable's summary statistics it is for). Is it possible to get summary statistics for all variables into a data set with a single PROC step, without using macros to pass each variable and append the results to a base table?
Please help.
Use the AUTONAME option:
Proc means data=myData;
var _numeric_;
outpur out=myStats n= nmiss= min= max= mean= median= / autoname;
The OUTPUT statement above has a typo (outpur), and the step needs a closing RUN. It should be:
output out=myStats n= nmiss= min= max= mean= median= / autoname;
run;
Thanks for the help. It gives me the statistics for all the columns in a single row, but I need a separate row for each variable.
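A more direct route to one row per variable, assuming SAS 9.3 or later, is the STACKODSOUTPUT option of the PROC MEANS statement combined with ODS OUTPUT on the Summary table. The sketch below uses SASHELP.CLASS as a stand-in for myData, and the output name classStats is hypothetical. Each observation in classStats should describe one analysis variable.

```sas
/* Sketch: assumes SAS 9.3+ (STACKODSOUTPUT); SASHELP.CLASS stands in for myData */
ods output Summary=classStats;   /* capture the Summary table as a data set */
proc means data=sashelp.class n nmiss min max mean median stackodsoutput;
  var _numeric_;                 /* every numeric column */
run;

proc print data=classStats;
run;
```

Because all statistics come from one PROC MEANS step, this avoids both the per-variable macro loop and any transposing afterward.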
If it's a table you want, then use PROC TABULATE:
proc tabulate data=myData format=best7.;
var _numeric_;
table (_numeric_),(n nmiss mean min max median);
run;
Have you looked at the OUT= data set from this TABULATE step, or even at ODS OUTPUT <don't know the name>=stats;?
No, I hadn't. But now that I have, I don't get the results I expected.
PROC TABULATE seems better at producing report tables than data sets.
If you remove the OUT= option, then ODS OUTPUT will work. However, the two are mostly the same.
It seems that PROC TABULATE is not so smart. Here is a macro approach, using SASHELP.CLASS as an example:
%macro mean(tname, vname);
  proc means data=&tname noprint;
    var &vname;
    output out=temp(drop=_:) n=n nmiss=nmiss min=min max=max
           mean=mean median=median;
  run;
  data temp;
    length vname $ 40;
    set temp;
    vname = "&vname";
  run;
  proc append base=want data=temp force;
  run;
%mend mean;

data _null_;
  set sashelp.vcolumn(where=(libname='SASHELP' and memname='CLASS' and type='num'));
  call execute(cats('%mean(', libname, '.', memname, ',', name, ')'));
run;
Looping over the variables in a data set, as you suggest, is generally a bad idea, because each macro call reads the entire data set again: 100 variables means 100 passes over the data. When the numbers of variables and observations are relatively small, performance is acceptable, but as either grows, performance can suffer significantly.
In general, it is better to let SAS summarize many variables at once; it is very good at that. The drawback is that the output is not in the most convenient shape. That can be remedied more efficiently after the data are summarized, because far fewer observations remain to be processed.
proc summary data=sashelp.heart nway;
  class sex;
  var _numeric_;
  output out=Stats0 n= nmiss= min= max= mean= median= / autoname;
run;

/* One row per Sex and autonamed statistic, e.g. Height_Mean */
proc transpose data=stats0 out=stats1;
  by sex _type_ _freq_;
run;

/* Split _NAME_ at its last underscore into Variable and Statistic */
data stats1;
  set stats1;
  call scan(_name_, -1, p, l, '_');
  length Variable $32 Statistic $8;
  Variable  = substr(_name_, 1, p-2);
  Statistic = substr(_name_, p);
run;

proc sort data=stats1;
  by Sex Variable;
run;

/* Back to wide: one row per Sex and Variable, one column per statistic */
proc transpose data=stats1 out=stats2(drop=_name_);
  by Sex Variable;
  var col1;
  id Statistic;
  idlabel Statistic;
run;

proc print;
run;
No, I don't think your code is a good solution, given that the table has lots of variables.
In your code, every variable generates six new variables (one per statistic). When a table has lots of variables (e.g., twenty thousand), do you think PROC MEANS with _numeric_ can hold so many variables? Moreover, you are using PROC TRANSPOSE, which is certainly a way to slow things down. Therefore, I don't think your code is better or faster than mine.
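For comparison, the SASHELP.HEART summary by sex can also be produced without a macro loop or PROC TRANSPOSE, again assuming SAS 9.3 or later for STACKODSOUTPUT. The output name heartStats is hypothetical. This sketch still computes all statistics in one PROC MEANS step, so it does not by itself answer the question of how that step scales to tens of thousands of variables.

```sas
/* Sketch: assumes SAS 9.3+; intended to give one row per Sex and analysis variable */
ods output Summary=heartStats;   /* capture the stacked Summary table */
proc means data=sashelp.heart n nmiss min max mean median stackodsoutput;
  class sex;                     /* one group of rows per value of Sex */
  var _numeric_;
run;
```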
