Ksharp wrote:
NULL,
No, I don't think your code is good one. Since you mentioned lots of variables in a table.
In your code, every variable will generate four new variables, when a table has lots of variables (i.e. twenty thousand ), you think proc means; _numeric_ ; can hold so many variables? , and more ,you are using proc transpose which is undisputed way to slow down . Therefore, I don't think your code is better or faster than mine.
Ksharp
I see you don't believe me as you seem to have preconceived notions of what kinds of SAS programs perform better than others.
I tested my program and found that it will summarize 20,000 variables although I think that is very close to the limit of my laptop computer's capacity. The program was able to summarize all variables and reformat the output into the “stats in columns variables in rows” format. It took about 34 seconds which seems pretty good. Using the loopy method took a little over 1 minute for 1000 variables I did not try 20,000 variables but I suppose that would be a least 20 minutes.
898 run;
899 options fullstimer=1;
900
901 data manyVars;
902 array v[20000];
903 do _n_ = 1 to 200;
904 do i = 1 to dim(v);
905 v=ranuni(12345);
906 end;
907 output;
908 end;
909 drop i;
910 run;
NOTE: The data set WORK.MANYVARS has 200 observations and 20000 variables.
NOTE: DATA statement used (Total process time):
real time 0.54 seconds
user cpu time 0.39 seconds
system cpu time 0.03 seconds
Memory 9322k
OS Memory 18732k
Timestamp 6/1/2012 8:42:01 PM
911
912
913 %let start = %sysfunc(datetime());
914 proc summary data=manyVars nway;
915 var _numeric_;
916 output out=Stats0 n= nmiss= min= max= mean= median= / autoname;
917 run;
NOTE: There were 200 observations read from the data set WORK.MANYVARS.
NOTE: The data set WORK.STATS0 has 1 observations and 120002 variables.
NOTE: PROCEDURE SUMMARY used (Total process time):
real time 32.24 seconds
user cpu time 9.45 seconds
system cpu time 10.84 seconds
Memory 324587k
OS Memory 340148k
Timestamp 6/1/2012 8:42:33 PM
918 proc transpose data=stats0 out=stats1;
919 by _type_ _freq_;
920 run;
NOTE: There were 1 observations read from the data set WORK.STATS0.
NOTE: The data set WORK.STATS1 has 120000 observations and 4 variables.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
real time 0.35 seconds
user cpu time 0.18 seconds
system cpu time 0.09 seconds
Memory 42809k
OS Memory 53092k
Timestamp 6/1/2012 8:42:34 PM
921 data stats1;
922 set stats1;
923 call scan(_name_,-1,p,l,'_');
924 length Variable $32 Statistic $8;
925 Variable = substr(_name_,1,p-2);
926 Statistic = substr(_name_,p);
927 run;
NOTE: There were 120000 observations read from the data set WORK.STATS1.
NOTE: The data set WORK.STATS1 has 120000 observations and 8 variables.
NOTE: DATA statement used (Total process time):
real time 0.10 seconds
user cpu time 0.04 seconds
system cpu time 0.03 seconds
Memory 209k
OS Memory 13424k
Timestamp 6/1/2012 8:42:34 PM
928 proc sort data=stats1 sortseq=LINGUISTIC(NUMERIC_COLLATION=ON);
929 by Variable;
930 run;
NOTE: There were 120000 observations read from the data set WORK.STATS1.
NOTE: The data set WORK.STATS1 has 120000 observations and 8 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.79 seconds
user cpu time 0.24 seconds
system cpu time 0.14 seconds
Memory 61002k
OS Memory 73628k
Timestamp 6/1/2012 8:42:35 PM
931 proc transpose data=stats1 out=stats2(drop=_name_);
932 by Variable;
933 var col1;
934 id Statistic;
935 idlabel Statistic;
936 run;
NOTE: There were 120000 observations read from the data set WORK.STATS1.
NOTE: The data set WORK.STATS2 has 20000 observations and 7 variables.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
real time 0.32 seconds
user cpu time 0.31 seconds
system cpu time 0.01 seconds
Memory 2197k
OS Memory 15476k
Timestamp 6/1/2012 8:42:35 PM
937 %let end = %sysfunc(datetime());
938 %put NOTE: START=&start END=&end;
NOTE: START=1654202521.463 END=1654202555.409
939 %put NOTE: ET = %sysfunc(sum(-&start,&end),time12.);
NOTE: ET = 0:00:34
NULL,
I know what your code means, I still don't think your code is a good one.
You force me to recode it for your intention. The only think I need is the number of numeric variables in this dataset.
Check it out. I think it is better and faster than yours.
data heart;set sashelp.heart;run; proc summary data=heart nway; class sex; var _numeric_; output out=Stats0(drop=_:) n= nmiss= min= max= mean= median= / autoname; run; proc sql noprint; select num_numeric into : n from dictionary.tables where libname='WORK' and memname='HEART'; quit; data want; set Stats0; length vname $ 40 ; array _a{*} _numeric_ ; do i=1 to &n ; vname=scan(vname(_a{i}),1,'_'); n=_a{i} ; nmiss=_a{i+&n } ; min=_a{i+2*&n } ; max=_a{i+3*&n } ; mean=_a{i+4*&n } ; median=_a{i+5*&n } ; output; end; keep sex vname n nmiss min max mean median ; run;
Ksharp
I was comparing your loopy program to the more logical approach of summarizing all the variables in one procedure. You appear to understand the importance of that now, as you have abandoned that approach and have chosen to focus on PROC TRANSPOSE.
The transpose of the very wide data from PROC SUMMARY is not the limiting factor in this problem. You claim your data step transpose is faster than PROC TRANSPOSE may be true but did you test it and how much slower would it need to be to make a difference.
I would argue that any performance gain that might (as yet unproven) be had with the data step transpose is outweighed by the inflexibility of data step you wrote. What happens when you add a statistic? You have to rewrite the data step transpose.
Did you ever fix the problem with 2-sided confidence intervals.
The simplest code for us lazy people would be Proc Univariate.
proc univariate data=mydataset outtable=summarystats
(keep= _var_ _nobs_ _nmiss_ _min_ _max_ _mean_ _median_)
noprint;
var var1-var100; *you can also omit this statement if you want it to obtain the summary statistics for all your variables;
run;
The bonus is the labels... they are awesome!
Kind regards,
Miguel
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.