BookmarkSubscribeRSS Feed
data_null__
Jade | Level 19

Ksharp wrote:

NULL,

No, I don't think your code is good one. Since you mentioned lots of variables in a table.

In your code, every variable will generate four new variables, when a table has lots of variables (i.e. twenty thousand ), you think proc means; _numeric_ ; can hold so many variables? , and more ,you are using proc transpose which is undisputed way to slow down . Therefore, I don't think your code is better or faster than mine.

Ksharp

I see you don't believe me as you seem to have preconceived notions of what kinds of SAS programs perform better than others.

I tested my program and found that it will summarize 20,000 variables although I think that is very close to the limit of my laptop computer's capacity.  The program was able to summarize all variables and reformat the output into the “stats in columns variables in rows” format.  It took about 34 seconds which seems pretty good.  Using the loopy method took a little over 1 minute for 1000 variables I did not try 20,000 variables but I suppose that would be a least 20 minutes. 

898      run;

899   options fullstimer=1;

900

901   data manyVars;

902      array v[20000];

903      do _n_ = 1 to 200;

904         do i = 1 to dim(v);

905            v=ranuni(12345);

906            end;

907         output;

908         end;

909      drop i;

910      run;

NOTE: The data set WORK.MANYVARS has 200 observations and 20000 variables.

NOTE: DATA statement used (Total process time):

      real time           0.54 seconds

      user cpu time       0.39 seconds

      system cpu time     0.03 seconds

      Memory                            9322k

      OS Memory                         18732k

      Timestamp            6/1/2012  8:42:01 PM

911

912

913   %let start = %sysfunc(datetime());

914   proc summary data=manyVars nway;

915      var _numeric_;

916      output out=Stats0 n= nmiss= min= max= mean= median= / autoname;

917      run;

NOTE: There were 200 observations read from the data set WORK.MANYVARS.

NOTE: The data set WORK.STATS0 has 1 observations and 120002 variables.

NOTE: PROCEDURE SUMMARY used (Total process time):

      real time           32.24 seconds

      user cpu time       9.45 seconds

      system cpu time     10.84 seconds

      Memory                            324587k

      OS Memory                         340148k

      Timestamp            6/1/2012  8:42:33 PM

918   proc transpose data=stats0 out=stats1;

919      by _type_ _freq_;

920      run;

NOTE: There were 1 observations read from the data set WORK.STATS0.

NOTE: The data set WORK.STATS1 has 120000 observations and 4 variables.

NOTE: PROCEDURE TRANSPOSE used (Total process time):

      real time           0.35 seconds

      user cpu time       0.18 seconds

      system cpu time     0.09 seconds

      Memory                            42809k

      OS Memory                         53092k

      Timestamp            6/1/2012  8:42:34 PM

921   data stats1;

922      set stats1;

923      call scan(_name_,-1,p,l,'_');

924      length Variable $32 Statistic $8;

925      Variable  = substr(_name_,1,p-2);

926      Statistic = substr(_name_,p);

927      run;

NOTE: There were 120000 observations read from the data set WORK.STATS1.

NOTE: The data set WORK.STATS1 has 120000 observations and 8 variables.

NOTE: DATA statement used (Total process time):

      real time           0.10 seconds

      user cpu time       0.04 seconds

      system cpu time     0.03 seconds

      Memory                            209k

      OS Memory                         13424k

      Timestamp            6/1/2012  8:42:34 PM

928   proc sort data=stats1 sortseq=LINGUISTIC(NUMERIC_COLLATION=ON);

929      by Variable;

930      run;

NOTE: There were 120000 observations read from the data set WORK.STATS1.

NOTE: The data set WORK.STATS1 has 120000 observations and 8 variables.

NOTE: PROCEDURE SORT used (Total process time):

      real time           0.79 seconds

      user cpu time       0.24 seconds

      system cpu time     0.14 seconds

      Memory                            61002k

      OS Memory                         73628k

      Timestamp            6/1/2012  8:42:35 PM

931   proc transpose data=stats1 out=stats2(drop=_name_);

932      by Variable;

933      var col1;

934      id Statistic;

935      idlabel Statistic;

936      run;

NOTE: There were 120000 observations read from the data set WORK.STATS1.

NOTE: The data set WORK.STATS2 has 20000 observations and 7 variables.

NOTE: PROCEDURE TRANSPOSE used (Total process time):

      real time           0.32 seconds

      user cpu time       0.31 seconds

      system cpu time     0.01 seconds

      Memory                            2197k

      OS Memory                         15476k

      Timestamp            6/1/2012  8:42:35 PM

937   %let end = %sysfunc(datetime());

938   %put NOTE: START=&start END=&end;

NOTE: START=1654202521.463 END=1654202555.409

939   %put NOTE: ET = %sysfunc(sum(-&start,&end),time12.);

NOTE: ET =      0:00:34

Ksharp
Super User

NULL,

I know what your code means, I still don't think your code is a good one.

You force me to recode it for your intention. The only think I need is the number of numeric variables in this dataset.

Check it out. I think it is better and faster than yours.

data heart;set sashelp.heart;run;





proc summary data=heart nway;
   class sex;
   var _numeric_;
   output out=Stats0(drop=_:) n= nmiss= min= max= mean= median= / autoname;
run;
proc sql noprint;
 select num_numeric into : n
  from dictionary.tables
   where libname='WORK' and memname='HEART';
quit;
data want;
 set Stats0;
 length vname $ 40 ;
 array _a{*} _numeric_ ;
 do i=1 to &n ;
  vname=scan(vname(_a{i}),1,'_');
  n=_a{i} ;
  nmiss=_a{i+&n } ;
  min=_a{i+2*&n } ;
  max=_a{i+3*&n } ;
  mean=_a{i+4*&n } ;
  median=_a{i+5*&n } ;
  output;
 end;
 keep sex vname n nmiss min max mean median ;
run;



  

Ksharp

data_null__
Jade | Level 19

I was comparing your loopy program to the more logical approach of summarizing all the variables in one procedure.  You appear to understand the importance of that now, as you have abandoned that approach and have chosen to focus on PROC TRANSPOSE.

The transpose of the very wide data from PROC SUMMARY is not the limiting factor in this problem.  You claim your data step transpose is faster than PROC TRANSPOSE may be true but did you test it and how much slower would it need to be to make a difference. 

I would argue that any performance gain that might (as yet unproven) be had with the data step transpose is outweighed by the inflexibility of data step you wrote.  What happens when you add a statistic?  You have to rewrite the data step transpose.

data_null__
Jade | Level 19

Did you ever fix the problem with 2-sided confidence intervals.

M_Maldonado
Barite | Level 11

The simplest code for us lazy people would be Proc Univariate.

proc univariate data=mydataset outtable=summarystats

(keep= _var_ _nobs_ _nmiss_ _min_ _max_ _mean_ _median_)

noprint;

var var1-var100; *you can also omit this statement if you want it to obtain the summary statistics for all your variables;

run;

The bonus is the labels... they are awesome!

Kind regards,

Miguel

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 20 replies
  • 2365 views
  • 3 likes
  • 7 in conversation