data_null__
Jade | Level 19

I was trying to see how many two-way tables I could get out of PROC SUMMARY, and I noticed a performance issue with multi-label formats and the MLF option. I think my test must be flawed, but I can't figure out a way to get the same performance without MLF, even when the formats don't actually have multiple labels and just use the option, as in my example below.


17         data sample;
18            array d[1500];
19            do id = 1 to 100;
20               trt = rantbl(8,.3,.3);
21               do i = 1 to dim(d);
22                  d[i] = rantbl(8,.1,.05,.3,.4);
23                  end;
24               output;
25               end;
26            run;

NOTE: The data set WORK.SAMPLE has 100 observations and 1503 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      user cpu time       0.02 seconds
      system cpu time     0.01 seconds

27         proc format;
28            value dgrp(notsorted) 2='Group A' 1='Group C' 5='Group D' 4='Group E' 3='Group B';
29            value trt(notsorted)  1='Placebo' 2='Active 1' 3='Active 2';
29       !                                                                ;
30            run;
NOTE: PROCEDURE FORMAT used (Total process time):
      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds

31         proc summary data=sample chartype completetypes missing;
32            class d: trt / preloadfmt order=data;
33            format d: dgrp. trt trt.;
34            types trt (d:)*trt;
35            output out=_null_;* / levels ways;
36            run;

NOTE: Multiple concurrent threads will be used to summarize data.
NOTE: There were 100 observations read from the data set WORK.SAMPLE.
NOTE: PROCEDURE SUMMARY used (Total process time):
      real time           30.20 seconds
      user cpu time       29.84 seconds
      system cpu time     0.36 seconds
      memory              186330.71k
      OS Memory           207000.00k
      Timestamp           01/13/2014 08:58:08 AM
      Page Faults                       1
      Page Reclaims                     0
      Page Swaps                        0
      Voluntary Context Switches        136
      Involuntary Context Switches      234
      Block Input Operations            1
      Block Output Operations           0

37        
38         proc format;
39            value dgrp(notsorted multilabel) 2='Group A' 1='Group C' 5='Group D' 4='Group E' 3='Group B';
40            value trt(notsorted multilabel)  1='Placebo' 2='Active 1' 3='Active 2' /*1,2,3='Total'*/;
41            run;

NOTE: PROCEDURE FORMAT used (Total process time):
      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds

42         proc summary data=sample chartype completetypes missing;
43            class d: trt / preloadfmt mlf order=data;
44            format d: dgrp. trt trt.;
45            types trt (d:)*trt;
46            output out=_null_;* / levels ways;
47            run;

NOTE: Multiple concurrent threads will be used to summarize data.
NOTE: There were 100 observations read from the data set WORK.SAMPLE.
NOTE: PROCEDURE SUMMARY used (Total process time):
      real time           3.42 seconds
      user cpu time       4.14 seconds
      system cpu time     0.50 seconds
      memory              257941.45k
      OS Memory           278520.00k
      Timestamp           01/13/2014 08:58:11 AM
      Page Faults                       0
      Page Reclaims                     0
      Page Swaps                        0
      Voluntary Context Switches        157
      Involuntary Context Switches      43
      Block Input Operations            0
      Block Output Operations           0

5 REPLIES
data_null__
Jade | Level 19

I was thinking that someone might comment.  I guess I should have made it a question.

Tom
Super User

What is the question? The second summary appears to use less time and more memory. The extra memory might be attributable to the MLF option. The shorter time might just be disk caching of your input data set.

data_null__
Jade | Level 19

It's not disk caching.

As best I can tell, the performance difference is related to the use of MLF. I just think it is interesting, and I wondered whether anyone had noticed it before or whether someone from SAS had an explanation. It does take a lot of variables before the performance difference becomes noticeable.

Cynthia_sas
SAS Super FREQ


Hi:

  I didn't comment because you're not using MULTILABEL and MLF the way they are intended. Your user-defined format does not have any "overlapping" values, so I'm not sure what your test shows. The usual way to specify MLF would be if you had 5 years in the data (1998 - 2002) and you wanted the year categories to be every year "by itself", then 1998-1999 as one category, and 2000-2002 as a separate category. So essentially, you are "double counting" each year:

1998

1999

2000

2001

2002

1998 and 1999

2000 through 2002
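
For readers who have not used the option this way, here is a minimal sketch of the year example above. The names `yrgrp`, `years`, and `counts` are made up for illustration and do not come from the thread; the data set simply holds one observation per year.

```sas
/* Hypothetical multilabel format: each year maps to its own label */
/* and also to one of two overlapping range labels.                */
proc format;
   value yrgrp (multilabel notsorted)
      1998      = '1998'
      1999      = '1999'
      2000      = '2000'
      2001      = '2001'
      2002      = '2002'
      1998-1999 = '1998 and 1999'
      2000-2002 = '2000 through 2002';
run;

/* Illustrative data: one observation per year. */
data years;
   do year = 1998 to 2002;
      output;
   end;
run;

/* MLF tells PROC SUMMARY to use every matching label, so each */
/* observation is counted in two class levels.                 */
proc summary data=years nway;
   class year / mlf;
   format year yrgrp.;
   output out=counts;
run;

proc print data=counts;
run;
```

Because every year matches both its single-year label and one range label, each observation contributes to two rows of `counts`, which is the "double counting" described above. Without MLF on the CLASS statement, PROC SUMMARY would use only one label per value.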

  You might have the same results, you might not. As we say in the Advanced Programming class -- Your Mileage May Vary.

cynthia

data_null__
Jade | Level 19

I removed the "true" MULTILABEL from my example to make the two steps equivalent in the output they produce: the same number of observations from the same number of variables and crossings.

PROC FORMAT and PROC SUMMARY don't seem to care if you used MULTILABEL as it was intended or not.
