
Hi All,

 

I'm making a boxplot with a character x-axis, on a fairly large dataset (3M records).  When I added an AXISTABLE to the plot (to show the mean for each group), it slowed down dramatically.  

 

In testing, it looks like a plot with a numeric x-axis runs fine (with or without an axistable).  With a character x-axis and no axistable it runs fine.  But as soon as I add an axistable to the character x-axis, it slows down dramatically.

 

Sample code:

data have ;
  do cat=1 to 5 ;
    catc=put(cat,1.) ;
    do i=1 to 100000 ;
      score=ranuni(0)*cat ;
      output ;
    end ;
  end ;
run ;

options stimer ;

ods listing close ;
ods pdf file="%sysfunc(pathname(work))/mypdf.pdf" ;

*numeric category ;
proc sgplot data=have ;
  vbox score/category=cat extreme;
run ;
        
proc sgplot data=have ;
  vbox score/category=cat extreme;
  xaxistable score /location=inside stat=mean ;
run ;

*character category ;
proc sgplot data=have ;
  vbox score/category=catc extreme;
run ;
        
proc sgplot data=have ;
  vbox score/category=catc extreme;
  xaxistable score /location=inside stat=mean ;
run ;

ods pdf close ;

My log (PC SAS, 9.4M4):

14   ods pdf file="%sysfunc(pathname(work))/mypdf.pdf" ;
NOTE: Writing ODS PDF output to DISK destination
      "C:\Users\Quentin\AppData\Local\Temp\SAS Temporary Files\_TD1828_MD1QCFVC_\mypdf.pdf", printer "PDF".
15
16   *numeric category ;
17   proc sgplot data=have ;
18     vbox score/category=cat extreme;
19   run ;

NOTE: Since no format is assigned, the numeric category variable will use the default of BEST6.
NOTE: PROCEDURE SGPLOT used (Total process time):
      real time           3.59 seconds
      cpu time            2.93 seconds

NOTE: Compressing data set WORK._DOCTMP000000000000000000058 increased size by 54.03 percent.
      Compressed is 191 pages; un-compressed would require 124 pages.
NOTE: Compressing data set WORK._DOCTMP000000000000000000059 decreased size by 27.45 percent.
      Compressed is 267 pages; un-compressed would require 368 pages.
NOTE: There were 500000 observations read from the data set WORK.HAVE.

20
21   proc sgplot data=have ;
22     vbox score/category=cat extreme;
23     xaxistable score /location=inside stat=mean ;
24   run ;

NOTE: Since no format is assigned, the numeric category variable will use the default of BEST6.
NOTE: PROCEDURE SGPLOT used (Total process time):
      real time           2.58 seconds
      cpu time            1.79 seconds

NOTE: Compressing data set WORK._DOCTMP000000000000000000060 increased size by 54.03 percent.
      Compressed is 191 pages; un-compressed would require 124 pages.
NOTE: Compressing data set WORK._DOCTMP000000000000000000061 decreased size by 27.45 percent.
      Compressed is 267 pages; un-compressed would require 368 pages.
NOTE: Marker and line antialiasing has been disabled for at least one plot because the threshold has been reached. You can set
      ANTIALIASMAX=500000 in the ODS GRAPHICS statement to enable antialiasing for all plots.
NOTE: There were 500000 observations read from the data set WORK.HAVE.

25
26   *character category ;
27   proc sgplot data=have ;
28     vbox score/category=catc extreme;
29   run ;

NOTE: PROCEDURE SGPLOT used (Total process time):
      real time           1.79 seconds
      cpu time            1.56 seconds

NOTE: Compressing data set WORK._DOCTMP000000000000000000063 decreased size by 12.60 percent.
      Compressed is 215 pages; un-compressed would require 246 pages.
NOTE: There were 500000 observations read from the data set WORK.HAVE.

30
31   proc sgplot data=have ;
32     vbox score/category=catc extreme;
33     xaxistable score /location=inside stat=mean ;
34   run ;

NOTE: PROCEDURE SGPLOT used (Total process time):
      real time           59.75 seconds
      cpu time            25.57 seconds

NOTE: Compressing data set WORK._DOCTMP000000000000000000065 decreased size by 12.60 percent.
      Compressed is 215 pages; un-compressed would require 246 pages.
NOTE: Marker and line antialiasing has been disabled for at least one plot because the threshold has been reached. You can set
      ANTIALIASMAX=500000 in the ODS GRAPHICS statement to enable antialiasing for all plots.
NOTE: There were 500000 observations read from the data set WORK.HAVE.

35
36   ods pdf close ;

So each of the first 3 plots runs in under four seconds.  In the 4th plot, where I have a character x-axis and add the axistable, it takes almost a minute.

 

If you're brave, bump the sample size up to 1,000,000 per group.  Plots 1-3 then take ~30 seconds each, and plot 4 takes 11 minutes.  I also initially got Java VM errors, so I had to increase the memory per http://support.sas.com/kb/31/184.html just to get it to run.

 

So my questions:

1. Why would adding an XAXISTABLE slow down generation of the graph so dramatically (when the x-axis is a character var)?  [In my head, SAS only has to run PROC MEANS in the background to compute the means]

2. For those with a more current version, do you see the same performance hit on the 4th plot?

3. Why would adding an XAXISTABLE trigger the note that marker and line antialiasing has been disabled?  I understand that message when I have a complex plot with lots of shapes, but that doesn't apply here.
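On question 3, the log note itself names the option that controls the threshold. A minimal sketch of raising it for the slow plot, using the value the note suggests. This only silences the note; it does not explain why the axis table trips the threshold, and it may well add rendering time rather than save any.

```sas
/* Raise the antialiasing threshold to the value suggested in the log note. */
ods graphics / antialiasmax=500000 ;

proc sgplot data=have ;
  vbox score / category=catc extreme ;
  xaxistable score / location=inside stat=mean ;
run ;

/* Restore the default ODS Graphics settings afterwards. */
ods graphics / reset ;
```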

 

It should be easy enough for me to switch to a numeric variable with a format attached on the x-axis, instead of a character variable, to avoid the tremendous performance hit.  But I'm curious whether there is a good explanation for this, and whether the problem exists in 9.4M5/6.
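A rough sketch of that numeric-plus-format workaround, for anyone who wants to try it. The format name CATFMT and the labels are made up for illustration, and whether a formatted numeric category keeps the speed of the unformatted numeric one is an assumption that the timings in this thread do not cover.

```sas
/* Hypothetical format: the name CATFMT and the labels are illustrative only. */
proc format ;
  value catfmt
    1='Group 1'
    2='Group 2'
    3='Group 3'
    4='Group 4'
    5='Group 5' ;
run ;

/* Keep the numeric category variable and let the format supply the axis labels. */
proc sgplot data=have ;
  format cat catfmt. ;
  vbox score / category=cat extreme ;
  xaxistable score / location=inside stat=mean ;
run ;
```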

 

Thanks,

-Q.

Check out the Boston Area SAS Users Group (BASUG) video archives: https://www.basug.org/videos.
1 ACCEPTED SOLUTION

DanH_sas
SAS Super FREQ

Hey @Quentin , thanks for bringing this to our attention. For your use case, I believe you will find it faster to use the DISPLAYSTATS option on the VBOX statement. Give that a try and see if that works well for you.
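A minimal sketch of what that suggestion might look like with the sample data from the question. It assumes a release that supports DISPLAYSTATS= on VBOX (the replies suggest 9.4M4 does not), and the statistic list shown is just one possible choice.

```sas
/* Show the group mean inside the plot via the box plot's own statistics,
   instead of adding a separate XAXISTABLE statement. */
proc sgplot data=have ;
  vbox score / category=catc extreme displaystats=(mean) ;
run ;
```

The same timing comparison as in the question (OPTIONS STIMER, then checking the log) would show whether this avoids the slowdown on a given release.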

 


6 REPLIES
Ksharp
Super User

Why not calculate the mean first, and then use PROC SGPLOT?

Quentin
PROC Star

@Ksharp wrote:

Why not calculate the mean first, and then use PROC SGPLOT?


I could definitely do that, but I like axistable, and after making a graph, I thought "hey, maybe I should show the mean/min/max values."  When I added them, I was surprised that they slowed the program so much (I was making five or six graphs, and the program went from running in less than a minute to more than 10 minutes).  Then when I tried to build a test case to post, I couldn't replicate the problem at first.  I was even more surprised when I realized it was because my test case had a numeric variable; when I changed it to character, it slowed dramatically.  (You always learn from making test cases.)

 

I'm a huge fan of ODS graphics.  But my one complaint  would be that they seem slow to generate, especially with moderate to large data. Sometimes with box plots, I resort to calculating the values for the boxplot myself (using PROC MEANS or whatever), and then use GTL BOXPLOTPARM, which typically speeds things up, because there is less data to crunch.
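For reference, a sketch of that precompute-then-BOXPLOTPARM approach using the sample data. The data set names STATS and PARM and the template name BOXPARM are made up here, and the whiskers are set to the data minimum and maximum to mimic the EXTREME option used in the SGPLOT code.

```sas
/* 1. Summarize once per group; the plot then only sees a handful of rows. */
proc summary data=have nway ;
  class catc ;
  var score ;
  output out=stats(drop=_type_ _freq_)
    min=MIN q1=Q1 median=MEDIAN mean=MEAN q3=Q3 max=MAX ;
run ;

/* 2. Reshape to one row per group and statistic, as BOXPLOTPARM expects.
      STAT holds the statistic name and VALUE holds its value. */
proc transpose data=stats out=parm(rename=(_name_=stat col1=value)) ;
  by catc ;
  var MIN Q1 MEDIAN MEAN Q3 MAX ;
run ;

/* 3. Minimal GTL template that draws boxes from the precomputed statistics. */
proc template ;
  define statgraph boxparm ;
    begingraph ;
      layout overlay ;
        boxplotparm x=catc y=value stat=stat ;
      endlayout ;
    endgraph ;
  end ;
run ;

proc sgrender data=parm template=boxparm ;
run ;
```

The PARM data set has 30 rows (5 groups by 6 statistics) instead of 500,000, which is where the speedup would come from.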

 

Given that with bigger data I was getting java virtual machine memory errors, it makes me wonder if the JVM is doing statistical calculations and that is the cause of poor speed.  In my head, I would have thought that when I make a box plot in SAS, it would use SAS to calculate the statistics needed.  But it seems like often the statistical calculations are slower than you would get from a SAS PROC.  

 

PROC MEANS can calculate the means in < 0.2 seconds, regardless of whether the class variable is numeric or character:

 

12   proc means data=have mean;
13     var score ;
14     class cat ;
15   run ;

NOTE: There were 500000 observations read from the data set WORK.HAVE.
NOTE: PROCEDURE MEANS used (Total process time):
      real time           0.13 seconds
      cpu time            0.12 seconds


16
17
18   proc means data=have mean;
19     var score ;
20     class catc ;
21   run ;

NOTE: There were 500000 observations read from the data set WORK.HAVE.
NOTE: PROCEDURE MEANS used (Total process time):
      real time           0.06 seconds
      cpu time            0.15 seconds

 

Seems fair to be surprised that SGPLOT would take 20-50 seconds to calculate the same values. 

 

 

Ksharp
Super User

Yeah. That is how SAS usually surprises us.

Sometimes I run into something like this too.

Reeza
Super User

Just ran in M6 and it's similar but not as bad. 

 

 
 1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 NOTE: ODS statements in the SAS Studio environment may disable some output features.
 69         
 70         data have ;
 71           do cat=1 to 5 ;
 72             catc=put(cat,1.) ;
 73             do i=1 to 100000 ;
 74               score=ranuni(0)*cat ;
 75               output ;
 76             end ;
 77           end ;
 78         run ;
 
 NOTE: The data set WORK.HAVE has 500000 observations and 4 variables.
 NOTE: DATA statement used (Total process time):
       real time           0.20 seconds
       cpu time            0.20 seconds
       
 
 79         
 80         options stimer ;
 81         
 82         ods listing close ;
 83         ods pdf file="%sysfunc(pathname(work))/mypdf.pdf" ;
 NOTE: Writing ODS PDF output to DISK destination 
       "/tmp/SAS_workAE9500004D83_localhost.localdomain/SAS_work5F1F00004D83_localhost.localdomain/mypdf.pdf", printer "PDF".
 84         
 85         *numeric category ;
 86         proc sgplot data=have ;
 87           vbox score/category=cat extreme;
 88         run ;
 
 NOTE: Since no format is assigned, the numeric category variable will use the default of BEST6.
 NOTE: PROCEDURE SGPLOT used (Total process time):
       real time           7.47 seconds
       cpu time            1.81 seconds
       
 NOTE: There were 500000 observations read from the data set WORK.HAVE.
 
 89         
 90         proc sgplot data=have ;
 91           vbox score/category=cat extreme;
 92           xaxistable score /location=inside stat=mean ;
 93         run ;
 
 NOTE: Since no format is assigned, the numeric category variable will use the default of BEST6.
 NOTE: PROCEDURE SGPLOT used (Total process time):
       real time           4.07 seconds
       cpu time            1.19 seconds
       
 NOTE: Marker and line antialiasing has been disabled for at least one plot because the threshold has been reached. You can set 
       ANTIALIASMAX=500000 in the ODS GRAPHICS statement to enable antialiasing for all plots.
 NOTE: There were 500000 observations read from the data set WORK.HAVE.
 
 94         
 95         *character category ;
 96         proc sgplot data=have ;
 97           vbox score/category=catc extreme;
 98         run ;
 
 NOTE: PROCEDURE SGPLOT used (Total process time):
       real time           1.18 seconds
       cpu time            1.06 seconds
       
 NOTE: There were 500000 observations read from the data set WORK.HAVE.
 
 99         
 100        proc sgplot data=have ;
 101          vbox score/category=catc extreme;
 102          xaxistable score /location=inside stat=mean ;
 103        run ;
 
 NOTE: PROCEDURE SGPLOT used (Total process time):
       real time           31.64 seconds
       cpu time            19.39 seconds
       
 NOTE: Marker and line antialiasing has been disabled for at least one plot because the threshold has been reached. You can set 
       ANTIALIASMAX=500000 in the ODS GRAPHICS statement to enable antialiasing for all plots.
 NOTE: There were 500000 observations read from the data set WORK.HAVE.
 
 104        
 105        ods pdf close ;
 NOTE: ODS PDF printed 4 pages to 
       /tmp/SAS_workAE9500004D83_localhost.localdomain/SAS_work5F1F00004D83_localhost.localdomain/mypdf.pdf.
 106        
 107        OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 117        

 

Quentin
PROC Star

Thanks @DanH_sas  that looks lovely.  Best argument yet that I should upgrade from 9.4M4. : )

 

 

 
