BookmarkSubscribeRSS Feed
Gwangik
Calcite | Level 5

Hi, 

 

I am running simple data step with WHERE statement. 

Since the source data is enormous(10million rows), it takes time forever.

However, I realized that my computer use only 30% of RAM (out of 16GB), and 7% of CPU.

SASV9.cfg has been modifed but still use the same capacity. 

 

Is there anyway to increase the computer resource use for SAS so that make the process faster?

 

Thank you. 

 

 

14 REPLIES 14
SASKiwi
PROC Star

Since your processing is neither memory or CPU constrained, your conclusion should be that it is IO constrained - your hard drive is maxing out. If you are running this on a PC then installing an SSD (Solid State Drive) and putting your SAS data on that as well as your SAS WORK folder should speed things up.

LinusH
Tourmaline | Level 20

10 million roes doesn't sound that large, so if it's simple data step processing, I think that your problems are HW related.

If you access a fairly small subset, consider creating an index on your data set.

You could try increase the MEMSIZE SAS system option, but that will probably not make any difference for sequential processing such as data steps.

To look into more detail of your resource consuption, use

options msglevel=i fullstimer;
Data never sleeps
ballardw
Super User

What does your where clause look like? Separate where statement or data set option?

How many variables are in the data set? Are they all needed? Dropping variables may reduce disk IO somewhat.

ChrisNZ
Tourmaline | Level 20

There many ways to speed up I/Os.

 

Storage format (SPDE engine vs V9 engine, compression type) has a huge influence.

An SPDE library with binary compression could speed things up dramatically.

 

Other issues since you have a where clause are the table sort order and the indexes.

 

Also, maybe a IF clause would be faster than a WHERE clause. See here:

https://communities.sas.com/t5/Base-SAS-Programming/Using-MERGE-with-LIKE-command/m-p/301654#M63878

 

Another critical factor is the way you retrieve your data in terms of memory utilisation.

The data set's buffer size, the number of buffers used, whether or not Windows's Direct-IO is used all have a huge effect.

To illustrate this point, here is a benchmark example taken from this book about making SAS faster:

 

https://www.amazon.com/High-Performance-SAS-Coding-Christian-Graffeuille/dp/1512397490

 

 

Table 5.5: Run times for various values of BUFSIZE, BUFNO and SGIO for a 10,000 MB data set

10,000,000 kB

 

BUFNO

1

5

25

100

500

BUFSIZE

SGIO

195.45

.

.

.

.

0

no

4k

no

235.25

236.25

235.56

239.78

235.39

yes

1177.55

572.32

157.28

93.60

93.96

8k

no

163.77

162.05

164.22

164.05

161.34

yes

542.85

268.86

95.45

81.07

80.84

16k

no

138.03

140.26

136.16

137.49

136.11

yes

265.46

142.55

75.28

74.61

81.70

32k

no

143.70

148.85

137.96

147.67

143.98

yes

140.56

93.84

73.35

73.33

75.48

64k

no

180.47

171.39

166.63

151.69

173.16

yes

88.62

70.14

71.62

72.00

.

128k

no

235.25

236.25

235.56

239.78

235.39

yes

70.72

71.30

71.34

70.33

.

 

 The default run time is 195 seconds, which can be reduced to 70s by simply changing the memory usage settings.

Ksharp
Super User
It is best for multi-thread PROC DS2.



data test; 
do x=1 to 12316;
 j=x*x; 
output; 
end; 
run; 
 
proc ds2; 
  thread newton/overwrite=yes; 
    dcl double y count; 
    dcl bigint thisThread;
    drop count; 
    method run(); 
      set test; 
      /*  if  x=1 ; */
      thisThread=_threadid_; 
      count+1;
    end; 
    method term();
      put '**Thread' _threadid_ 'processed'  count 'rows:';
    end;
  endthread; 
  run; 
quit; 
 
proc ds2; 
  data want; 
    dcl thread newton frac; 
    method run(); 
         set from frac threads=3; 
    end; 
  enddata;
  run; 
quit;
ChrisNZ
Tourmaline | Level 20

@Ksharp I am pretty sure this would slow down the process considering the data is not spread onto different nodes.

I can't benchmark atm, but if you can...

Ksharp
Super User

Chris,
No. I can't. This is just intuitive to me . Multiple sessions is best for WHERE / IF situation.

ChrisNZ
Tourmaline | Level 20

OK, I had time for a quick run.

 

DS2 improves things slightly because my storage is fast enough (or my CPU cores slow enough) to be CPU-bound for single-table processing.

This doesn't change whether there is a IF/WHERE subset or not, the pattern is the same.

The 10% elapse time reduction happens at a serious CPU cost (in this case, CPU usage doubles).

 

I still suspect that throwing more CPU at a process that is not CPU-bound would not help, and that DS2 threads would seriously degrade performance on slower storage as the I/Os would be more spread.

 

My numbers:

 


options nofullstimer compress=no;

data HAVE; 
  length A $200;
  do I=1 to 1e8;
    output; 
  end; 
run; 
 
proc ds2; 
  thread NEWTON/overwrite=yes; 
    method run(); 
      set HAVE;
    end; 
  endthread; 
  run; 
quit; 
 
proc ds2; 
  data WANT; 
    dcl thread NEWTON FRAC; 
    method run(); 
      set from FRAC threads=3; 
    end; 
  enddata;
  run; 
quit;
%*      real time           27.38 seconds ;
%*      cpu time            51.44 seconds ;

data WANT2; 
  set HAVE;
run;
%*      real time           29.31 seconds ;
%*      cpu time            29.31 seconds ;
ChrisNZ
Tourmaline | Level 20

Different (slower, LIN64 instead of WIN64) server, (very) different results, but still no reason to use DS2 for this process.

 

NOTE: PROCEDURE DS2 used (Total process time):
      real time           1:03.27
      cpu time            1:05.44
    
NOTE: DATA statement used (Total process time):
      real time           1:05.45
      cpu time            34.15 seconds

 

Ksharp
Super User
Chris,
You are right. Sad for DS2 .


Gwangik
Calcite | Level 5

Hi All, 

 

First of all, thank you so much for your tips and advises. 

It was my first time to post my question here. 

 

I am a die-hard DATA step user. 

It took me an hour to process very simple Data Step with 1 WHERE statement w several ANDs.

I could not figure it out why the memory usage is so low. 

NOTE: DATA statement used (Total process time):
real time 1:01:47.84
user cpu time 27:13.95
system cpu time 1:49.85
memory 1540.78k
OS Memory 31988.00k

 

 

This is the time record when I used Proc SQL. 

This takes only about 6min which is way quicker than 1 hour with high Memory usage.

NOTE: PROCEDURE SQL used (Total process time):
real time 6:19.32
user cpu time 1:37.65
system cpu time 7.61 seconds
memory 637487.35k
OS Memory 668740.00k

 

I am not sure what makes this huge real time difference.....

 

Again, I appreciate for all the replies.

Thank you.  

Ksharp
Super User

PROC SQL is multi-session oriented PROC, maybe that is reason.
Also you can use OPTIONS MSGLEVEL=I ; to check if SQL used index too.

ChrisNZ
Tourmaline | Level 20

SQL has a slightly different profile than the data step (on the faster windows server).

10% longer elapse and 10% less cpu used.

 

NOTE: PROCEDURE SQL used (Total process time):
      real time           31.58 seconds
      cpu time            25.55 seconds

 

It would be interesting to know more.

Index usage is indeed the first thing to come to mind for such a dramatic improvement, though there is no reason only SQL should use the index.

 

LinusH
Tourmaline | Level 20
If we were to make anything out of that you need to attach the code...And Perhaps a proc contents.
Again, data steps usually process one observation at a time and doesn't utilize memory much.
SQL on the other hand doesn't need to preserve observations order so I guess it can procees many observations at the same time.
Data never sleeps

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 13733 views
  • 1 like
  • 6 in conversation