Need advice on improving efficiency

Reply
N/A
Posts: 1

Need advice on improving efficiency

Hi all,   I am trying to make the following code run faster and I really need some help:

sasfile data1 open;

sasfile data2 open;

proc sql;

   create view rolling as

      select  

               rolls.id as rollid1,

               rolls.date as rollid2

             , windows.x

             , windows.y

       from data1 as rolls,  data2 as windows

       where rollid1=windows.id

       and 0<=intck('month', rollid2, windows.date)<12

       group by rollid1,rollid2

       order by rollid1, rollid2;

quit;

proc reg noprint outest=loadings;

by rollid1 rollid2;

model x=y;

run;

The run time info. is as follows:

NOTE: PROCEDURE SQL used (Total process time):

      real time           36:39.76

      cpu time            36:45.43

The setup of my computer is as follows:

  CPU:  i7-2600k  4-core hyper-threaded

  Ram:  32G

  HDDs:  1. Raid0 with two 7200rpm regular hds,  2,3  SSD

As you can see, I preloaded the data sets into RAM first to speed-up IO.

When the code was running, I checked CPU performance and only one

thread seemed to be fully loaded, while other three were lightly loaded and the

rest four were idling.  This is consistent with the fact that the real time is

almost equal to the CPU time. Also, I am sure that the THREADS option is set.

It seemed to me that IO is the bottleneck here, I am wondering whether there is

a way to improve the speed. I would appreciate any help.

Respected Advisor
Posts: 4,663

Re: Need advice on improving efficiency

Two things to try:

- have indexes on id and date in both data1 and data2
- remove the GROUP BY clause in rolling view

PG

PG
PROC Star
Posts: 1,101

Re: Need advice on improving efficiency

I think that what you're referring to as "threads" are actually CPUs. I'm not sure that PROC SQL can take advantage of more than 1 CPU. I took a quick look, but couldn't find an answer.

However, you might benefit from adjusting CPUCOUNT and THREAD. Here's a SAS documentation stream that talks about it:

http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a002283824.htm

Tom

Super User
Posts: 5,260

Re: Need advice on improving efficiency

I don't believe SAS will make any difference between CPUs and cores. And SAS is considering hyper threading as you have twice as many cores (at least when it comes to licensing...).

And SQL will take advantage of multiple CPUs/cores, at least when it comes to sorting and grouping.

Now we don't have the size of the two tables, so we can't really know which join method Proc SQL chooses, but if a sort/merge is chosen, multi threading will probably occur. By looking of the log, you'll see that real time is less than CPU time. But looking at the numbers, it seems that there is possibility for improvement.

Indexing on the id columns could trigger an indexed join, but I doubt that indexing on date would do any good. That's because SQL has problems optimizing join logic using between and, especially when you have a function call - then it won't happen.

Adding the _method proc sql option, and the global msglevel=i option will help you see how SQL handles the query (for _method, I think you have to temporarily  have to do create table instead).

Data never sleeps
Valued Guide
Posts: 2,175

Re: Need advice on improving efficiency

Since CPU and REAL times are almost the same, I/O doesn't seem to be your constraint.

So it must be CPU .....

applying functions on the join always impacts CPU performance.

You might see improvement if you adapt WHERE to a JOIN like

       from data1 as rolls

       join  data2 as windows

      where rolls.id = windows.id

       and    rolls.date >= windows.date

       and    rolls.date < (windows.date +366)

the main thing is removing the INTCK() on every intersection of the cartesian join

If you really want to use INTCK, and need this level of detail in your result set, apply the grouping before the join.

Being separate data streams you can run those activities in parallel. .

You can then benefit from PROC SUMMARY and MEANS which advantage of the multi-CPU for each of those separate streams.

If the result set from the JOIN is large, rather than suffer the I/O/I writing it into a WORK. dataset with SQL, then reading it back in again in PROC REG, create a VIEW in SQL for the REG procedure to read. (of course, this approach might fall down if REG makes multiple passes through the data, but I think it is worth trying).

It would be interesting to see the NOTEs for the PROC REQ especially the times and memory usage

Ask a Question
Discussion stats
  • 4 replies
  • 227 views
  • 0 likes
  • 5 in conversation