Statistical programming, matrix languages, and more

Input output bottleneck!!

Occasional Contributor
Posts: 11

Input output bottleneck!!

I have a simulation program (attached) that works correctly but is very slow: a single run takes 4-5 days of continuous run time.

I think there is an input/output bottleneck when I append (vertically concatenate) matrices. Is there anything I could change to reduce time or memory use, for example writing to a permanent data set instead of a temporary WORK data set, that might make this code run faster?

I save qualifying records as row vectors and append them to a matrix called RESULT. At the end of the loop grid, I append the matrix to a permanent SAS data set for further analysis.

I have done all I can with my limited knowledge of IML and I am desperate for help.

Many thanks,

Jamil


All Replies
SAS Super FREQ
Posts: 3,222

Re: Input output bottleneck!!

1) Move all allocations outside the DO loops. You only need to allocate X once, at the top of the program, not inside four nested DO loops.

2) You don't need to allocate zm, T, or any other matrices that you are going to assign. If you execute "zm = equation", do not allocate zm first, since zm just gets overwritten.

3) You don't need x1 and x2. Just send X into the RAND function to fill up both columns.

4) Get rid of the lines

  COV_INV_Z = inv(COV_Z);

  T = Z * COV_INV_Z * t(Z);

and use the SOLVE function instead, as shown in

http://blogs.sas.com/content/iml/2011/08/10/do-you-really-need-to-compute-that-matrix-inverse/

and

http://blogs.sas.com/content/iml/2011/08/17/solving-linear-systems-which-technique-is-fastest/

5) The line that is REALLY killing your performance is

RESULT=RESULT // R;

Read Section 2.9 on pp. 34-35 of my book Statistical Programming with SAS/IML Software. That chapter is available as a free download.

Start with (5); the performance will improve dramatically.
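A minimal SAS/IML sketch of the first three points (allocate outside the loops, don't pre-allocate matrices you assign, fill X directly), under assumptions not stated in the thread: X is an n x 2 matrix of standard normal draws, and the value of n, the seed, and the single stand-in loop are hypothetical. It uses the RANDGEN subroutine, which fills every element of an existing matrix in one call, so X is allocated once and simply refilled on each pass, and XBAR is assigned without being allocated first.

```sas
proc iml;
n = 50;                        /* hypothetical number of observations */
call randseed(12345);          /* set the seed once, before any loop */
x = j(n, 2, .);                /* allocate X ONCE, outside all DO loops */
do rep = 1 to 3;               /* stand-in for the nested simulation loops */
   call randgen(x, "Normal");  /* refills both columns of X; no x1 or x2 needed */
   xbar = x[:, ];              /* 1 x 2 row of column means; assigned, not pre-allocated */
   print rep xbar;
end;
quit;
```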

Occasional Contributor
Posts: 11

Re: Input output bottleneck!!

Thank you, Rick. I will work on these changes and see if performance improves.

Occasional Contributor
Posts: 11

Re: Input output bottleneck!!

Rick, by the way, I have your book, which is excellent; I, on the other hand, am a novice in IML.

How would the SOLVE function work in this case? I have an original X matrix (n x p) and a Z matrix (1 x p).

Is the following even remotely correct as a replacement for the lines above? I need to calculate the scalar T from the quadratic form T = Z * COV_INV_Z * t(Z):

T= SOLVE (X,Z); ???
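Writing the quadratic form out shows what SOLVE needs. Denote COV_Z (p x p) by $\Sigma$ and write Z as the 1 x p row vector $z$:

```latex
T = z\,\Sigma^{-1} z^{\top} = z\,w,
\qquad \text{where } w \text{ is the } p \times 1 \text{ solution of } \Sigma\, w = z^{\top}.
```

Because SOLVE(A, b) returns the solution of A*w = b, the first argument should be the covariance matrix rather than the n x p data matrix X, the second should be the column vector z`, and the result still has to be premultiplied by z to give the scalar T.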

Solution
05-24-2012 06:06 AM
Frequent Contributor
Posts: 122

Re: Input output bottleneck!!

I haven't run your program, but I believe you are dealing with a 2x2 matrix, so the time taken for the matrix inversion is not going to be significant no matter how you do it. Have you addressed Rick's comment in bold? I think it is in bold for a good reason. On every iteration you are effectively declaring a new data structure, RESULT, and copying all of the old results into it along with the most recent one. As the iteration count rises, larger and larger amounts of data are shifted around, which kills the performance. Try structuring the program so that you declare a matrix at the start that is big enough to hold the results for every iteration of your innermost loop, and then store each result in it with a statement like Result[m,]=R; where m is the iteration number. After the innermost loop has finished, you can dump the contents of Result to a SAS data set.
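The difference Ian describes can be timed directly. This is a minimal sketch, not Jamil's program: the 1 x 4 row R and the count of 10,000 rows are hypothetical stand-ins. The first loop grows RESULT with //, so pass m copies all m-1 earlier rows; the second allocates the full matrix once and assigns row m in place.

```sas
proc iml;
nIter = 10000;                 /* hypothetical number of result rows */
R = {1 2 3 4};                 /* stand-in for one 1 x 4 result row */

/* Slow: every // builds a new, larger RESULT and copies all old rows */
t0 = time();
free result;                   /* start from an empty matrix */
do m = 1 to nIter;
   result = result // R;
end;
tConcat = time() - t0;

/* Fast: allocate once, then fill row m in place */
t0 = time();
result2 = j(nIter, ncol(R), .);
do m = 1 to nIter;
   result2[m, ] = R;
end;
tPrealloc = time() - t0;

print tConcat tPrealloc;       /* elapsed seconds for each approach */
quit;
```

Because the concatenation loop copies 1 + 2 + ... + (nIter - 1) rows in total, its cost grows roughly with the square of nIter, while the preallocated loop's cost grows linearly.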

SAS Super FREQ
Posts: 3,222

Re: Input output bottleneck!!

Good one! I didn't even notice these were 2x2 matrices!  (But it's good to learn how to use SOLVE anyway!)

Occasional Contributor
Posts: 11

Re: Input output bottleneck!!

Ian,

Does it matter where I declare the matrix in this case? Should I declare it outside the first loop or just before the innermost loop?

Thanks,

Jamil

Frequent Contributor
Posts: 122

Re: Input output bottleneck!!

Jamil,

Outside the first loop, declare something like RESULT=j(15000,4); then move your APPEND to just after the innermost loop. Of course, you could make RESULT big enough to contain all the results from all the loops, but then I think you might run into memory problems.
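A minimal sketch of this layout, assuming a two-level (lambda, K) grid around a replication loop; the library path, grid values, column names, and placeholder statistic are all hypothetical. RESULT is allocated once with the 15000 rows suggested above, rows are filled in place, and APPEND writes the whole block to the permanent data set once per grid point. Each APPEND writes every row of RESULT, so the innermost loop should fill all of them.

```sas
libname out "/path/to/sim";    /* hypothetical permanent library */

proc iml;
nRows  = 15000;                /* rows per block */
lambda = {0.1 0.2};            /* hypothetical grid values */
K      = {2 3};
result = j(nRows, 4, .);       /* allocated ONCE, outside the first loop */

cname = {"lambda" "K" "rep" "T"};
create out.results from result[colname=cname];  /* defines columns; writes no rows */

do iL = 1 to ncol(lambda);
   do iK = 1 to ncol(K);
      do rep = 1 to nRows;                      /* innermost loop */
         stat = rep * lambda[iL];               /* placeholder for the real T statistic */
         result[rep, ] = lambda[iL] || K[iK] || rep || stat;
      end;
      append from result;                       /* one block per (lambda, K) pair */
   end;
end;
close out.results;
quit;
```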

Ian.

Occasional Contributor
Posts: 11

Re: Input output bottleneck!!

Ian,

I will try it. I think each iteration will have 10,000 rows. The innermost loop generates exactly one row, as soon as T > h.

I have a lot of memory on this machine, 64GB, but you never really know with SAS.
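For scale, a rough estimate of one block's size, assuming 4 columns (as in the j(15000,4) suggestion) and 8 bytes per numeric element:

```latex
10{,}000 \text{ rows} \times 4 \text{ columns} \times 8 \text{ bytes} = 320{,}000 \text{ bytes} \approx 0.32 \text{ MB}
```

A single block is therefore tiny compared with 64 GB; memory would only become a concern if RESULT were sized to hold every row from every loop at once.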

Thank you for your suggestions and help.

Jamil

Occasional Contributor
Posts: 11

Re: Input output bottleneck!!

I made the changes, and it seems to be running faster, although I am not sure whether I have the optimal location for the APPEND.

I append a block of 10,000 rows for each combination of lambda and K.

I will post another update and my final code soon.

Thanks again.

SAS Super FREQ
Posts: 3,222

Re: Input output bottleneck!!

try T=z*solve(COV_Z, z`);
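A small check of this suggestion against the original INV-based line, using a made-up 2 x 2 covariance matrix and row vector (both hypothetical). SOLVE(COV_Z, z`) returns w with COV_Z*w = z`, so z*w equals the quadratic form without forming the inverse explicitly.

```sas
proc iml;
COV_Z = {2 0.5, 0.5 1};        /* hypothetical 2 x 2 covariance matrix */
z = {1 -1};                    /* hypothetical 1 x 2 row vector */

/* Original: explicit inverse, then the quadratic form */
T1 = z * inv(COV_Z) * z`;

/* Suggested: solve COV_Z * w = z` for w, then T = z * w */
T2 = z * solve(COV_Z, z`);

print T1 T2;                   /* the two values agree up to rounding */
quit;
```

For a 2 x 2 matrix the speed difference is negligible, as noted earlier in the thread; the SOLVE form generally pays off, in speed and numerical accuracy, for larger systems.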

Occasional Contributor
Posts: 11

Re: Input output bottleneck!!

Thank you, folks. I will make these changes and see if overall performance improves.
