I have a simulation code (ATTACHED) that is functional and works well, but is very slow. We are talking 4-5 days of continuous run time.
I think there is an input/output bottleneck in appending matrices (vertically). Is there anything that could be done to reduce run time or memory use, for example writing to a permanent data set instead of a temporary WORK one, that might make this code run faster?
I am saving qualifying records as vectors and appending them to a matrix called RESULT. At the end of the loop grid, I am appending the matrix to a permanent SAS data set for further analysis.
I have done all I can with my limited knowledge of IML and I am desperate for help.
Many thanks,
Jamil
1) Move all allocations outside the DO loops. You only need to allocate X once at the top of the program, not inside four nested DO loops.
2) You don't need to allocate zm, T, or any other arrays that you are going to assign. If you execute "zm = equation", do not allocate zm first, since zm just gets overwritten.
3) You don't need x1 and x2. Just send X into the RAND function to fill up both columns.
4) Get rid of the lines
COV_INV_Z=inv (COV_Z);
T= Z * COV_INV_Z *t(Z);
and use the SOLVE function instead as shown in
http://blogs.sas.com/content/iml/2011/08/10/do-you-really-need-to-compute-that-matrix-inverse/
and
http://blogs.sas.com/content/iml/2011/08/17/solving-linear-systems-which-technique-is-fastest/
5) The line that is REALLY killing your performance is
RESULT=RESULT // R;
Read Section 2.9 on p. 34-35 of my book Statistical Programming with SAS/IML Software. That chapter is available as a free download.
Start with (5); the performance will improve incredibly.
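The first three points in the list above might look roughly like the sketch below. It is not the attached program: the seed, the sample size n, the loop bounds, and the use of the RANDGEN subroutine to fill X are all assumptions made for illustration.

```sas
proc iml;
call randseed(12345);            /* arbitrary seed, assumed for illustration */
n = 100;                         /* hypothetical sample size */
X = j(n, 2, .);                  /* allocate X once, before any loops */

do rep = 1 to 5;                 /* stand-in for the nested DO loops */
   call randgen(X, "Normal");    /* refills both columns of X in place */
   zm = mean(X);                 /* assigned directly; zm is never pre-allocated */
end;
print zm;
quit;
```

Because X keeps its n x 2 shape, each RANDGEN call overwrites the existing values, so no separate x1 and x2 vectors are needed.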
Thank you Rick. I will work on these changes and see if performance improves.
Rick, by the way, I have your book, which is excellent. I, on the other hand, am a novice in IML.
How would the SOLVE function work in this case? I have an original X (n x p) matrix and a Z matrix (1 x p).
Is the following remotely correct as a replacement for the above? I need to calculate T (a scalar) from the quadratic form T= Z * COV_INV_Z *t(Z).
T= SOLVE (X,Z); ???
I haven't run your program, but I believe that you are dealing with a 2x2 matrix, so the time taken for the matrix inversion is not going to be significant no matter how you do it. Have you addressed Rick's comment in bold? I think it is in bold for a good reason. On every iteration, you are effectively declaring a new data structure Result and copying all of the old results to it, as well as the most recent result. So as the iteration count rises, larger and larger amounts of data are shifted around, killing the performance. Perhaps try to structure the program so that you declare a matrix at the start to hold the results for every iteration of your innermost loop, and then add each result to the matrix with a statement like Result[m,]=R; After the innermost loop has finished, you could dump the contents of Result to a SAS data set.
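A minimal sketch of that idea, with made-up sizes and a placeholder record R (neither comes from the attached program), could look like this:

```sas
proc iml;
maxRows = 10000;                  /* assumed upper bound on the number of rows */
Result  = j(maxRows, 4, .);       /* allocated once, filled with missing values */
m = 0;                            /* number of rows filled so far */

do i = 1 to maxRows;              /* stand-in for the innermost loop */
   R = i || 2*i || 3*i || 4*i;    /* placeholder 1 x 4 record */
   m = m + 1;
   Result[m, ] = R;               /* overwrite row m; earlier rows are not copied */
end;

Result = Result[1:m, ];           /* keep only the rows that were filled */
print (nrow(Result))[label="rows stored"];
quit;
```

With RESULT=RESULT // R, the k-th append copies all k-1 earlier rows, so the total work grows roughly with the square of the row count. Writing into a preallocated row touches only that row.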
Good one! I didn't even notice these were 2x2 matrices! (But it's good to learn how to use SOLVE anyway!)
Ian,
Does it matter where I declare the matrix in this case? Should I declare it outside of the first loop or before the innermost loop?
Thanks,
Jamil
Jamil,
Outside the first loop, declare something like RESULT=j(15000,4); Then move your append to just after the innermost loop. Of course you could make RESULT big enough to contain all the results from all the loops, but I think then you might run into memory problems.
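One possible layout for this is sketched below. The loop bounds, the column names, the placeholder record, and the data set name work.SimResults are all hypothetical; a permanent libref would replace work in the real program.

```sas
proc iml;
batchSize = 15000;                      /* matches the j(15000,4) suggestion */
Result = j(batchSize, 4, .);            /* declared once, outside the first loop */
varNames = {"lambda" "K" "rep" "T"};    /* hypothetical column names */

/* open the output data set once, before the loops */
create work.SimResults from Result[colname=varNames];

do lambda = 1 to 3;                     /* stand-in outer loops */
   do K = 1 to 2;
      m = 0;                            /* restart the row counter for each batch */
      do rep = 1 to 100;                /* stand-in innermost loop */
         m = m + 1;
         Result[m, ] = lambda || K || rep || 0;   /* placeholder record */
      end;
      batch = Result[1:m, ];            /* rows filled in this pass */
      append from batch;                /* one write per (lambda, K) combination */
   end;
end;
close work.SimResults;
quit;
```

Resetting m instead of reallocating Result lets the same block of memory be reused for every batch, so memory use stays fixed at one batch regardless of how many loop combinations are run.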
Ian.
Ian,
I will try it. I think each iteration will have 10000 rows. The inner most loop generates one and only one row once T > h.
I have a lot of memory on this machine, 64GB, but you never really know with SAS.
Thank you for your suggestions and help.
Jamil
I made the changes and it seems to be running faster, although I am not sure if I have the optimal append location.
I am appending every 10,000 rows for each lambda and K combination.
I will post another update and my final code soon.
Thanks again.
Try T=z*solve(COV_Z, z`);
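SOLVE(A, B) returns the solution of A*x = B, so the first argument has to be the square covariance matrix COV_Z rather than the n x p data matrix X. A small check with made-up values for COV_Z and z (both hypothetical) compares the two ways of computing the quadratic form:

```sas
proc iml;
COV_Z = {4 1,
         1 3};                       /* hypothetical 2 x 2 covariance matrix */
z = {1 2};                           /* hypothetical 1 x 2 row vector */

T_inv   = z * inv(COV_Z) * t(z);     /* original approach: explicit inverse */
T_solve = z * solve(COV_Z, t(z));    /* same quadratic form via SOLVE */

print T_inv T_solve;
quit;
```

The two values should agree up to rounding error. For a 2x2 matrix the speed difference is negligible, as noted earlier in the thread.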
Thank you folks. I will make these changes and see if overall performance gets better.