Hello folks,
I am still having trouble with the attached IML simulation code. I have made all
recommended changes (thanks to Ian and Rick) and the code is still running very
slowly. It is running on SAS 9.3 64-bit on an Intel i7 3930 3.2, 6 cores, 12
threads, with 64GB RAM. So, I have all the power available to any non-corporate user.
I am not sure why SAS is running so slowly. This code should not run for days on
end. That would be OK for a single run, but I need around 96 different permutations to
finish my project.
I am extremely desperate now and I would appreciate any help if anyone could look
into this code and see if there is something missing.
Please help if you have any ideas.
Thank you all.
A good programming practice is to estimate how long your simulation will run. See Chap 15 (especially Section 15.3) of my book to learn how to run the simulation on a smaller problem and predict how long it will take on the full problem.
Most of your CPU stats are not relevant to this problem. The problem is not memory intensive, so 64-bit processing and 64GB of RAM are not relevant. None of your computations are multithreaded, so the 6 cores and 12 threads are not relevant. The only thing that matters in this problem is how fast you can iterate, because the innermost loop does not perform very much work.
You have four nested loops. All of them are constant length except for the innermost, which has a DO UNTIL clause. The maximum number of iterations that you are asking for is about 7*330*10000*15000 = 350 BILLION iterations!
This is a lot. To give you some idea, consider the following DATA step program, which does nothing more than accumulate 350 billion random numbers:
options fullstimer;
data A;
   call streaminit(1);
   keep count;
   count = 0;
   do l = 1 to 7;
      do h = 7 to 40 by 0.1;
         do k = 1 to 10000;
            do m = 1 to 15000;
               count = count + rand("Normal");
            end;
         end;
      end;
   end;
   output;
run;
This program is much simpler than your program. It writes only a single observation after the loops finish (writing data is a slow operation because it accesses the disk) and it does not do any numerical linear algebra. It doesn't have any IF/THEN logic. How long do you think the DATA step requires to run? One minute? 5 minutes? 15 minutes?
How about 6.5 hours (estimated)!
And that estimate is for the DATA step, which in general can iterate faster than PROC IML because it is a simpler language with simpler parsing rules.
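One way to arrive at an estimate like this without waiting for the full run is to time a scaled-down version and extrapolate. The following is only a sketch under stated assumptions: the pilot size (K stops at 10), the macro variable names t0, pilot_sec, and full_hours, and the assumption that run time scales linearly with the iteration count are all illustrative choices, not necessarily how the figure above was obtained.

```sas
/* Pilot run: same loop structure as the full program, but K stops at 10 */
/* instead of 10000, so it performs 1/1000 of the iterations.            */
%let t0 = %sysfunc(datetime());      /* wall-clock start time in seconds */

data _null_;
   call streaminit(1);
   count = 0;
   do l = 1 to 7;
      do h = 7 to 40 by 0.1;
         do k = 1 to 10;             /* assumed pilot size */
            do m = 1 to 15000;
               count = count + rand("Normal");
            end;
         end;
      end;
   end;
run;

/* Elapsed seconds for the pilot, then scale by 1000 and convert to hours */
%let pilot_sec  = %sysevalf(%sysfunc(datetime()) - &t0);
%let full_hours = %sysevalf(&pilot_sec * 1000 / 3600);
%put Pilot run took &pilot_sec seconds.;
%put Extrapolated full run: roughly &full_hours hours.;
```

Running the pilot at two or three different sizes is a simple check on the linear-scaling assumption before trusting the extrapolated number.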
Since you've said that you don't want to wait for "days" for the program to complete, I suggest you compute on a coarser grid (the H loop) and use fewer simulations (the K and M loops). I also suggest that you run a series of small-scale simulations so that you can predict how long the large-scale simulation will take. Lastly, you should consider printing L and H inside the second loop so that you can always tell how far the program has progressed.
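As a rough illustration of these suggestions, the DATA step above could be changed as follows. This is a sketch only: the grid step of 1 and the 1000 simulations are assumed values chosen for illustration, and the PUT statement is the DATA step counterpart of printing L and H (in PROC IML a PRINT statement would play the same role).

```sas
data A;
   call streaminit(1);
   keep count;
   count = 0;
   do l = 1 to 7;
      do h = 7 to 40 by 1;           /* coarser grid (assumed step): 34 values of H */
         put "Progress: " l= h=;     /* one log line per (L, H) pair */
         do k = 1 to 1000;           /* fewer simulations (assumed value) */
            do m = 1 to 15000;
               count = count + rand("Normal");
            end;
         end;
      end;
   end;
   output;
run;
```

With these settings the loops perform about 7*34*1000*15000 = 3.6 billion iterations, roughly 1% of the original count. Whether the progress lines appear while the step is still running depends on the SAS interface being used.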
Thank you, Rick. Great advice. I will incorporate all of it in my code going forward. Some perspective on this code:
The m loop (15000) is a maximum, or an upper bound. The loop could stop after 1 iteration or run for up to 15,000. That is why it is harder to estimate the total run time. Unfortunately, this is the hard part of the code that I have very little control over: the sweet spot between 1 and 15,000.
I've estimated the code to run about 96 hours based on initial smaller runs. I am adding some spatial signed-rank modules in step 2, and that is where the code slows down. I considered doing this in R, but the IML modules written in SAS are not easily recoded in R.