A good programming practice is to estimate how long your simulation will run before you launch it. See Chapter 15 (especially Section 15.3) of my book to learn how to run the simulation on a smaller problem and use the timing to predict how long the full problem will take.
Most of your CPU stats are not relevant to this problem. The problem is not memory intensive, so 64-bit processing and 64GB of RAM are not relevant. None of your computations are multithreaded, so the 6 cores and 12 threads are not relevant. The only thing that matters in this problem is how fast you can iterate, because the innermost loop does not perform very much work.
You have four nested loops. All of them are constant length except for the innermost, which has a DO UNTIL clause and can therefore exit early. The H loop (7 to 40 by 0.1) has 331 values, so the maximum number of iterations that you are asking for is about 7*331*10000*15000, or roughly 350 BILLION iterations!
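As a quick check of that count, here is a minimal sketch that multiplies out the loop lengths in a DATA step. The variable names (nL, nH, nK, nM, total) are mine, chosen only for illustration. The H loop nominally takes 331 values, and the M value is an upper bound because the DO UNTIL clause might end the innermost loop sooner.

```sas
/* Sketch: upper bound on the total number of iterations.
   Variable names are illustrative, not from the original program. */
data _null_;
   nL = 7;                          /* L = 1 to 7                          */
   nH = round((40 - 7)/0.1) + 1;    /* H = 7 to 40 by 0.1: 331 values      */
   nK = 10000;                      /* K = 1 to 10000                      */
   nM = 15000;                      /* M: at most 15000 (DO UNTIL may exit) */
   total = nL * nH * nK * nM;
   put total= comma20.;             /* written to the SAS log              */
run;
```

The product is 347,550,000,000, which is the "about 350 billion" figure.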
This is a lot. To give you some idea, consider the following DATA step program, which does nothing more than accumulate 350 billion random numbers:
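options fullstimer;                 /* report detailed timing in the log */
data A;
   call streaminit(1);              /* seed the random number generator */
   keep count;
   count = 0;
   do l = 1 to 7;
      do h = 7 to 40 by 0.1;
         do k = 1 to 10000;
            do m = 1 to 15000;
               count = count + rand("Normal");
            end;
         end;
      end;
   end;
   output;                          /* write one observation after all loops finish */
run;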
This program is much simpler than your program. It does not write any observations inside the loops (writing data is a slow operation because it accesses the disk); it writes a single observation at the very end. It does not do any numerical linear algebra, and it doesn't have any IF/THEN logic. How long do you think the DATA step requires to run? One minute? 5 minutes? 15 minutes?
How about 6.5 hours (estimated)!
And that estimate is for the DATA step, which in general can iterate faster than PROC IML because it is a simpler language with simpler parsing rules.
Since you've said that you don't want to wait "days" for the program to complete, I suggest you compute on a coarser grid (the H loop) and use fewer simulations (the K and M loops). I also suggest that you run a series of small-scale simulations so that you can predict how long the large-scale simulation will take. Lastly, you should consider printing L and H inside the second loop so that you can always tell how far the program has progressed.
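To make these suggestions concrete, here is a minimal sketch that applies all three to the toy DATA step: a coarser H grid, fewer K and M iterations, and a progress message in the second loop. The macro variables hStep, nK, and nM and their values are hypothetical choices of mine, not values from your program. The extrapolation assumes that run time grows in proportion to the number of iterations, which is only a rough guide. Your real program does more work per iteration, so you should time your own code rather than rely on this toy example.

```sas
/* Sketch: time a scaled-down run and extrapolate to the full problem.
   hStep, nK, and nM are hypothetical reduced settings for illustration. */
%let hStep = 0.5;      /* coarser grid for H (the full problem uses 0.1) */
%let nK    = 100;      /* the full problem uses 10000                    */
%let nM    = 1500;     /* the full problem uses 15000                    */

data _null_;
   call streaminit(1);
   t0 = datetime();                          /* start time, in seconds */
   count = 0;
   do l = 1 to 7;
      do h = 7 to 40 by &hStep;
         do k = 1 to &nK;
            do m = 1 to &nM;
               count = count + rand("Normal");
            end;
         end;
         put "Finished: " l= h=;             /* progress message in the log */
      end;
   end;
   elapsed = datetime() - t0;                /* seconds for the reduced run */

   /* ratio of full-size to reduced iteration counts */
   nHsmall = round((40 - 7)/&hStep) + 1;     /* grid points in reduced run  */
   nHfull  = round((40 - 7)/0.1) + 1;        /* 331 grid points in full run */
   scale   = (nHfull/nHsmall) * (10000/&nK) * (15000/&nM);
   estHours = elapsed * scale / 3600;

   put "Reduced run (seconds): " elapsed 8.2;
   put "Extrapolated full run (hours): " estHours 8.2;
run;
```

If you repeat the run for a few different values of nK and nM, you can check whether the time really does scale linearly before you trust the extrapolation.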