First, a big thanks to the SAS community that recently helped get a program back up and running. I have run into two issues related to run time, both involving that now-running program: a simple regression analysis that is looped for a given number of iterations.
First, I am getting different run times with the same data and program. For example, run #1 takes 1 hour 44 minutes while run #2 takes 1 hour 21 minutes, with no changes to the data or program and no other applications open. I simply ran it and then hit run again. One expects repeatable times, so this seems bizarre.
Second, and the more important issue: I was getting stable, actually slightly decreasing, run times per iteration until the iteration count got large, at which point the time per iteration roughly doubled:
iterations        | 35          | 595         | 6,545       | 52,360      | 324,632
total seconds     | 4           | 58          | 627         | 4,866       | 83,734
seconds/iteration | 0.114285714 | 0.097478992 | 0.095798319 | 0.092933537 | 0.257935139
I found this note (http://support.sas.com/kb/57/630.html) and changed the dataset names so they differ across iterations, which significantly improved the time per iteration across the board, but that pesky doubling at 324K iterations remains:
iterations        | 35          | 595         | 6,545       | 52,360      | 324,632
total seconds     | 4           | 47          | 513         | 4,521      | 49,066
seconds/iteration | 0.114285714 | 0.078991597 | 0.078380443 | 0.086344538 | 0.151143449
I think the ballooning lag relates to the .lck and .dat writing as the program runs. The temporary .lck and intermediate data files are in the 160 MB range near the end of the iterations.
I also thought it could be the printing of notes or results, but I believe I turned off the log and results output so they don't waste time or clog the session. I would appreciate an explanation of why the run time per iteration is ballooning, or suggestions to make the code more efficient.
Looks like you are running this program interactively. Since it takes over an hour to run you would be better off running this in batch mode. This will remove any timing variations caused by your SAS client interface.
Also you don't say if you are running this on a local PC or on a remote SAS Server. Even on a PC you will get timing variations caused by whatever else you are using it for at the time and if your SAS program is reading remote data across a network this will cause more variations.
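For example, on a Unix/Linux install a batch run looks something like this (the file names here are placeholders for your own program):

```
# Run the program in batch; the log and listing go to files
# instead of the interactive client, removing that overhead.
sas regloop.sas -log regloop.log -print regloop.lst
```

That also gives you a complete log file you can scan afterwards for the step timings.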
I don't know that this would explain the unexpected run time per iteration, but my question is: why are you making single-use data set FILES instead of data set VIEWS? It appears that TARGETONLY, TARGET2ONLY, TARGET3ONLY, PMSADJ, and TARGETMEANS2 are each created only to append some data to &PMS_data. So why bother writing these data to disk only to re-read them once and forget them? It's quite possible that a lot of your time is going to the writing out of intermediate data sets to disk. Using views instead will use more memory but could save a lot of clock time.
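For example, a DATA step view is declared like this (the selection logic shown is just a placeholder, not your code):

```
/* VIEW= stores only the program logic; the rows are
   materialized when the view is read downstream,
   rather than being written to disk as a file. */
data targetonly / view=targetonly;
   set regdata1;
   where target_flag = 1;   /* placeholder selection logic */
run;

/* The append reads the view directly */
proc append base=&PMS_ data=targetonly force;
run;
```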
On a more strategic note, would it be possible to avoid iterating, and just create bigger datasets with BY groups corresponding to your macro iterations?
Mkeintz:
I tested the VIEWs and saw no change; in fact it was slightly slower on the larger iteration counts. I watched the temporary SAS folder, and those VIEW tables are written to the drive just as the .lck and .sas files are, so it's not surprising that the run times did not change.
However, all of those files have only a single observation, so I don't think they are the problem. I suspect the main issue is the &PMS_ data file that is appended on each iteration:
proc append FORCE base=&PMS_ data=TARGETMEANS2; run;
The &PMS_ file starts small but grows to roughly 150 MB by the 300,000-iteration level. That's just my guess. If so, is there a way to not write to that data file on each iteration?
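One workaround I am considering is streaming each one-row result to a flat text file instead of reopening the growing &PMS_ dataset every iteration, then building the SAS dataset once at the end. A rough sketch, where the variable list is a placeholder for whatever TARGETMEANS2 actually contains:

```
/* Inside the macro loop: append one line per iteration.
   MOD appends to the text file instead of replacing it. */
data _null_;
   set targetmeans2;
   file "results.txt" mod dsd;
   put changelist estimate stderr;   /* placeholder variables */
run;

/* After the loop: build the final dataset in a single pass */
data &PMS_;
   infile "results.txt" dsd;
   input changelist :$200. estimate stderr;
run;
```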
On the BY group suggestion, I am desperate for a faster solution, but I am not sure what you mean by bigger datasets. The iterations come from the 'changelist' values in the testcomb2 dataset. The program takes a changelist value (a long string), uses that string to select a subset of observations in REGDATA1, runs the regression, and aggregates the results into the &PMS_ file (after making the regression results jump through a few hoops). Then it repeats for the next changelist value in the combos2 dataset. How would I structure that series of steps as you suggested? Thanks
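My best guess at what the BY-group version would look like is something along these lines, assuming I can first stack every changelist subset of REGDATA1 into one tall dataset tagged with its changelist key (the model variables here are placeholders, since I haven't shown the actual regression):

```
/* One tall dataset: each changelist value contributes its
   subset of REGDATA1, tagged with the changelist key. */
proc sort data=regdata1_stacked;
   by changelist;
run;

/* A single PROC REG call replaces the macro loop:
   one regression per BY group, estimates collected
   in one output dataset via OUTEST=. */
proc reg data=regdata1_stacked outest=allresults noprint;
   by changelist;
   model y = x;   /* placeholder model */
run; quit;
```

Is that the kind of structure you had in mind?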