05-19-2017 11:04 AM - edited 05-19-2017 11:05 AM
I am conducting a simulation study. I have not yet gotten close to the number of trials that would be informative (I am at about 100 trials and would like to use, say, 10,000). I am simulating about 100 datasets, running procedures on them, and creating lots of secondary datasets. By the time I get to my final dataset, an aggregate with about 30 variables and 100 observations, SAS is ready to give up. Just running a basic PROC UNIVARIATE or clicking on the Work directory takes a long time, and SAS will temporarily stop responding.
I am working on cleaning up my code, but I am curious what I can do to get things running efficiently again after creating the initial datasets from simulated data. Can I delete all of the created Work datasets that are no longer needed? If so, what is the best approach? Most follow a naming rule, e.g., dataset1-dataset100. I could save my final set and close and reopen SAS, but that seems like a cheat, and I can't rule out that my computer in general is taxed after running the simulations.
I am running a single license on an ordinary Windows laptop. I also jumped onto a University virtual desktop portal this morning to see if that would help, but it is also moving very slowly after the simulations.
Any help or comments would be appreciated!
05-19-2017 11:47 AM
If you are generating lots of data sets, you may be having issues with the operating system's file system as well as with SAS. Large numbers of data sets in the same directory (folder) can make reads and writes slower. SAS maintains several data views (SASHelp.vtable, SASHelp.vcolumn, etc., or the dictionary tables if accessed through PROC SQL) that contain details of all the data sets, variables, indexes, and such in use.
When you "click on the Work directory," SAS reads the information from those views as needed. More data sets => more metadata to read => more time.
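As a quick illustration, you can query those dictionary tables yourself to see how many data sets have piled up in Work (a minimal sketch; DICTIONARY.TABLES is a standard SAS feature):

proc sql;
    /* Count the data sets currently in the WORK library */
    select count(*) as n_work_datasets
    from dictionary.tables
    where libname = 'WORK';
quit;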
Slow results from PROC UNIVARIATE I would only expect with a very large number of variables, and much of the time may be spent formatting the HTML output. You might post example code for PROC UNIVARIATE (with actual variable names instead of macros containing lists, if that is something you are doing).
One easy improvement is either to re-use the names of the truly temporary data sets to reduce proliferation, or to use PROC DATASETS to delete them as soon as they are no longer needed. You may want to create a library just for the temporary sets so that it is easy to delete all of the members with PROC DATASETS.
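For example, since your temporary sets follow a numbered naming rule, PROC DATASETS can delete them as a range, and the KILL option empties a whole library in one step (a sketch using your dataset1-dataset100 naming; the library name TMPLIB and its path are hypothetical):

/* Delete a numbered range of data sets from WORK */
proc datasets library=work nolist;
    delete dataset1-dataset100;
run;
quit;

/* Or empty a dedicated scratch library all at once */
libname tmplib "C:\temp\simscratch";   /* hypothetical path */
proc datasets library=tmplib kill nolist;
run;
quit;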
05-19-2017 02:58 PM - edited 05-19-2017 02:59 PM
Well I added the following code to delete all temp files.
proc datasets library=work;
    save All_Metrics_Long All_Metrics_Wide;
run;
quit;
Which didn't really do much in regard to efficiency, though it did reveal the 3,500+ files I had.
Next I cleared my Results, which made a huge difference. The log wasn't really reflecting where the time went: the timings in the log before and after clearing the Results were the same, so processing time was unchanged, but the real time on my end was drastically different. Hmm.
Is there any piece of code I can run to clear my Results and log? I vaguely remember a global statement from a long time ago that did this, back when computers were slower, but it doesn't have to be a global statement per se.
05-19-2017 04:44 PM
Do you actually use the results? If not, you can close all of the ODS destinations with ODS _ALL_ CLOSE; BEFORE creating all of that output. Don't forget to reopen a destination, such as ODS HTML;, when you want output started again.
If you need selected bits, then explicitly write them to a different file with your choice of ODS statement.
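Putting those pieces together, a typical pattern looks like this (a minimal sketch; the simulation step in the middle stands in for your own code, and All_Metrics_Wide is the final data set you mentioned):

ods _all_ close;    /* suppress all ODS output during the simulation */

/* ... run the simulation loops and procedures here ... */

ods html;           /* reopen the HTML destination for the final results */
proc univariate data=All_Metrics_Wide;
run;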
If you are using Base SAS, this statement will clear the Results window from within a program:
dm odsresults 'select all; clear;' ;
You may have been thinking of dm output 'clear'; to clear the traditional output listing window.
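The same DM statement pattern also clears the log window, which you mentioned wanting to reset (a sketch; DM statements only work in the SAS windowing environment):

dm log 'clear';       /* clear the Log window */
dm output 'clear';    /* clear the Output (listing) window */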
05-19-2017 06:14 PM
Do you really need hundreds of data set FILES? In many cases data set VIEWS would work and require a lot less disk input/output. What is the structure of your simulation data sets, and what do you do with them after they are created?
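For illustration, a DATA step view stores only the program; the rows are generated on demand each time the view is read, so nothing is written to disk (a minimal sketch with made-up variable names):

/* Define a view: no rows are written to disk here */
data sim1 / view=sim1;
    call streaminit(12345);
    do rep = 1 to 1000;
        x = rand('normal', 0, 1);
        output;
    end;
run;

/* Rows are generated on the fly when the view is read */
proc means data=sim1 mean std;
    var x;
run;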
05-22-2017 01:28 PM
I suggest you look at some of the best practices for efficient simulation in SAS; these and other techniques are described in the book Simulating Data with SAS.
If you say more about what you are doing, or better yet post some sample code, there's a good chance you will get some time-saving tips.