H
Pyrite | Level 9

I am conducting a simulation study. I have not even gotten close to the number of trials that would be informative (I am at about 100 trials and would like to use, say, 10,000). I am simulating about 100 datasets, running procedures on them, and creating lots of secondary datasets. By the time I get to my final dataset, an aggregate with about 30 variables and 100 observations, SAS is ready to give up. Just running a basic PROC UNIVARIATE or clicking on the Work directory takes a long time, and SAS will temporarily stop responding.

 

I am working on cleaning up my code, but I am curious what I can do to get things running efficiently again after creating the initial datasets from simulated data. Can I delete all of the created work datasets that are no longer needed? If so, what is the best approach? Most follow a naming rule, say dataset1-dataset100, etc. I could save my final set and close and reopen SAS, but that seems like a cheat, and my computer in general seems taxed after running the simulations anyway.

 

I am running a single license on a normal Windows-based laptop. I also jumped onto a University virtual desktop portal this morning to see if that would work, but it also moves very slowly after the simulations.

 

Any help or comments would be appreciated!

6 REPLIES
ballardw
Super User

If you are generating lots of data sets, you may be having issues with the operating system's file system as well as with SAS. Large numbers of sets may cost the system more time to read/write within the same directory (folder). SAS maintains several data views (Sashelp.Vtable, Sashelp.Vcolumn, etc., or the dictionary tables if accessed through PROC SQL) that contain details of all the data sets, variables, indexes, and such in use.

 

When you "click on the work directory" it reads the information from the those views as needed. More datasets=> more data to read=> more time.

 

Slow results from PROC UNIVARIATE I would only expect with very large numbers of variables, and much of the time may be spent formatting the HTML output. You might post example code for PROC UNIVARIATE (with actual variables instead of macro variables containing lists, if that is something you are doing).
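As an illustration of how much the HTML rendering can cost, here is a minimal sketch that suppresses the displayed tables while still capturing the statistics as data (the data set name dataset1 and the variable list x1-x30 are illustrative; the ODS OUTPUT data set is still created while the exclusion is in effect):

ods exclude all;                        /* render nothing to the open destinations */
proc univariate data=work.dataset1;
   var x1-x30;                          /* illustrative variable list */
   ods output Moments=work.moments1;    /* capture the Moments table as a data set */
run;
ods exclude none;                       /* resume normal output */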

 

One easy place to start is either to re-use the names of the truly temporary data sets to reduce proliferation, or to use PROC DATASETS to delete them as soon as they are no longer needed. You may want to create a library just for the temporary sets so that it is easy to delete all of its members with PROC DATASETS.
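A minimal sketch of both approaches, assuming the dataset1-dataset100 naming from the original post (the scratch path is an assumption):

/* delete the numbered series directly; PROC DATASETS accepts name range lists */
proc datasets library=work nolist;
   delete dataset1-dataset100;
run;
quit;

/* or keep throwaway sets in their own library and wipe it in one step */
libname scratch "C:\sim_scratch";           /* hypothetical location */
/* ... write intermediate sets to SCRATCH instead of WORK ... */
proc datasets library=scratch kill nolist;  /* KILL deletes every member */
run;
quit;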

H
Pyrite | Level 9

Well, I added the following code to delete all of the temporary work datasets except the two I need.

 

proc datasets library=work nolist;
     save All_Metrics_Long All_Metrics_Wide;  /* deletes every other member of WORK */
run;
quit;

This didn't really do much for efficiency, though it did alert me to the 3500+ files I had.

 

Next I cleared my Results, which made a huge difference. The log wasn't really recording any change in how long the procedures were taking; the log timings before and after clearing the Results were the same, so processing time was the same, but the real-time responsiveness on my end was drastically different. Hmm.

 

Is there any piece of code I can run to clear my Results and log? I vaguely remember a global statement from a long time ago that did this, back when computers were slower, but it doesn't have to be a global statement per se.

ballardw
Super User

Do you actually use the results? If not, you can close all of the results destinations before they are created: ODS _all_ close; BEFORE creating all of that output. Don't forget to reset a destination, such as ODS HTML;, when you want output started again.
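A minimal sketch of that pattern:

ods _all_ close;   /* close every open destination before the heavy work */

/* ... run the simulation and analysis steps here; nothing is rendered ... */

ods html;          /* reopen the default destination when output is wanted again */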

 

If you need selected bits, then explicitly write them to a different file with your choice of ODS statement.
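For example (the file name and the choice of PROC MEANS statistics are illustrative; All_Metrics_Wide is the aggregate set mentioned above):

ods html file="final_metrics.html";     /* route just this step to its own file */
proc means data=work.all_metrics_wide n mean std min max;
run;
ods html close;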

 

If you are using Base SAS (the windowing environment), this will clear the Results window as a program statement:

dm odsresults 'select all; clear;' ;

You may have been thinking of dm output 'clear'; to clear the traditional Output (listing) window.
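Since the original question asked about the log as well, a common windowing-environment idiom clears both in one pass (a sketch; DM statements have no effect in batch mode):

dm 'log; clear; output; clear;';        /* clear the LOG and OUTPUT windows */
dm odsresults 'select all; clear;';     /* clear the Results tree */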

 

mkeintz
PROC Star

Do you really need hundreds of data set FILES? In many cases data set VIEWS would work and require a lot less disk input/output. What is the structure of your simulation data sets, and what do you do with them after they are created?
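A minimal sketch of the difference, with illustrative names and sizes: a view stores only the program, and the rows are produced on demand when a procedure reads it, so nothing hits the disk in between:

/* VIEW: the generating program runs when the view is read, not now */
data work.simdata / view=work.simdata;
   call streaminit(1234);               /* seed is illustrative */
   do sampleid = 1 to 100;              /* one block per simulated data set */
      do i = 1 to 1000;
         x = rand("Normal");
         output;
      end;
   end;
run;

/* the rows are generated here, in one pass, with no intermediate files */
proc means data=work.simdata noprint;
   by sampleid;
   var x;
   output out=work.estimates mean=xbar;
run;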

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Peter_C
Rhodochrosite | Level 12
Is this performance problem on the latest release of SAS?
Rick_SAS
SAS Super FREQ

I suggest you look at some of the best practices for efficient simulation in SAS:

1. Simulation in SAS: The slow way or the BY way

2. Turn off ODS when running simulations in SAS

These and other techniques are covered in the book Simulating Data with SAS.
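A minimal sketch of the two ideas combined (all names and sizes are illustrative): generate every trial into one data set tagged with a trial ID, suppress ODS, and analyze all trials with a single BY-group call instead of calling the procedure 10,000 times:

data work.simall;
   call streaminit(987);
   do sampleid = 1 to 10000;            /* number of trials */
      do i = 1 to 100;                  /* observations per trial */
         x = rand("Normal");
         output;
      end;
   end;
run;

ods exclude all;                        /* as in link 2: render nothing */
proc univariate data=work.simall;
   by sampleid;
   var x;
   output out=work.stats mean=xbar std=s;   /* one row of estimates per trial */
run;
ods exclude none;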

 

If you say more about what you are doing (or, better yet, post some sample code), there's a good chance you will get some time-saving tips.
