05-05-2014 05:23 AM
In general, are there any dangers with expressions as below?
I prefer to make various manipulations to a data set in several steps, but I want to avoid creating unnecessary data sets in the work library. That is the background to my question.
05-05-2014 05:52 AM
No, there is no danger; it just overwrites the original dataset. When you do this, SAS actually creates a temporary dataset in the same folder and, at the end, renames it to your original dataset.
05-05-2014 07:18 AM
In addition to Ksharp:
If SAS encounters an error (or already has an error condition set if you are in batch mode), the data set will not be replaced.
If you test-run such code, you will be forced to rerun it from the start each time you need to check an intermediate stage of the data set. That's why I give all the data sets in a batch job unique names.
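The expression from the original question is not shown in the thread, so the following is only a sketch of the kind of step being discussed: a DATA step that reads and writes the same data set name. The names work.have, amount and amount_doubled are hypothetical.

```sas
/* Hypothetical example data; work.have and its variables are assumptions */
data work.have;
    input id amount;
    datalines;
1 10
2 20
;
run;

/* Reads and writes the same name. As described in the replies above, SAS
   builds the new version under a temporary name and replaces work.have
   only if the step finishes without an error. */
data work.have;
    set work.have;
    amount_doubled = amount * 2;
run;
```

After the second step the earlier version of work.have is gone, which is why an intermediate stage cannot be inspected without rerunning from the start.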
05-05-2014 11:30 AM
Be very careful about recoding existing variables back into the same variable.
I inherited some code and dataset where code similar to this was used in this situation:
if var=2 then var=1;
else if var=1 then var=0;
Apparently the code was run on the same dataset a couple of times, as the variable ended up with all 0 values where not missing: the first run turns 2 into 1, and the next run turns that 1 into 0.
If you recode into new variables then there isn't a problem. But reusing datasets and reusing variables can be dangerous in terms of data content.
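A minimal sketch of recoding into a new variable, so that running the step more than once does not change the result. The data set work.survey and the variable var_recoded are hypothetical names chosen for illustration.

```sas
/* Hypothetical example data */
data work.survey;
    input id var;
    datalines;
1 2
2 1
3 .
;
run;

/* The source variable var is never modified, so rerunning this step
   produces the same var_recoded values every time. */
data work.survey;
    set work.survey;
    if var = 2 then var_recoded = 1;
    else if var = 1 then var_recoded = 0;
run;
```

Because var keeps its original values, a second run recomputes var_recoded from the same input instead of recoding already-recoded values.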
05-05-2014 11:54 AM
Why avoid the SAS work library? It is meant to be used for temporary data.
Do you want to have it cleaned up at regular moments? Do a clean-up with "proc datasets".
There are possibly a lot of datasets there of the type #utl-, created by e.g. Sort, SQL and possibly more.
A well-designed SAS installation can deal with this. The saswork location should get attention so that it is very responsive.
You are changing the issue you may be having with saswork into an issue with permanent storage.
When just storing and not replacing, you must count on 1 times the needed size; replacing asks for at least 2 times the needed size.
- SQL does not support this way of creating a table that you are also using as input.
With the introduction of multithreading (SAS 9) this has become logically impossible.
- Checkpoint restart is a more advanced approach to being able to do restarts. See SAS(R) 9.3 System Options: Reference, Second Edition (STEPCHKPTLIB= System Option).
This is a more advanced, automated way of restarting processes, more commonly known from big systems.
05-05-2014 04:33 PM
I agree with Jaap Karman too.
Reusing the same data set name throughout your program can also make it harder for anyone else to pick up your code and understand the data flows and what the program is trying to achieve.
05-05-2014 03:28 PM
I prefer to use different data sets to allow for easier debugging, but then I clean up my process at the end, especially for big jobs.
proc datasets, proc delete and proc sql can all delete tables.
In addition, I use a naming convention for temporary data sets, such as temp_1 and temp_2; you can then refer to them as temp_: in proc datasets to delete all temp_ datasets.
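The naming convention above could be used roughly as follows. This is a sketch only; the data sets temp_1 and temp_2 and their variables are hypothetical.

```sas
/* Hypothetical intermediate data sets following the temp_ convention */
data work.temp_1;
    x = 1;
run;

data work.temp_2;
    set work.temp_1;
    y = x + 1;
run;

/* Delete every data set in WORK whose name starts with temp_ */
proc datasets library=work nolist;
    delete temp_: ;
quit;
```

Keeping the intermediate steps under separate names leaves each stage available for debugging, and the single clean-up at the end removes them all at once.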