08-04-2013 02:08 PM
I wonder if someone could give me some suggestion on dealing with many intermediate data sets in a big SAS project using SAS enterprise guide. In order to build a targeted data sets, there might need some intermediate data sets when using "DATA dataset1;SET dataset;". If saving many intermediate data sets (such as dataset0, -dataset10) in the work library, the project will seem messy. But if giving intermediate data sets the same name again and again, sometimes SAS log will give a warning message about the risks of rewriting the same data set repeatedly.
So, may I know the better way of handling or naming the intermediate data sets in temporary library? Thank you!
08-04-2013 05:39 PM
"sometimes SAS log will give a warning message about the risks of rewriting the same data set repeatedly"
I have never seen this warning. Is this something EG generates as it appears to me "SAS" as such doesn't care and I couldn't replicate it with below code.
%do i=1 %to 100;
If you have enough work space then I would use individual names for your intermediary datasets (so not re-using them) as this allows best for de-bugging and checking data after every single node at any time.
What you could do is to use some naming convention for your intermediary data sets, eg. "int_<name>". To clean out data sets you could then use in a code node at the beginning and/or end of your project somthing like:
proc datasets lib=work nolist nowarn;
There is also a menu in EG under "Task / Data / Delete Tables and Formats.." which you could use, but: this wizard creates SQL DROP Table statements which require tables to actually exists and you also can't use the ':' wildcard.
Proc Datasets with Nowarn is more flexible here and won't throw a warning even if you execute it without any work.int_<name> table existing.
08-04-2013 08:35 PM
You might be getting this warning message if you are writing or converting an EG SQL query and trying to reuse the name. SAS SQL is less tolerant of name reuse than data step processing.
In earlier versions of EG (at least) there was a problem with deleting intermediate datasets in a process flow (which I assume you are using) because when the process is opened again in a new session EG checks that the tables exist and will throw errors if they are not available. This would necessitate using SQL "drop from xxx" rather than delete to conserve space. I have not experienced this problem in 4.3, but "your mileage may vary".
Because EG relies on SQL for its queries rather than data step processing, and because the query builder does not support subqueries, a lot of intermediate tables can be created in a process, which may be at the root of your problem. Setting the query to produce a view rather than a table for a once-used dataset can at least save space.
If however you are relying on program code nodes, which can result in generating multiple intermediate tables which just clutter up your process flow because they do not contribute directly to subsequent queries in the process, then having a proc datasets or proc sql cleanup within each program rather than at the end of the process might be the way to go.
08-05-2013 07:51 AM
Partrick and Richard,
Thank you both for your suggestions! The information you provided is very helpful. I really appreciate it.