07-15-2017 10:32 PM
Our group is trying to manage disk space consumption on the server on which we run SAS.
The question is: Is there a risk that nodes (particularly input data nodes) may be removed from Enterprise Miner project flowcharts if the underlying SAS datasets on the application server are deleted? Or are the nodes retained in the flowchart?
07-20-2017 05:04 PM
Which "underlying datasets" are you contemplating removing from the application server?
If they are the datasets that are used as inputs to your models (source data that is pointed to by a Data Source/Input Dataset node), all of the nodes will remain in your diagram(s), as will the results. If you want to re-run your analyses at another time, simply restore the source data sets.
If, however, you mean the data sets that are part of the Enterprise Miner projects themselves, the nodes will remain in your diagram, but depending on what's deleted, the diagram(s) may no longer run and may no longer have results, and could even become corrupt.
07-20-2017 05:58 PM
To summarize what others have said, deleting data sources outside of the SAS Enterprise Miner project folder will have no effect on a process flow or diagram. It will, however, affect your ability to run the process flow. Existing results will still be available after a data source is deleted, but the results will be lost if you attempt to re-run a process flow with a missing data source. Deleting data sets within the SAS Enterprise Miner project folder without using SAS Enterprise Miner to do so can cause anything to happen: corrupted project, corrupted process flow, corrupted diagram. So don't do that.
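If you want a starting point for deciding what is safe to clean up, here is a minimal Python sketch built on the rule above. It assumes your Enterprise Miner project lives in a single project folder, and the paths you pass in are placeholders for your own. The sketch walks a data directory, skips anything inside the project folder, and lists the remaining SAS data sets (`*.sas7bdat`) from largest to smallest. It only reports and deletes nothing. Keep in mind that a listed file may still be a data source that a diagram points to, so removing it will stop that flow from running until you restore it. If you have several projects, you would need to exclude each project folder.

```python
import sys
from pathlib import Path


def deletion_candidates(data_root, project_root):
    """List (path, size_in_bytes) for SAS data sets (*.sas7bdat) under
    data_root that are NOT inside project_root, largest first.

    Only reports; nothing is deleted."""
    data_root = Path(data_root).resolve()
    project_root = Path(project_root).resolve()
    candidates = []
    for path in data_root.rglob("*.sas7bdat"):
        resolved = path.resolve()
        # Skip files inside the Enterprise Miner project folder: those
        # should only ever be removed from within Enterprise Miner itself.
        if project_root == resolved or project_root in resolved.parents:
            continue
        candidates.append((resolved, resolved.stat().st_size))
    # Largest files first, since they free the most space.
    candidates.sort(key=lambda item: item[1], reverse=True)
    return candidates


def print_report(data_root, project_root):
    """Print one line per candidate: size in MB, then the full path."""
    for path, size in deletion_candidates(data_root, project_root):
        print(f"{size / 1_048_576:10.1f} MB  {path}")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name and paths):
    #   python em_space_report.py DATA_ROOT PROJECT_ROOT
    print_report(sys.argv[1], sys.argv[2])
```

Sorting by size puts the files that free the most space at the top. This report only tells you which files fall outside the project folder. Whether you still need a given source table is a judgment the script cannot make.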
If disk limitations are so severe that you want to try to manage disk resources without using Enterprise Miner for deletions, then I strongly recommend that you back up diagrams using the "save as" feature, which allows you to save a diagram as an XML file. I do so routinely for all of my Enterprise Miner (EM) projects. If anything is corrupted or lost, you can import the XML file to reproduce the diagram. You will have to rerun the process flow to get results. If you are fearful of losing results, I suggest using the Reporter node. One approach I use to preserve disk space is to create my full diagram with explorations and false starts and side trips, then save the XML file, save the PDF files from the various Reporter nodes, and then use EM to prune the process flow. When you prune the process flow by deleting nodes, all of the files (data sets, catalogs, XML files) associated with the node will be deleted. Remember, there is no UNDO button!!!
One last amplification. If you delete a data source, you may lose the metadata for that data source. Consequently, many EM users like to use the Input Data Source node to create metadata, rather than using the New Data Source wizard. If you save an XML file for a diagram with metadata fully specified using an Input Data Source node, you can easily re-create the metadata just by re-running the node. Furthermore, you can share the node, saved as part of an XML file, with other users, so they don't have to re-create the metadata. This is handy if you don't use the SAS Management Console or some other tool for managing corporate data sources.
Finally, use the diagram's Notes property. The Notes are priceless when you attempt to refresh or rebuild a predictive model 12 months later. Why did you leave out a certain input? Did you try to use a certain model and then delete it, and if so, why did you delete it? The diagram's History property can save the day if you fail to use the Notes property, because it shows every addition and deletion for the life of the diagram. The Notes and History get saved as part of the saved XML file.
07-20-2017 06:44 PM
One additional thought to share.
As Terry points out, saving diagrams as XML is an excellent practice. It preserves all of the nodes, their properties and any notes that you've included. One thing that it does not include is the results.
If you'd also like a way to capture the results, creating model packages in addition to saving diagrams as XML will be invaluable. Model packages include the diagram XML, making it very straightforward to recreate the diagram, and they also include the results from each node, plus score code if you've used a Score node. Model packages also allow your model to be used in other SAS software.
For instructions on creating model packages, please see SAS Usage Note 46764.