Hi. When I have a SAS program open and I try to open another one, SAS has to close down the original program (i.e. it can only open one at a time). Is there a setting for this? It's just one of those niggles...
In Enterprise Guide (EG) you can have multiple code blocks/segments "open" at the same time for editing, within a project. BUT, you can only have one project open per EG instance. To have multiple projects open -- human multi-tasking -- you have to have multiple EG instances running, which is a quick drain on memory.
I have raised this issue many times with SAS, and am waiting for EG 5 to see if they do something about it. When the #2 guy came to the company I was at, I spoke to him about the lack of ability to work on multiple things at the same time. When I was asked to review some user interface features for EG 5, I raised and demonstrated the issues. I generally work with large data sets that take many minutes and sometimes > 1 hour to process. The serial nature of EG got in the way of my productivity because it doesn't support multi-tasking well. I need to be able to bring multiple data sources into a project and run concurrent selection queries against them, so that I can then join the smaller selected sets together to continue the analysis. Sometimes the analysis tracks, within an ad hoc project, are completely separate and don't need joining. But EG will do only one at a time.
I believe EG needs to take better advantage of SAS/CONNECT and the use of asynchronous rsubmits to run stuff out of a project. I guess I may be talking about a form of grid computing, even if on only one machine. After all, if I am writing SAS code in Notepad and running multiple SAS programs simultaneously the hard way, I should be able to do that within EG. But I can't, so I don't, and I still do a lot of development the old hard way so that I can have multiple things running at the same time.
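The kind of asynchronous rsubmit meant here can be sketched as follows. This is only a minimal illustration, not anything EG generates: it assumes SAS/CONNECT is licensed and that extra sessions can be spawned on the same machine with the command that started the current one. The session names (task1, task2) and output table names are made up, and SASHELP.CARS stands in for the large data sources.

```sas
/* Sketch only. Assumes SAS/CONNECT is licensed and that extra sessions  */
/* can be spawned on the same machine with the command that started this */
/* one. Session names (task1, task2) and table names are hypothetical.   */
options sascmd="!sascmd" autosignon;

/* First selection query, submitted without waiting for it to finish.    */
/* INHERITLIB exposes the parent's WORK library to the spawned session   */
/* under the libref PARENT.                                              */
rsubmit task1 wait=no inheritlib=(work=parent);
   proc sql;
      create table parent.heavy_cars as
      select make, model, weight
      from sashelp.cars
      where weight > 4000;
   quit;
endrsubmit;

/* Second, independent query runs concurrently in its own session. */
rsubmit task2 wait=no inheritlib=(work=parent);
   proc sql;
      create table parent.pricey_cars as
      select make, model, msrp
      from sashelp.cars
      where msrp > 40000;
   quit;
endrsubmit;

/* Block until both sessions have finished, then release them. */
waitfor _all_ task1 task2;
signoff _all_;

/* Join the two reduced sets in the parent session. */
proc sql;
   create table work.heavy_and_pricey as
   select h.make, h.model, h.weight, p.msrp
   from work.heavy_cars as h
        inner join work.pricey_cars as p
        on h.make = p.make and h.model = p.model;
quit;
```

The two selections run side by side, and only the final join waits on both. That is the workflow described above, done by hand.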
I think the problem is sourced in that a lot of statistics/statisticians are related to the social sciences, where you deal primarily with small datasets, perhaps at most a few thousand "observations". But most of us in the computer performance analysis and financial analysis arenas work with > 1,000,000 records on a regular basis. In my last job, the daily number of records to be processed was > 1,000,000,000, and we had > 20 people who worked with that data set.
Each "task" in a project, when run, should run as an asynchronous task, with its own SAS session on the server for that run. If a set of tasks is part of a flow, then EG should manage that on its own, separately. Also, a task should be able to be given the property of "autoexec" for when a project is opened.
Finally, just like I can request a "Query and filter...", I should be able to request a "Data step" for a dataset, where it preloads the DATA...; and SET...; statements for me, and all I have to do is write the code in between those and the RUN; statement. I know I can open and write a code block, but I can write SQL too, so why is there a dialog box for queries?
Chuck - have you tried a "code abbreviation"? In a SAS code node (it must be open), choose Code... Add Abbreviation from the main menu (call it something like "data"). Then cut and paste your model code... and voila... really very simple.
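As a sketch, the model code stored under such an abbreviation could be as small as the skeleton below. The dataset names are stand-ins to overwrite once the abbreviation expands: SASHELP.CLASS is used as the input only so the skeleton runs as pasted, and WORK.RESULT is a made-up output name.

```sas
/* Skeleton to store as a code abbreviation. SASHELP.CLASS and     */
/* WORK.RESULT are placeholders to overwrite after it expands.     */
data work.result;
   set sashelp.class;
   /* processing statements go here */
run;
```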
My problem is quite simple. If I have a project open and I want to open another project, I have to open another session of EG and then, in this session, open the project. My colleagues can open another project in the same session, or they can just double-click on the project in Windows Explorer. If I do this, EG has to close down my other project first. Any ideas? Thanks.
I hear you. We are looking at ways to do this better in future releases, taking more advantage of SAS/CONNECT and/or a gridded computing environment.
Right now, EG has an architectural limitation: it can have just one connection to a particular SAS server (for example, "SASMain"). These connections are inherently serial -- we cannot submit a job to an existing session if there is still a job running. However, we can submit a job to another session, using a server of a different name, and run it concurrently. That second server (for example, "SASMain2") can be the same host as the first, with access to all of the same resources. (But it must be addressed using another port in the object spawner setup.)
Now I think this same kind of thing should be achievable within a single EG project. When I open a library and select a dataset, EG enters that data "node" into the project, against the default server, ok fine. Now I want to submit a query against that dataset. I believe that query should be wrapped in a session that rsubmits the query asynchronously to either the remote or local server.
Part of my $SASScheduledJobs and %SynchronousBI code is to automatically issue an %include "...\IncludeLib\LibraryDefinitions.sas" so that all of our standard libraries, filename definitions and options are created in the new session. In my logical definitions of 4 logical servers, I have as "options"
So that these things are also set for the remote server on connection.
The environment is actually PROD, DEV or USER depending on the logical server and the intended environment.
I used to have the LibraryDefinitions.sas in the "Tools : Options ... : SAS Programs : Insert custom SAS code before submitted code" setting.
Now I suppose with proper metadata work, this kind of stuff shouldn't be necessary. At a previous place, it wasn't.
The point is, the technology appears to be available to do what I desire, for queries to be asynchronously submitted to a remote server session, but it will require a different perspective on EG, what it is supposed to do, and how it is supposed to do it. In my opinion, EG should be a Java (not .NET) application for portability: a non-SAS interface to multiple underlying SAS servers that only creates and uses a server session when it actually submits code/requests.
To help with this, there should be a "Data Step" object -- not the same thing as a "code node". I select a dataset, and then select "Data Step" and it opens a dialog, but I don't have to enter/write the "data ... ;", "set ...;", "run; quit;" statements. Just like with the "Query ...", those are provided for me, and all I have to do is enter the processing code I want executed in the step. Why? Because EG knows the input dataset, and I have already defined that I want all my output to go to EGTASK. The Assign Libname ... for EGTASK was already entered into the project, but is not actually executed there. It is automatically included in the submitted code, after the rsubmit, so that it exists for inputs and outputs.
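To make the idea concrete, here is a hypothetical sketch of the code such a "Data Step" object might submit. None of this is actual EG output. The session name egtask1, the output table name and the BMI calculation are invented for illustration, and EGTASK is aliased to the session's own WORK library only so the sketch runs anywhere (a real project would point it at permanent storage). It assumes SAS/CONNECT is licensed and a local session can be spawned.

```sas
/* Hypothetical sketch of code EG could generate for a "Data Step"   */
/* object. Session name, table names and the EGTASK assignment are   */
/* illustrative assumptions, not real EG behavior.                   */
options sascmd="!sascmd" autosignon;

rsubmit egtask1 wait=no;
   /* Generated: the project's EGTASK assignment, injected after the */
   /* rsubmit. Aliased to WORK here so the sketch runs anywhere.     */
   libname egtask (work);

   /* Generated: DATA and SET statements, from the selected dataset. */
   data egtask.class_bmi;
      set sashelp.class;

      /* User-written: the only part typed into the dialog. */
      bmi = (weight / height**2) * 703;

   /* Generated: step boundary. */
   run;
endrsubmit;

/* Wait for the step, pull its log back, then release the session. */
waitfor egtask1;
rget egtask1;
signoff egtask1;
```

Only the assignment to bmi is user input here; everything else is boilerplate the tool could supply from what the project already knows.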
"Connecting" to a remote server -- that is, accessing it through the server list -- should only be a temporary session to pull the library information for that server and cache it in/for EG; same for the "File" tree. The connection/session goes away after the information is retrieved. If I then select that I want to use a specific dataset, a new synchronous signon "session" with an asynchronous rsubmit reads some of the data and sends the result set to EG for display.
Perhaps SAS startup is very expensive. Perhaps a new multi-threaded architecture could be designed for SAS, and we could have a SAS daemon/service running on the remote server that greatly simplifies the communication of information and the running of SAS code. After all, SAS is a JIT compiled environment, a predecessor to Java. Perhaps an internal redesign along the ideas of multi-threading, multi-tasking, object orientation, and grid processing would be a great thing. SAS is a system, not just a language. Its goal is to provide simple, accessible computing power to analysts, hiding as many of the incidental details as it can.