07-07-2015 12:23 PM
SAS can store its data in default SAS storage in the form of SAS tables or in RDBMS.
1. Are there any reasons to store SAS data in RDBMS, such as Oracle database?
2. What is the disadvantage for storing SAS data in RDBMS?
Thanks in advance for any comments.
07-07-2015 01:21 PM
The main reason I would think of putting SAS data into an external RDBMS is because other users need it and that's where the rest of the data for a project resides. SAS does many things that most databases can't so you pull the data and provide results that are then useable in the other application.
Disadvantage from a SAS programmer point of view is basically IO, either (re)creating SAS data sets for analysis and overhead, OR complexity in dealing with native DBMS "features" such as very long variable names.
07-07-2015 01:51 PM
Very open question, there are tons of pros and cons. So the interesting is what applies to you?
What is your requirements? What kind of data do you have? Who/how will you load it?
What applications/users will explore the data?
How does the competence at the site look like?
Is there a detailed IT strategy (platform wise)?
07-07-2015 02:24 PM
As ballardw said: if it needs to be shared, then a DBMS makes a lot of sense, and IO is always a problem.
We run sas against datasets with hundreds of millions of observations and it is shared data, so it is in an RDBMS. If I am very careful and make appropriate limitations in where clauses and there are enough indexes on the data on the server, then I can greatly shorten the amount of time I have for the basic data to come down to my workstation. A large server can be very beneficial in cutting your query time, but you have to program to take advantage of the resources of the server to make it work for you.
On the other hand, every time I want data for any type of local processing, I have to wait. I also have to manipulate the data from a normalized form into a wide table format, dataset, so the process I want to do can be carried out effectively; he basic IO for this may be hours during my process run. Realistically though, any time I have to get data from a dataset not on my machine, I have to wait on network traffic. So even if the data is not on the server, then I am still going to have IO across the network from a repository of our SAS datasets. Network based data actually takes more time than requesting data directly from the RDBMS.
Those are the pros and cons I work with on a daily basis, there are many more, but these are the major ones I work with every day.
07-07-2015 03:57 PM
Most SAS projects I work on use a mixture of both:
Where data needs to be shared with a wider audience then having it in an RDBMS makes sense.
Where you are doing a lot of processing it is far more efficient to use SAS datasets.