BookmarkSubscribeRSS Feed
liaomin
Calcite | Level 5

SAS can store its data in default SAS storage in the form of SAS tables or in RDBMS.

Questions:

1. Are there any reasons to store SAS data in RDBMS, such as Oracle database?

2. What is the disadvantage for storing SAS data in RDBMS?

Thanks in advance for any comments.

6 REPLIES 6
ballardw
Super User

The main reason I would think of putting SAS data into an external RDBMS is because other users need it and that's where the rest of the data for a project resides. SAS does many things that most databases can't so you pull the data and provide results that are then useable in the other application.

Disadvantage from a SAS programmer point of view is basically IO, either (re)creating SAS data sets for analysis and overhead, OR complexity in dealing with native DBMS "features" such as very long variable names.

liaomin
Calcite | Level 5

Thanks for the helpful information.

liaomin
Calcite | Level 5

Thanks everyone for the helpful information.

LinusH
Tourmaline | Level 20

Very open question, there are tons of pros and cons. So the interesting is what applies to you?

What is your requirements? What kind of data do you have? Who/how will you load it?

What applications/users will explore the data?

How does the competence at the site look like?

Is there a detailed IT strategy (platform wise)?

Backup/restore routines...

Data never sleeps
morgalr
Obsidian | Level 7

As ballardw said: if it needs to be shared, then a DBMS makes a lot of sense, and IO is always a problem.

We run sas against datasets with hundreds of millions of observations and it is shared data, so it is in an RDBMS. If I am very careful and make appropriate limitations in where clauses and there are enough indexes on the data on the server, then I can greatly shorten the amount of time I have for the basic data to come down to my workstation. A large server can be very beneficial in cutting your query time, but you have to program to take advantage of the resources of the server to make it work for you.

On the other hand, every time I want data for any type of local processing, I have to wait. I also have to manipulate the data from a normalized form into a wide table format, dataset, so the process I want to do can be carried out effectively; he basic IO for this may be hours during my process run. Realistically though, any time I have to get data from a dataset not on my machine, I have to wait on network traffic. So even if the data is not on the server, then I am still going to have IO across the network from a repository of our SAS datasets. Network based data actually takes more time than requesting data directly from the RDBMS.

Those are the pros and cons I work with on a daily basis, there are many more, but these are the major ones I work with every day.

SASKiwi
PROC Star

Most SAS projects I work on use a mixture of both:

Where data needs to be shared with a wider audience then having it in an RDBMS makes sense.

Where you are doing a lot of processing it is far more efficient to use SAS datasets. 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 778 views
  • 7 likes
  • 5 in conversation