03-13-2013 10:10 AM
Please help with the questions below.
SAS has its own storage for data.
But is SAS good enough for huge data storage (terabytes of data)?
Does performance get impacted when processing huge data that's stored in SAS?
I've heard that nowadays organizations (for performance improvement) tend to replace SAS storage with other databases (like Teradata, DB2, etc.). Can that be true?
03-13-2013 10:26 AM
SAS (although the name is often used as a synonym for Base SAS) is a huge suite of software. SPDS is the SAS component that deals with "big" data, if you can afford it.
03-13-2013 10:33 AM
SAS can talk to Oracle, SQL, DB2 and others.
I'm confused as to what you mean by "SAS storage", though. Are you talking about the file format, or about where the data is stored?
A SAS dataset can get very large, and then you need someplace to put it. Terabytes usually aren't stored on desktops, so servers tend to be the next step. A server database is usually of the flavour Oracle, SQL Server, or DB2, or others I'm sure. I'm not aware of any limit on dataset size, and since SAS processes data row by row, efficiency doesn't usually decrease as data grows, but jobs do take longer.
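As a sketch of how SAS talks to an external database: a SAS/ACCESS LIBNAME engine makes DBMS tables usable like SAS datasets. All connection details below (user, password, path, schema, and the table name) are placeholders, not real values.

```sas
/* Hypothetical connection details -- replace with your site's values */
libname mydb oracle user=myuser password=mypass path=orcl schema=analytics;

/* Once the libref is assigned, DBMS tables behave like SAS datasets */
proc sql;
    select count(*) as n_customers
    from mydb.customers;
quit;
```

The same pattern works for DB2, Teradata, and other engines by changing the engine keyword and connection options.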
03-13-2013 10:35 AM
But is SAS good enough for huge data storage (terabytes of data)? - It will work, but not optimally.
Does performance get impacted when processing huge data that's stored in SAS? - Yes.
Do organizations (for performance improvement) tend to replace SAS storage with other databases (like Teradata, DB2, etc.)? - Yes.
03-13-2013 11:52 AM
Agree with DBailey.
I think read performance is generally quite good with both Base SAS and SPDE tables.
The problem is data management: updating and reorganizing data, maintaining indexes and constraints, phasing out old data, partitioning, etc. In those aspects, other RDBMSs have some advantage.
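For example, indexes and integrity constraints on a Base SAS table have to be maintained explicitly, typically with PROC DATASETS. A minimal sketch (the library, table, and column names are illustrative):

```sas
/* Sketch: add an index and an integrity constraint
   to an existing Base SAS table named WORK.TRANSACTIONS */
proc datasets library=work nolist;
    modify transactions;
    index create acct_id;                        /* simple index on one column */
    ic create pk_trans = primary key (trans_id); /* primary-key constraint */
quit;
```

In most RDBMS products, equivalent maintenance (and reorganization after heavy updates) is more automated, which is part of the trade-off being described.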
03-14-2013 09:12 PM
I think the most important factor is how you're going to use the data. If you're always going to use all of the columns and all of the rows of a dataset, it doesn't matter what you use; you'll have performance challenges. In that case, you might as well use "native" SAS datasets, as adding a DBMS to the mix will simply add processing overhead. However, this case is fairly rare.
A very common case in the IT world is the relational OLTP (OnLine Transaction Processing) DBMS option. This is ideal for cases like banks, airline reservations, and most commercial and administrative requirements. In this case, you might have a HUGE number of records in a table (e.g. all of the customers in Bank of America, one of the largest banks), but when I do a banking transaction, I only need the data for my record. Therefore, with appropriate keys, very large databases can perform extremely well, even if tens of thousands of different customers are accessing their accounts concurrently. Products like Oracle, SQL Server, and DB2 are optimized for this use case, and they are excellent. My favourite description of a database like this is that it "twinkles", because the different cells in it are being accessed more or less randomly, more or less constantly.
However, my experience with SAS is that it tends to be used for reporting and analytics, which is quite a different use case. Imagine that I am working with Census data and would like to compute average income by age and sex. In this case, my program needs to access EVERY ROW to get my results. Rather than many very small transactions, statistical work tends to need a very large number of rows, or all of them, but usually only a few columns.
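The Census example above might look like this in SAS. The library, dataset, and variable names are purely illustrative; the point is that the step scans every row but, via the KEEP= option, reads only the three columns it needs:

```sas
/* Illustrative: average income by age group and sex.
   Touches every row, but only three columns. */
proc means data=census.persons (keep=income age_group sex) mean;
    class age_group sex;
    var income;
run;
```

This column-wise access pattern is exactly what the analytics-oriented storage products mentioned below are optimized for, in contrast to the keyed row lookups of an OLTP system.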
To meet this requirement, vendors have produced data management products that, while they look like regular DBMS products in that they use SQL, are highly optimized for the statistical case, using a number of techniques.
Here's a link to a note I drafted a while ago that discusses this in more detail.
Hope this helps,