SAS grid as a Data Warehouse?

Accepted Solution Solved
Reply
Contributor
Posts: 40
Accepted Solution

SAS grid as a Data Warehouse?

We were asked to move all the SQL server / teradata data to SAS grid to make the grid as a Data warehouse.

I can understand this is possible, tried looking online if there are any examples, but haven't found any.

We will have about 200GB data total to move and have 1TB size allocated in the grid. I also noted SAS takes larger space than the Teradata table when I tried a 10GB size table to SAS.

I would like to hear from the experts here:

1. Whether or not SAS grids can be used as a Data warehouse? Any advantages / disadvantages of using SAS grid to store and access data for reporting / analytics purpose?

2. How do you move the data to SAS tables from teradata/sql server quicker (any fastload / bulkload options?)?

3. Why does SAS takes more space than teradata / sql server for the same data (10GB vs. 50GB)?

Thanks to all for your insights!

Park


Accepted Solutions
Solution
‎05-25-2015 02:59 PM
Super User
Posts: 5,256

Re: SAS grid as a Data Warehouse?

  1. Perhaps. But you need a use case. If someone already asked for this, they probably have some motivation? Typically if you have many concurrent medium-large sessions/jobs going on, Grid could come into question. On the other hand, multi core chips is becoming cheaper, so perhaps we are going back to more centralized servers over time. Bottom line is, you can do any conclusion by asking the forum. You need on site analysis.
  2. First, you don't have to move to SAS (unless you wish to save SW licenses). There are examples where SAS can access Teradata installations on the grid, and perform well from what I understand. But, is this a regular job, or one time dump? SAS doesn't have any fast/bulk-load options, since SAS is very light-weight storage system. If you rely heavily on indexing, chose SPDE or evaluate SPD Server - they are much better when it comes to incremental updates on indexes.
  3. Your example seems very specific, and you need to attach more information. I guess that Teradata by default compresses data, and also have VARCHAR data types. If your tables are CHAR intensive, try compressing them in SAS as well. If they are NUM intensive, use COMPRESS=BINARY. Numerical columns can be trimmed manually by choosing less than the default 8 byte. But just for integers! Check the numerical precision table in the SAS Companion for your SAS Grid hosts.
Data never sleeps

View solution in original post


All Replies
Solution
‎05-25-2015 02:59 PM
Super User
Posts: 5,256

Re: SAS grid as a Data Warehouse?

  1. Perhaps. But you need a use case. If someone already asked for this, they probably have some motivation? Typically if you have many concurrent medium-large sessions/jobs going on, Grid could come into question. On the other hand, multi core chips is becoming cheaper, so perhaps we are going back to more centralized servers over time. Bottom line is, you can do any conclusion by asking the forum. You need on site analysis.
  2. First, you don't have to move to SAS (unless you wish to save SW licenses). There are examples where SAS can access Teradata installations on the grid, and perform well from what I understand. But, is this a regular job, or one time dump? SAS doesn't have any fast/bulk-load options, since SAS is very light-weight storage system. If you rely heavily on indexing, chose SPDE or evaluate SPD Server - they are much better when it comes to incremental updates on indexes.
  3. Your example seems very specific, and you need to attach more information. I guess that Teradata by default compresses data, and also have VARCHAR data types. If your tables are CHAR intensive, try compressing them in SAS as well. If they are NUM intensive, use COMPRESS=BINARY. Numerical columns can be trimmed manually by choosing less than the default 8 byte. But just for integers! Check the numerical precision table in the SAS Companion for your SAS Grid hosts.
Data never sleeps
Contributor
Posts: 40

Re: SAS grid as a Data Warehouse?

Thanks for the insights. Here is some additional details:

First, you don't have to move to SAS (unless you wish to save SW licenses). There are examples where SAS can access Teradata installations on the grid, and perform well from what I understand. But, is this a regular job, or one time dump? SAS doesn't have any fast/bulk-load options, since SAS is very light-weight storage system. If you rely heavily on indexing, chose SPDE or evaluate SPD Server - they are much better when it comes to incremental updates on indexes.


The department already got the license for SAS and all. They are moving to SAS for a few reasons, cost (~$2,500/yr for SAS vs. ~$10,000/yr for Teradata) to store 1TB data and thinking data locality will reduce the overhead for reporting purposes using Web reporting studio or similar web interface for the users. We were asked to source the teradata /sql server tables to create ETL processes running on a daily basis that creates mini data marts in SAS along with a one time move of the hist tables containing 30-40 million rows (~10 tables like this). I argued about the network I/O and other issues with the formats that could be faced, and trying to do more research on the process. I tried using views, but at the end it is taking similar time as using table to perform some ETL for creating data marts. Seems like it is going to be more challenging than everyone thinks.


In terms of space issue, I will get the details later this week and post, but understand from both reply that it needs lot of tuning and manual intervention. I will have to study that items Jaap mentioned, will get back later. Thanks!

Super User
Posts: 5,256

Re: SAS grid as a Data Warehouse?

Like already mentioned, the data volume itself doesn't call for a grid solution. With a nice design it should be possible to handle in a single node. Especially if you talk about WRS users.

If you have SAS DI Studio licensed, the easiest is to build SAS ETL jobs that extract the required data from Teradata/SQL Server.

Network issues shouldn't be a problem if the shared file system is configured the proper way. There were some very useful papers from SAS Global Forum this year.

Data never sleeps
Valued Guide
Posts: 3,208

Re: SAS grid as a Data Warehouse?

1/ A grid is a hadoop like approach. You have many machines each supporting some part of the data.
Having a SAN connected to all machines you are introducing the problem of enqueing/locking/synchronization. To solve that  you have to build something like Hapdoop or Teradata. 
http://blogs.sas.com/content/sgf/2015/01/14/sas-grid-manager-shared-nothing-storage-and-hadoop-part-...   http://blogs.sas.com/content/sgf/2015/04/27/can-i-run-sas-grid-manager-in-the-aws-cloud/

With 200Gb and a total size of 1Tb we are more in the area a single machine (compute tier) could do that well. It is approaching a common desktop capacity.

2/ By default data with SAS is not compressed. SQL-Server and Teradata on the other hand are having those options. With Terada you can have a grid approach (by-net) and hardwarecompression.

SAS datasets are optimized for sequential processing and it shares some of the principles of a DBMS using blocks where the records are stored in.
There are two types of compression with SAS that is a/ RLE (Run length) also named "char" and RDC (RossDataCompression) also known as "binary" SAS(R) 9.4 Statements: Reference, Third Edition (Libname statement)

Having long records (many variables) the binary approach is better. It is adding overhead in a similar way zip is doing.  The char options will do having shorter records but with chars having a lot of same (eg spaces) chars.

The binary name is misleading it is not for numerics only but handles the whole sas-record.

3/ There is a lot of going on with SAS datasets on the OS level. Using those you must tune them. Directio (not using system cache)  Bufno SASaligniofiles bufsize can have big impact on the performance. Nothing different as a DBA would do at his DBMS, but you have to do that now yourself. Whether a SPDS server would give additional benfits is questionable.

---->-- ja karman --<-----
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 589 views
  • 5 likes
  • 3 in conversation