BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Venkat4
Quartz | Level 8

We were asked to move all the SQL server / teradata data to SAS grid to make the grid as a Data warehouse.

I can understand this is possible, tried looking online if there are any examples, but haven't found any.

We will have about 200GB data total to move and have 1TB size allocated in the grid. I also noted SAS takes larger space than the Teradata table when I tried a 10GB size table to SAS.

I would like to hear from the experts here:

1. Whether or not SAS grids can be used as a Data warehouse? Any advantages / disadvantages of using SAS grid to store and access data for reporting / analytics purpose?

2. How do you move the data to SAS tables from teradata/sql server quicker (any fastload / bulkload options?)?

3. Why does SAS takes more space than teradata / sql server for the same data (10GB vs. 50GB)?

Thanks to all for your insights!

Park

1 ACCEPTED SOLUTION

Accepted Solutions
LinusH
Tourmaline | Level 20
  1. Perhaps. But you need a use case. If someone already asked for this, they probably have some motivation? Typically if you have many concurrent medium-large sessions/jobs going on, Grid could come into question. On the other hand, multi core chips is becoming cheaper, so perhaps we are going back to more centralized servers over time. Bottom line is, you can do any conclusion by asking the forum. You need on site analysis.
  2. First, you don't have to move to SAS (unless you wish to save SW licenses). There are examples where SAS can access Teradata installations on the grid, and perform well from what I understand. But, is this a regular job, or one time dump? SAS doesn't have any fast/bulk-load options, since SAS is very light-weight storage system. If you rely heavily on indexing, chose SPDE or evaluate SPD Server - they are much better when it comes to incremental updates on indexes.
  3. Your example seems very specific, and you need to attach more information. I guess that Teradata by default compresses data, and also have VARCHAR data types. If your tables are CHAR intensive, try compressing them in SAS as well. If they are NUM intensive, use COMPRESS=BINARY. Numerical columns can be trimmed manually by choosing less than the default 8 byte. But just for integers! Check the numerical precision table in the SAS Companion for your SAS Grid hosts.
Data never sleeps

View solution in original post

4 REPLIES 4
LinusH
Tourmaline | Level 20
  1. Perhaps. But you need a use case. If someone already asked for this, they probably have some motivation? Typically if you have many concurrent medium-large sessions/jobs going on, Grid could come into question. On the other hand, multi core chips is becoming cheaper, so perhaps we are going back to more centralized servers over time. Bottom line is, you can do any conclusion by asking the forum. You need on site analysis.
  2. First, you don't have to move to SAS (unless you wish to save SW licenses). There are examples where SAS can access Teradata installations on the grid, and perform well from what I understand. But, is this a regular job, or one time dump? SAS doesn't have any fast/bulk-load options, since SAS is very light-weight storage system. If you rely heavily on indexing, chose SPDE or evaluate SPD Server - they are much better when it comes to incremental updates on indexes.
  3. Your example seems very specific, and you need to attach more information. I guess that Teradata by default compresses data, and also have VARCHAR data types. If your tables are CHAR intensive, try compressing them in SAS as well. If they are NUM intensive, use COMPRESS=BINARY. Numerical columns can be trimmed manually by choosing less than the default 8 byte. But just for integers! Check the numerical precision table in the SAS Companion for your SAS Grid hosts.
Data never sleeps
Venkat4
Quartz | Level 8

Thanks for the insights. Here is some additional details:

First, you don't have to move to SAS (unless you wish to save SW licenses). There are examples where SAS can access Teradata installations on the grid, and perform well from what I understand. But, is this a regular job, or one time dump? SAS doesn't have any fast/bulk-load options, since SAS is very light-weight storage system. If you rely heavily on indexing, chose SPDE or evaluate SPD Server - they are much better when it comes to incremental updates on indexes.


The department already got the license for SAS and all. They are moving to SAS for a few reasons, cost (~$2,500/yr for SAS vs. ~$10,000/yr for Teradata) to store 1TB data and thinking data locality will reduce the overhead for reporting purposes using Web reporting studio or similar web interface for the users. We were asked to source the teradata /sql server tables to create ETL processes running on a daily basis that creates mini data marts in SAS along with a one time move of the hist tables containing 30-40 million rows (~10 tables like this). I argued about the network I/O and other issues with the formats that could be faced, and trying to do more research on the process. I tried using views, but at the end it is taking similar time as using table to perform some ETL for creating data marts. Seems like it is going to be more challenging than everyone thinks.


In terms of space issue, I will get the details later this week and post, but understand from both reply that it needs lot of tuning and manual intervention. I will have to study that items Jaap mentioned, will get back later. Thanks!

LinusH
Tourmaline | Level 20

Like already mentioned, the data volume itself doesn't call for a grid solution. With a nice design it should be possible to handle in a single node. Especially if you talk about WRS users.

If you have SAS DI Studio licensed, the easiest is to build SAS ETL jobs that extract the required data from Teradata/SQL Server.

Network issues shouldn't be a problem if the shared file system is configured the proper way. There were some very useful papers from SAS Global Forum this year.

Data never sleeps
jakarman
Barite | Level 11

1/ A grid is a hadoop like approach. You have many machines each supporting some part of the data.
Having a SAN connected to all machines you are introducing the problem of enqueing/locking/synchronization. To solve that  you have to build something like Hapdoop or Teradata. 
http://blogs.sas.com/content/sgf/2015/01/14/sas-grid-manager-shared-nothing-storage-and-hadoop-part-...   http://blogs.sas.com/content/sgf/2015/04/27/can-i-run-sas-grid-manager-in-the-aws-cloud/

With 200Gb and a total size of 1Tb we are more in the area a single machine (compute tier) could do that well. It is approaching a common desktop capacity.

2/ By default data with SAS is not compressed. SQL-Server and Teradata on the other hand are having those options. With Terada you can have a grid approach (by-net) and hardwarecompression.

SAS datasets are optimized for sequential processing and it shares some of the principles of a DBMS using blocks where the records are stored in.
There are two types of compression with SAS that is a/ RLE (Run length) also named "char" and RDC (RossDataCompression) also known as "binary" SAS(R) 9.4 Statements: Reference, Third Edition (Libname statement)

Having long records (many variables) the binary approach is better. It is adding overhead in a similar way zip is doing.  The char options will do having shorter records but with chars having a lot of same (eg spaces) chars.

The binary name is misleading it is not for numerics only but handles the whole sas-record.

3/ There is a lot of going on with SAS datasets on the OS level. Using those you must tune them. Directio (not using system cache)  Bufno SASaligniofiles bufsize can have big impact on the performance. Nothing different as a DBA would do at his DBMS, but you have to do that now yourself. Whether a SPDS server would give additional benfits is questionable.

---->-- ja karman --<-----

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1594 views
  • 5 likes
  • 3 in conversation