BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
PhilipH
Quartz | Level 8

Dear SAS-Experts,

My organization has been using SAS Enterprise Guide for a few years now and we finally got our SAS Server up and running. SAS DI Studio, Web Report Studio, etc.
I don’t have the server specs ready but what the IT guys tell me, it is good.
The big question for me as a data manager is if we can now trash our Oracle DWH with annually 15m records and just move the data to SAS. How is SAS performing with large data set e.g. 150m rows over 10 years’ worth of data. The 15m rows annually are basically created by appending two tables from the Oracle transactional system into a large fact table. Where should we do this data extraction and merging, in the Oracle environment (PL/SQL) or using SAS DI Studio and PROC SQL/SAS BASE. Which is performing better? I know I can’t offer enough technical details to solicit a 100% water tight recommendation but could you maybe point out what is best-practice?

Bye

Philip

1 ACCEPTED SOLUTION

Accepted Solutions
LinusH
Tourmaline | Level 20
Don't be sceptical about DAD capabilities when it comes to cubes marts and other types analytical and reporting storage. It's great.
When it comes to the question about moving the whole DW stack to SAS we just know too little about your current Oracle deployment anD the requirements that you have. But SAS is a very capable sw and I think in the end it depends on the competence of your SAS architect and ETL developers.
Just as a hint I have colleges that are running a 13TB telco DW w 100% SAS, and they're doing gr8.
Data never sleeps

View solution in original post

12 REPLIES 12
Doc_Duke
Rhodochrosite | Level 12

Philip,

 

I would do the join in Oracle.  I would start by creating it as a view that is limited to only the columns that are needed by SAS.  That will minimize the disk storage requirements and the amount of data to be transferred across the network from the Oracle server to the SAS server (they are often on separate hardware platforms).  You can treat the Oracle view as a SAS dataset and not copy the file as a first step.

PhilipH
Quartz | Level 8

We just tried it with 50m records via proc sql using the Oracle-DB but Oracle dropped out: ORA-12801 P031 😕

 

Do you have an idea how to only load the DELTA from the Oracle Transaction System in to the SAS data set? Oracle has Materialized Views to only add the actual changes made. How about with SAS DI? I don’t want to grab the whole lot of data every day and it would be great if the delete/update/insert is done automatically. I read about "Change Data Capture" which would require that we introduce a kind of change-log table. Something which is not yet done using Oracle Materialized Views.

 

LinusH
Tourmaline | Level 20
As for the join/merge question - this will be easy to test once your environment is ready.

150m rows sounds feasible at least from loading point of view. Query performance is harder to predict.
If you need to optimize star schema queries you might want to consider SAS SPD Server which also have nice tools for logical partitioning, e.g. by year.
Data never sleeps
SASKiwi
PROC Star

What is the business case for moving your Oracle DW to SAS? I suggest the justification would have to be more than a performance gain, if any.

PhilipH
Quartz | Level 8
Hi Doc_Duke,
Thanks for the feedback.

@SASKiwi, we got our transactional system based on an Oracle DB Standard Edition.
I don't know much about the DB server hardware. I think some in our organization hope to reduce the burden on the Oracle DB. The business case is that everyone is excited about SAS and we think that we can dismantle our existing Oracle DWH and instead have it all in SAS. I have an Oracle background so I am skeptical if SAS can handle building OLAP cubes or DWH-subsets i.e. Data Marts for marketing, HR, management if we have a 150m observation SAS table that is joined/combined with, let’s say, 10 dimension tables.
As far as I understood, Doc_Duke suggested doing the joins in Oracle and exposing ready-made data mart views to SAS.
SASKiwi
PROC Star

In that case I suggest you start small by trialling one or two data marts in SAS to get a feel for how it performs with your reporting. A lot of SAS sites are now bypassing OLAP and cubes and are using the new in-memory SAS VA technology. We are about to do this ourselves. It appears to be a lot easier than having to pre-guess dimensions and facts with cubes.

LinusH
Tourmaline | Level 20
Don't be sceptical about DAD capabilities when it comes to cubes marts and other types analytical and reporting storage. It's great.
When it comes to the question about moving the whole DW stack to SAS we just know too little about your current Oracle deployment anD the requirements that you have. But SAS is a very capable sw and I think in the end it depends on the competence of your SAS architect and ETL developers.
Just as a hint I have colleges that are running a 13TB telco DW w 100% SAS, and they're doing gr8.
Data never sleeps
PhilipH
Quartz | Level 8
Hi LinusH,
do you have an idea how to only load the DELTA from the Oracle Transaction System in to the SAS data set? Oracle has Materialized Views to only add the actual changes made. How about with SAS DI? I don’t want to grab the whole lot of data every day and it would be great if the delete/update/insert is done automatically.
LinusH
Tourmaline | Level 20
SAS capabilities it should read. Darn autocorrect...😬
Data never sleeps
LinusH
Tourmaline | Level 20
I think that there is another active thread, perhaps yours? And I think the answer there was accurate. Bottom line is: you need a way to extract changed records from Oracle. That done, you have several options in DIS depending on your requirements and your physical data model.
Data never sleeps
PhilipH
Quartz | Level 8

Hi,

you are right. The delta problem is addressed somewhere else..

Bye

 

RedPlanet
Obsidian | Level 7

Any answer to performance questions has to consider the architecture of the SAS host, the architecture of the database, and the speed of the network connections.  Also you need to look at the type of data, and how much of that data has variable length character fields.

 

Systems and databases can be grid or MPP, which means that they have many processors running in a massively parallel manner.  These systems increase performance by 10 to 100 or more times, compared with single processing systems.  Also the design of the database affects performance, so it's difficult to compare one database in third normal form and to another may non-normal database.  But again, it depends on the queries that run on those databases, since most all databases have an issue where a few bad queries can slow down everyone.

 

Character data is a potential problem since SAS does not have true variable length data (maybe DS2 will make a liar out of me), but databases do have efficient storage for character data.  Also character keys are slower and less efficient than numeric keys, which makes a difference when you are deciding to store BIGINT data as characters in a SAS dataset.

 

Getting down to performance in general, a SAS grid or an MPP is always better if you can afford it.  Whether you have a grid or not, you still need to be concerned about data movement over any network, which is why some programs may take a day or so to finish. Your best bet for performance is to return relatively small result sets from the database, and have all the processing occur in-database, which means only the results are returned and no files are transferred across a network to compute those results.  

 

You will really see the performance difference with big data.  But to analyze performance, you must consider all the other relevant factors that affect performance.  Oracle and SAS can run in a grid or MPP, and Oracle queries can be designed so they run completely in-database, which is generally going to be faster than any number of single processors on one computer.  When you consider all of the factors, then you can make better decisions about performance issues and trade offs.

 

If you supply some more details about your environment and your budget, then performance recommendations are easier to recommend.  Even without a budget or an MPP, you can make tweaks to improve elapsed time by a factor of 2 to 5 times.  How much improvement do you need, and how would you measure it?

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 2032 views
  • 6 likes
  • 5 in conversation