SAS large data sets, SAS fact table, DI Studio, ETL Oracle vs. SAS best-practice

Contributor
Posts: 71

SAS large data sets, SAS fact table, DI Studio, ETL Oracle vs. SAS best-practice

Dear SAS-Experts,

My organization has been using SAS Enterprise Guide for a few years, and we finally have our SAS server up and running: SAS DI Studio, Web Report Studio, etc.
I don't have the server specs handy, but from what the IT guys tell me, it is good.
The big question for me as a data manager is whether we can now retire our Oracle DWH, which grows by about 15m records annually, and just move the data to SAS. How does SAS perform with large data sets, e.g. 150m rows over 10 years' worth of data? The 15m annual rows are basically created by appending two tables from the Oracle transactional system into a large fact table. Where should we do this extraction and merging: in the Oracle environment (PL/SQL), or using SAS DI Studio and PROC SQL/Base SAS? Which performs better? I know I can't offer enough technical detail to solicit a 100% watertight recommendation, but could you maybe point out what best practice is?

Bye

Philip




All Replies
Valued Guide
Posts: 2,111

Re: SAS large data sets, SAS fact table, DI Studio, ETL Oracle vs. SAS best-practice

Philip,

 

I would do the join in Oracle.  I would start by creating it as a view that is limited to only the columns that are needed by SAS.  That will minimize the disk storage requirements and the amount of data to be transferred across the network from the Oracle server to the SAS server (they are often on separate hardware platforms).  You can treat the Oracle view as a SAS dataset and not copy the file as a first step.
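A minimal sketch of that approach (the libname options, librefs, view, and column names below are placeholders for your site's own values, not a tested configuration):

```sas
/* Hypothetical connection -- replace path/schema/credentials with your own */
libname ora oracle path=dwhprod schema=dwh user=&oruser password=&orpw;

/* The Oracle view behaves like a SAS data set; the KEEP= and WHERE=
   restrictions mean only the needed columns and rows cross the network
   (whether the WHERE is pushed into Oracle depends on your SAS/ACCESS
   implicit pass-through settings) */
data work.fact_extract;
    set ora.v_fact_slim(keep=cust_id txn_date amount);
    where txn_date >= '01JAN2015'd;
run;
```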

Contributor
Posts: 71

Re: SAS large data sets, SAS fact table, DI Studio, ETL Oracle vs. SAS best-practice

[ Edited ]

We just tried it with 50m records via PROC SQL against the Oracle DB, but Oracle dropped out with ORA-12801 (error signaled in parallel query server P031). :/
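(For anyone hitting the same thing: one possible workaround is explicit pass-through with parallel query disabled for the session, so the whole join runs serially inside Oracle and only the result set comes back. Connection details and table names below are illustrative, not our actual schema.)

```sas
proc sql;
    connect to oracle (path=dwhprod user=&oruser password=&orpw);
    /* Force serial execution for this session to sidestep ORA-12801 */
    execute (alter session disable parallel query) by oracle;
    create table work.fact as
        select * from connection to oracle
        (select h.txn_id, h.cust_id, h.txn_date, d.amount
           from txn_header h
           join txn_detail d on h.txn_id = d.txn_id);
    disconnect from oracle;
quit;
```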

 

Do you have an idea how to load only the DELTA from the Oracle transaction system into the SAS data set? Oracle has materialized views to apply only the actual changes made. What about SAS DI? I don't want to grab the whole lot of data every day, and it would be great if the delete/update/insert were done automatically. I read about "Change Data Capture", which would require that we introduce a kind of change-log table, something we don't need today with Oracle materialized views.

 

Esteemed Advisor
Posts: 5,194

Re: SAS large data sets, SAS fact table, DI Studio, ETL Oracle vs. SAS best-practice

As for the join/merge question: this will be easy to test once your environment is ready.

150m rows sounds feasible, at least from a loading point of view. Query performance is harder to predict.
If you need to optimize star-schema queries, you might want to consider SAS SPD Server, which also has nice tools for logical partitioning, e.g. by year.
Data never sleeps
Respected Advisor
Posts: 3,060

Re: SAS large data sets, SAS fact table, DI Studio, ETL Oracle vs. SAS best-practice

What is the business case for moving your Oracle DW to SAS? I suggest the justification would have to be more than a performance gain, if any.

Contributor
Posts: 71

Re: SAS large data sets, SAS fact table, DI Studio, ETL Oracle vs. SAS best-practice

Hi Doc_Duke,
Thanks for the feedback.

@SASKiwi, our transactional system runs on an Oracle DB Standard Edition.
I don't know much about the DB server hardware. I think some in our organization hope to reduce the burden on the Oracle DB. The business case is that everyone is excited about SAS, and we think we can dismantle our existing Oracle DWH and instead have it all in SAS. I have an Oracle background, so I am skeptical whether SAS can handle building OLAP cubes or DWH subsets, i.e. data marts for marketing, HR, and management, if we have a 150m-observation SAS table that is joined with, let's say, 10 dimension tables.
As far as I understood, Doc_Duke suggested doing the joins in Oracle and exposing ready-made data mart views to SAS.
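For concreteness, the kind of mart query I have in mind is a star join roughly like this (all table and column names invented):

```sas
/* One fact table joined to several dimensions, aggregated for a mart */
proc sql;
    create table marts.mkt_sales as
    select d.year, c.segment, p.category,
           sum(f.amount) as total_amount
    from dwh.fact_table f
         inner join dwh.dim_date     d on f.date_key = d.date_key
         inner join dwh.dim_customer c on f.cust_key = c.cust_key
         inner join dwh.dim_product  p on f.prod_key = p.prod_key
    group by d.year, c.segment, p.category;
quit;
```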
Respected Advisor
Posts: 3,060

Re: SAS large data sets, SAS fact table, DI Studio, ETL Oracle vs. SAS best-practice

In that case I suggest you start small by trialling one or two data marts in SAS to get a feel for how it performs with your reporting. A lot of SAS sites are now bypassing OLAP cubes altogether and using the new in-memory SAS VA (Visual Analytics) technology. We are about to do this ourselves. It appears to be a lot easier than having to pre-guess dimensions and facts with cubes.

Solution
05-09-2016 11:40 AM
Esteemed Advisor
Posts: 5,194

Re: SAS large data sets, SAS fact table, DI Studio, ETL Oracle vs. SAS best-practice

Don't be sceptical about DAD capabilities when it comes to cubes, marts, and other types of analytical and reporting storage. It's great.
When it comes to the question of moving the whole DW stack to SAS, we just know too little about your current Oracle deployment and the requirements that you have. But SAS is very capable software, and I think in the end it depends on the competence of your SAS architect and ETL developers.
Just as a hint: I have colleagues running a 13TB telco DW with 100% SAS, and they're doing great.
Data never sleeps
Contributor
Posts: 71

Re: SAS large data sets, SAS fact table, DI Studio, ETL Oracle vs. SAS best-practice

Hi LinusH,
do you have an idea how to load only the DELTA from the Oracle transaction system into the SAS data set? Oracle has materialized views to apply only the actual changes made. What about SAS DI? I don't want to grab the whole lot of data every day, and it would be great if the delete/update/insert were done automatically.
Esteemed Advisor
Posts: 5,194

Re: SAS large data sets, SAS fact table, DI Studio, ETL Oracle vs. SAS best-practice

That should read "SAS capabilities". Darn autocorrect... 😬
Data never sleeps
Esteemed Advisor
Posts: 5,194

Re: SAS large data sets, SAS fact table, DI Studio, ETL Oracle vs. SAS best-practice

I think there is another active thread, perhaps yours? And I think the answer there was accurate. Bottom line: you need a way to extract changed records from Oracle. Once that is done, you have several options in DI Studio depending on your requirements and your physical data model.
Data never sleeps
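For the insert-only part of the delta, a timestamp-driven extract is the usual low-tech option; a sketch follows, where the load_ts and last_modified columns are assumptions about your data model, and updates/deletes would still need real change logging:

```sas
/* Find the high-water mark of the target table (load_ts is hypothetical) */
proc sql noprint;
    select max(load_ts) format=datetime20. into :last_load trimmed
    from dwh.fact_table;
quit;

/* Append only the source rows modified since the last load */
proc append base=dwh.fact_table
    data=ora.txn_view(where=(last_modified > "&last_load"dt));
run;
```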
Contributor
Posts: 71

Re: SAS large data sets, SAS fact table, DI Studio, ETL Oracle vs. SAS best-practice

[ Edited ]

Hi,

You are right, the delta problem is being addressed in another thread.

Bye

 

Contributor
Posts: 22

Re: SAS large data sets, SAS fact table, DI Studio, ETL Oracle vs. SAS best-practice

Any answer to performance questions has to consider the architecture of the SAS host, the architecture of the database, and the speed of the network connections. You also need to look at the type of data, and how much of it consists of variable-length character fields.

 

Systems and databases can be grid or MPP, which means they have many processors running in a massively parallel manner. These systems increase performance by 10 to 100 or more times compared with single-processor systems. The design of the database also affects performance, so it's difficult to compare one database in third normal form to another, possibly non-normalized, database. But again, it depends on the queries that run on those databases, since almost all databases share the problem that a few bad queries can slow down everyone.

 

Character data is a potential problem, since SAS does not have true variable-length character storage (maybe DS2 will make a liar out of me), whereas databases store character data efficiently. Character keys are also slower and less efficient than numeric keys, which matters when you are deciding whether to store BIGINT data as characters in a SAS data set.
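To make the BIGINT point concrete: SAS numerics are 8-byte floating point, so integers stay exact only up to about 15 significant digits, while a BIGINT surrogate key can have 19. A small illustration:

```sas
/* SAS numeric = 8-byte float: integers are exact only to ~15-16 digits,
   so a 19-digit surrogate key must be stored as character to stay exact,
   at the cost of slower joins than a true numeric key */
data work.key_demo;
    length key_char $19 key_num 8;
    key_num  = 123456789012345;        /* 15 digits: safe as numeric   */
    key_char = '1234567890123456789';  /* 19 digits: keep as character */
run;
```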

 

Getting down to performance in general, a SAS grid or an MPP system is always better if you can afford it. Whether you have a grid or not, you still need to be concerned about data movement over any network, which is why some programs may take a day or more to finish. Your best bet for performance is to return relatively small result sets from the database and have all the processing occur in-database, so that only the results are returned and no files are transferred across the network to compute them.
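With SAS/ACCESS you can verify whether a given query actually ran in-database by turning on tracing; SASTRACE logs the SQL that SAS sent down to the database (the libref and view name below are placeholders):

```sas
/* Log the SQL that SAS passes down to Oracle */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

proc sql;
    /* Aggregation like this is normally pushed into Oracle, so only
       the small summary result crosses the network */
    select region, sum(amount) as total_amount
    from ora.v_fact_slim
    group by region;
quit;
```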

 

You will really see the performance difference with big data. But to analyze performance, you must consider all the other relevant factors. Oracle and SAS can both run in a grid or MPP configuration, and Oracle queries can be designed to run completely in-database, which is generally faster than any number of single processors on one machine. When you consider all of the factors, you can make better decisions about performance issues and trade-offs.

 

If you supply some more details about your environment and your budget, then specific performance recommendations become easier to make. Even without a big budget or an MPP, you can make tweaks that improve elapsed time by a factor of 2 to 5. How much improvement do you need, and how would you measure it?


Discussion stats
  • 12 replies
  • 623 views
  • 6 likes
  • 5 in conversation