Sergio,
Jwillis has used an installation with an Oracle client (schema=). That one offers more advanced options. The Oracle client is a free client from Oracle, but it requires a SAS/ACCESS to Oracle license. You are using the ODBC method, obviously with the Windows client (DSN=).
Within a DBMS (explicit pass-through) there is a common separation by schemas. These look like the libnames in SAS.
You cannot intermix libnames (SAS environment) and schemas (DBMS environment).
The important difference, in my opinion, is the "as select G.*" between the create and the from connection parts.
Sorry Jaap, but I don't understand anything. I ran the following code and it works perfectly:
proc sql;
  connect to odbc as aaaaaa (dsn=xxxx user=xxxxx pw=xxxxxx);
  create table test as select * from connection to aaaaaa
    (select * from schemaname.tablename);
quit;
After the code runs I can see the dataset in my library. What is the difference between this code and the join I am trying to execute?
Regards
Well, another option: I am downloading the first table ordered by the key variable using pass-through. After this I'll download the second table, ordered the same way.
This way I suppose I don't need PROC SORT, and I can perform a merge to find which rows are in common.
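Something like this (a sketch; the DSN, schema, table, and variable names are placeholders):

proc sql;
  connect to odbc as aaaaaa (dsn=xxxx user=xxxxx pw=xxxxxx);
  /* let Oracle do the sorting so no PROC SORT is needed in SAS */
  create table work.t1 as select * from connection to aaaaaa
    (select * from schemaname.table1 order by keyvar);
  create table work.t2 as select * from connection to aaaaaa
    (select * from schemaname.table2 order by keyvar);
  disconnect from aaaaaa;
quit;

/* merge the two pre-sorted tables on the common key;
   for character keys, Oracle's sort order must match what BY expects */
data work.common;
  merge work.t1 (in=a) work.t2 (in=b);
  by keyvar;
  if a and b;   /* keep only the rows present in both tables */
run;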
What do you think about this?
Thanks
If you can get the sort done "on the fly", that is probably the simplest solution. The merge then reads sequentially through the tables and writes the target, so it all depends on the sequential I/O throughput of your SAS server.
If you have the space available on your local machine and it is fast enough... you are on Kurt's track. No need to worry about unexpected data transfers.
It should work fine when those conditions (local performance) are met.
Hi Sergio,
This is typically what I do when it comes to debugging and resolving performance issues with queries that involve RDBMS source tables in SAS.
1. Always test the SQL statement(s) within a native RDBMS query tool. This allows you to validate and profile the query, which in turn gives you better insight into the execution path and the true cost of the query (performance impact).
In your case, using Oracle, you are spoiled for choice:
1.1 Oracle SQL Developer (Free)
1.2 Toad for Oracle (Commercial)
1.3 .... (Other tools that I'm not familiar with)
2. Use SQL pass-through. This allows you to use RDBMS-specific syntax that may not be surfaced by the SAS LIBNAME statement options.
3. Change the default BUFFSIZE= option value.
"BUFFSIZE=
specifies the number of rows to retrieve from an Oracle table or view with each fetch. Using this argument can improve the performance of any query to Oracle.
By setting the value of the BUFFSIZE= argument in your SAS programs, you can find the optimal number of rows for a given query on a given table. The default buffer size is 250 rows per fetch. The value of BUFFSIZE= can be up to 2,147,483,647 rows per fetch, although a practical limit for most applications is less, depending on the available memory."
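For example (a minimal sketch; connection details are placeholders, and this form assumes SAS/ACCESS to Oracle):

proc sql;
  /* fetch 10,000 rows per round trip instead of the default 250 */
  connect to oracle as ora (user=xxxxx password=xxxxx path=mydb buffsize=10000);
  create table work.test as select * from connection to ora
    (select * from schemaname.tablename);
  disconnect from ora;
quit;

With the ODBC engine the corresponding LIBNAME option is READBUFF=, for example: libname mydb odbc dsn=xxxx user=xxxxx password=xxxxx readbuff=10000;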
I hope this helps,
Ahmed
Sergio,
You can try a hash join, it runs a lot quicker. If you want I can post an example.
-raisa
Thanks all for your tips, I'll keep them in mind.
Sure raisa, please go on
Regards
Sergio,
I created two fake data sets, one with 50 million rows and the other with 30 million rows.
Then I merged them.
Remember: in a hash join of large data, place the smaller data set on top (load it into the hash object).
Here is the code:
******************************************
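In outline (the dataset names small and large and the key variable keyvar are placeholders, not the original names):

data joined;
  /* make the smaller table's variables known to the PDV without reading it */
  if 0 then set small;
  if _n_ = 1 then do;
    /* load the smaller (30-million-row) table into an in-memory hash once */
    declare hash h(dataset: "small");
    h.defineKey("keyvar");
    h.defineData(all: "yes");
    h.defineDone();
  end;
  /* stream the larger (50-million-row) table past the hash */
  set large;
  if h.find() = 0;   /* keep only rows whose key exists in both tables */
run;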
Thanks a lot rachel
Hi all
I get an error when running the hash code:
ERROR: Hash object added 25165808 items when memory failure occurred.
FATAL: Insufficient memory to execute DATA step program. Aborted during the EXECUTION phase.
ERROR: The SAS System stopped processing this step because of insufficient memory.
My PC info:
CPU :
Intel(R) Core(TM) i3-2100 CPU @ 3.10GHz
Available Memory : 3242 MB
----------------------------------------------------
Drive List :
Drive C:
Type: Fixed
Total available space: 79 GB
Total size of drive: 232 GB
Thanks
Sergio, those specifications are of a low-end computer, not very advanced these days. The clock speed is all right, but clock speeds have not really increased since 2005 or so.
I will not tease you with server specs. The 3 GB is so little that you must be very careful with tools that need a lot of memory. You have experienced this now with the hashing.
With the available disk space you should be able to process, and merge, files of a size that fits with a 3-fold margin.
I suspect that at least part of your performance problem is that you're using ODBC connectivity. My experience is that it is s-l-o-w for anything more than a couple hundred thousand rows.
connect to odbc as oracle
Ask around to be sure you don't already have SAS/ACCESS to Oracle. With it, you can control which of the datasets is moved to the other environment, depending on which is the smaller one.
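For example (a hypothetical sketch, assuming SAS/ACCESS to Oracle is licensed and using placeholder credentials):

/* push the smaller SAS table into Oracle so the join can run there */
libname ora oracle user=xxxxx password=xxxxx path=mydb;

data ora.small_tmp;   /* materializes as an Oracle table */
  set work.small;
run;

The join against the big Oracle table can then run entirely on the database side, and only the result travels back to SAS.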
It could be that excessive time is spent creating the SAS data set A. Check the log, and if you think so, add a KEEP= or DROP= data set option alongside the WHERE= when you create data set A, to ensure that only the fields you need are loaded into A. Loading unneeded data is a common mistake that can have a huge performance impact.
data a;
  set b (keep=var1 var2 var3   /* only the fields you need */
         where=(var1 <= date and var2 > date and var3 > date));
run;
I've written a couple of papers on using SAS and Oracle. The first one deals with SAS options that either improve or degrade performance, and I think you'll find a few things in it that will be useful to you, like how to make the read buffer larger. Each section has a quick tabular summary so that you can find the option(s) that look appropriate for your situation.
http://support.sas.com/resources/papers/proceedings13/072-2013.pdf
Another paper is not so appropriate to this problem but explains how to leverage database and system return codes. If you eventually need to automate your processes then these are critical for trapping and handling errors. http://analytics.ncsu.edu/sesug/2012/BB-03.pdf
Good luck!
John
You've already got a lot of valuable advice, and I'm not adding anything new, but I will just highlight what works best for me.
I have to work from time to time with Oracle tables and volumes like you describe. With such volumes, what you want to do is reduce the volume on the database side before you transfer the data to SAS; as a consequence, you normally want to push processing to the database.
What normally works best for me is using explicit pass-through SQL, and developing and testing the pass-through bit "outside" of SAS. I normally use Oracle SQL Developer for this (you don't even need to install it; you can run it directly from a flash drive).
Using Oracle SQL Developer allows you to tweak your code faster: it shows you the cost of queries and also graphically shows you the execution path (using "Explain Plan").
Once you've got your query working, use it within the pass-through section of SAS PROC SQL.
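Applied to Sergio's setup, it would look something like this (a sketch reusing his ODBC-style connection; schema, table, and column names are placeholders):

proc sql;
  connect to odbc as aaaaaa (dsn=xxxx user=xxxxx pw=xxxxxx);
  create table work.common as select * from connection to aaaaaa
    ( /* everything inside these parentheses runs on the Oracle side */
      select g.keyvar, g.var1, h.var2
        from schemaname.table1 g
        inner join schemaname.table2 h
          on g.keyvar = h.keyvar
    );
  disconnect from aaaaaa;
quit;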