jakarman
Barite | Level 11

Sergio,
Jwillis has used an installation with an Oracle client (schema=). That one offers more advanced options. The Oracle client is a free download from Oracle, but it requires a SAS license for SAS/ACCESS to Oracle. You are obviously using the ODBC method with the Windows client (DSN=).

Within a DBMS (explicit pass-through) there is a common separation by schemas. These look like the libnames in SAS, but you cannot intermix libnames (SAS environment) and schemas (DBMS environment).

The important difference, imo, is the "as select G.*" between the create and the from connection.
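
To make the separation concrete, here is a minimal sketch (the DSN, credentials, and table names are placeholders): the libname and the created table live in the SAS environment, while everything inside the pass-through parentheses is shipped to the DBMS and must use schema-qualified names.

proc sql;
  connect to odbc as db (dsn=xxxx user=xxxxx pw=xxxxxx);
  create table work.out as                  /* SAS side: libname.table */
    select * from connection to db
    (select * from schemaname.tablename);   /* DBMS side: schema.table */
  disconnect from db;
quit;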

---->-- ja karman --<-----
SergioSanchez
Calcite | Level 5

Jaap Karman wrote:

Sergio,
Jwillis has used an installation with an Oracle client (schema=). That one offers more advanced options. The Oracle client is a free download from Oracle, but it requires a SAS license for SAS/ACCESS to Oracle. You are obviously using the ODBC method with the Windows client (DSN=).

Within a DBMS (explicit pass-through) there is a common separation by schemas. These look like the libnames in SAS, but you cannot intermix libnames (SAS environment) and schemas (DBMS environment).

The important difference, imo, is the "as select G.*" between the create and the from connection.

Sorry Jaap, but I don't understand. I run the following code and it works perfectly:

proc sql;
  connect to odbc as aaaaaa (dsn=xxxx user=xxxxx pw=xxxxxx);
  create table test as
    select * from connection to aaaaaa
    (select * from schemaname.tablename);
quit;

After the code runs I can see the dataset in my library. What is the difference between this code and the join I try to execute?

Regards

jakarman
Barite | Level 11

proc sql;
  connect to odbc as aaaaaa (dsn=xxxx user=xxxxx pw=xxxxxx);
  create table test as
    select * from connection to aaaaaa
    (select * from schemaname.tablename);
quit;

---->-- ja karman --<-----
SergioSanchez
Calcite | Level 5

Well, another option: I am downloading the table ordered by the key variable using pass-through. After this I'll download the second table, ordered too.

This way I suppose I don't need the PROC SORT and I can perform a merge to find which rows are in common.

What do you think about this?

Thanks

Kurt_Bremser
Super User

If you can get the sort done "on the fly", that is probably the simplest solution. Then the merge just reads sequentially through the tables and writes the target, so it all depends on the sequential I/O throughput of your SAS server.
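
A minimal sketch of that pipeline (DSN, credentials, schema, and the key variable keyvar are placeholders): Oracle does the sorting through ORDER BY, and the DATA step merge then makes a single sequential pass, with IN= flags marking the rows common to both tables.

proc sql;
  connect to odbc as ora (dsn=xxxx user=xxxxx pw=xxxxxx);
  /* let Oracle sort each table by the join key before the download */
  create table work.t1 as
    select * from connection to ora
    (select * from schemaname.table1 order by keyvar);
  create table work.t2 as
    select * from connection to ora
    (select * from schemaname.table2 order by keyvar);
  disconnect from ora;
quit;

/* sequential merge; keep only the rows present in both tables */
data work.common;
  merge work.t1 (in=a) work.t2 (in=b);
  by keyvar;
  if a and b;
run;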

jakarman
Barite | Level 11

If you have the space available on your local machine and it is fast enough, you are on Kurt's track. Nothing to bother about with unexpected data transports.

It should work fine when those conditions (local performance) are met.

---->-- ja karman --<-----
AhmedAl_Attar
Rhodochrosite | Level 12

Hi Sergio,

This is typically what I do when it comes to debugging and resolving performance issues with queries that involve RDBMS source tables in SAS.

1. Always test the SQL statement(s) within a native RDBMS query tool. This allows you to validate and profile the query, which in turn gives you better insight into the execution path and the true cost of the query (performance impact).

In your case, using Oracle, you are spoiled for choice:

   1.1 Oracle SQL Developer (Free)

   1.2 Toad for Oracle (Commercial)

   1.3 .... (Other tools that I'm not familiar with)

2. Use SQL pass-through. This allows you to use RDBMS-specific syntax that may not be surfaced by the SAS LIBNAME statement options.

3. Change the default BUFFSIZE= option value.

          "BUFFSIZE=

specifies the number of rows to retrieve from an Oracle table or view with each fetch. Using this argument can improve the performance of any query to Oracle.

By setting the value of the BUFFSIZE= argument in your SAS programs, you can find the optimal number of rows for a given query on a given table. The default buffer size is 250 rows per fetch. The value of BUFFSIZE= can be up to 2,147,483,647 rows per fetch, although a practical limit for most applications is less, depending on the available memory."

4. Use the SASTRACE= system option to determine whether an operation is processed by SAS or is passed to the DBMS for processing (a combined sketch of points 3 and 4 follows this list).
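
A quick sketch combining points 3 and 4 (the options are standard, but the connection values are placeholders; note that BUFFSIZE= belongs to the SAS/ACCESS to Oracle engine, while the ODBC engine uses READBUF= for the same purpose):

/* log the SQL that SAS sends to the DBMS */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* Oracle libname with a larger fetch buffer (the default is 250 rows) */
libname ora oracle user=xxxxx password=xxxxx path=xxxx
        schema=schemaname buffsize=5000;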

I hope this helps,

Ahmed

rachel
Calcite | Level 5

Sergio,

You can try a hash join; it runs a lot quicker. If you want, I can post an example.

-raisa


SergioSanchez
Calcite | Level 5

Thanks all for your tips, I'll keep them in mind.

rachel wrote:

Sergio,

You can try a hash join; it runs a lot quicker. If you want, I can post an example.

-raisa

Sure  raisa, please go on

Regards

rachel
Calcite | Level 5

Sergio,

I created two fake data sets: one with 50 million rows, the other with 30 million rows.

Then I merge them.

Remember: in a hash join with large data, place the smaller data set in the hash object.

Here is the code:
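
(A minimal sketch of the hash join rachel describes, assuming a shared key variable keyvar; all data set and variable names are illustrative.)

data work.common;
  /* pull variable definitions from the smaller table without reading any rows */
  if 0 then set work.small;
  if _n_ = 1 then do;
    /* load the smaller table into an in-memory hash keyed on the join variable */
    declare hash h(dataset: 'work.small');
    h.defineKey('keyvar');
    h.defineData(all: 'yes');
    h.defineDone();
  end;
  /* read the larger table sequentially; keep rows whose key is found in the hash */
  set work.large;
  if h.find() = 0;
run;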

SergioSanchez
Calcite | Level 5

rachel wrote:

Sergio,

I created two fake data sets: one with 50 million rows, the other with 30 million rows.

Then I merge them.

Remember: in a hash join with large data, place the smaller data set in the hash object.

Here is the code:

Thanks a lot rachel

SergioSanchez
Calcite | Level 5

Hi all

I get an error when running the hash code

 

ERROR: Hash object added 25165808 items when memory failure occurred.

FATAL: Insufficient memory to execute DATA step program. Aborted during the EXECUTION phase.

ERROR: The SAS System stopped processing this step because of insufficient memory.

My PC info:

CPU :

   Intel(R) Core(TM) i3-2100 CPU @ 3.10GHz

Available Memory : 3242 MB

----------------------------------------------------

Drive List :

Drive C:

Type: Fixed

Total available space:         79 GB

Total size of drive:            232 GB

Thanks

jakarman
Barite | Level 11

Sergio, those specifications are of a low-end computer, not very advanced these days. The clock speed is all right, but that figure has not really increased since 2005 or so.
I will not tease you with server specs. 3 GB is so little that you must be very careful with tools needing memory; you have experienced this now with the hashing.

With the available disk space you should be able to process - merge - files of a size that fits with a 3-fold margin.

---->-- ja karman --<-----
bentleyj1
Quartz | Level 8

I suspect that at least part of your performance problem is that you're using ODBC connectivity. My experience is that it is s-l-o-w for anything more than a couple hundred thousand rows.

      connect to odbc as oracle

Ask around and make sure you don't already have SAS/ACCESS to Oracle. That way you can control which of the datasets is moved to the other environment, depending on which is the smaller dataset.
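
If it turns out you do have SAS/ACCESS to Oracle, the explicit connection would look something like this sketch (credentials and path are placeholders):

proc sql;
  connect to oracle as ora (user=xxxxx password=xxxxx path=xxxx);
  create table work.a as
    select * from connection to ora
    (select * from schemaname.tablename);
  disconnect from ora;
quit;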

It could be that excessive time is spent creating the SAS data set A. Check the log, and if you think so, when you create data set A add a KEEP= or DROP= option next to the WHERE= on the SET statement to ensure that only the fields you need are loaded into A... loading unneeded data is a common mistake that can have a huge performance impact.

data a;
  /* keep= limits the columns read (list only the fields you need); */
  /* where= filters the rows as they are read                       */
  set b (keep=var1 var2 var3 date
         where=(var1 <= date and var2 > date and var3 > date));
run;

I've written a couple of papers on using SAS and Oracle. The first one deals with SAS options that either improve or degrade performance, and I think you'll find a couple of things there that will be useful to you, like how to make the read buffer larger. Each section has a quick tabular summary so that you can find the option(s) that look appropriate for your situation.

http://support.sas.com/resources/papers/proceedings13/072-2013.pdf

Another paper is not so appropriate to this problem but explains how to leverage database and system return codes.  If you eventually need to automate your processes then these are critical for trapping and handling errors.  http://analytics.ncsu.edu/sesug/2012/BB-03.pdf

Good luck!

John

Patrick
Opal | Level 21

You've already got a lot of valuable advice and I'm not adding anything new, but I will just highlight what works best for me.

I have to work from time to time with Oracle tables and volumes like you describe. With such volumes, what you want to do is reduce the volume on the database side before you transfer it to SAS - and as a consequence you normally want to push processing to the database.

What normally works best for me is using explicit pass-through SQL - and developing and testing the pass-through bit "outside" of SAS. I normally use Oracle SQL Developer for this (you don't even need to install it; you can run it directly from a flash drive).

Using Oracle SQL Developer allows you to tweak your code faster - it shows you the cost of queries and graphically shows you the execution path (using "explain").

Once you've got your query working, use it within the pass-through section of SAS PROC SQL.
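
A minimal sketch of that workflow, with the join pushed down to Oracle (DSN, credentials, schema, table, and key names are placeholders):

proc sql;
  connect to odbc as ora (dsn=xxxx user=xxxxx pw=xxxxxx);
  create table work.common as
    select * from connection to ora
    ( /* this inner query is the part you develop and tune in Oracle SQL Developer */
      select g.*
      from schemaname.table1 g
      inner join schemaname.table2 t
        on g.keyvar = t.keyvar
    );
  disconnect from ora;
quit;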
