jakarman
Barite | Level 11

Yes Linus, you are correct: the biggest problem for Sergio is getting the support of the IT staff, as they are now blocking him.

Someone is needed to get this obstructive approach resolved.

---->-- ja karman --<-----
jakarman
Barite | Level 11

@rachel, your example may be excellent, but as Sergio's laptop is very limited on resources it will not help him.

It could be that this should be run on the SAS server side. In any case, he needs good technical support on that side.

When that is missing, we can give a lot of suggestions without any chance of them being useful.

---->-- ja karman --<-----
rachel
Calcite | Level 5

I'm reposting because I'm not sure you all saw my latest reply.

OK... but if Sergio changes his mind about the pass-through query, here is what I suggest, given my limited experience with it.


I have never merged over 10 million records in a pass-through; however, I have merged up to 15 tables from a relational DB.

1. For the first iteration I suggest avoiding macros; they can cause errors in a pass-through. At the very least, test the macro on a small data set to make sure it works.

2. If your date variable is of a DATETIME type, then in your pass-through query specify the date range like this. Just substitute your own variable names; for the example I used start and end dates:

WHERE t1.start_date >= TO_DATE('01-JAN-2013 00:00:00','DD-MON-YYYY HH24:MI:SS') AND

    t1.end_date < TO_DATE('31-MAY-2014 00:00:00','DD-MON-YYYY HH24:MI:SS')

The pass-through query will run considerably faster than pulling the data into SAS and manipulating it there. So if your 50-million-row join ran for five hours in SAS, the pass-through may take half the time... which is still long. You can also run a scheduled query at night; that way you'll have the data in the morning.

You can at least try running the pass-through at night and look at the log in the morning to see how long it ran.

But make sure you test your code on a limited number of observations before setting it to run on the full data.
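To make that limited-observations test concrete, here is a minimal sketch of an Oracle pass-through query that caps the row count with ROWNUM while testing. The DSN, credentials, schema, table, and variable names below are placeholders, not Sergio's actual ones:

```
proc sql;
  /* connection values are placeholders - use your own DSN/credentials */
  connect to odbc as ora (user=myuser password=mypass dsn='mydsn');
  create table work.test_extract as
  select * from connection to ora
  ( select t1.var1, t1.var2, t1.start_date
    from myschema.mytable t1
    where t1.start_date >= TO_DATE('01-JAN-2013 00:00:00','DD-MON-YYYY HH24:MI:SS')
      and ROWNUM <= 10000      /* remove this line for the full run */
  );
  disconnect from ora;
quit;
```

Once the log looks clean on the 10,000-row sample, drop the ROWNUM condition and schedule the full extraction overnight.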

SergioSanchez
Calcite | Level 5

Hi all

Well, I have done some tests, and here are the results.

1. Copy a dataset from the Oracle DB to a SAS folder with a DATA step

278  data a;
279  set b (keep = var1....var11);
280  IF var1 <= 20140530  and var2> 20140530 and var3>=20140530; run;

NOTE: There were 56443158 observations read from the data set b.
NOTE: The data set a has 53779976 observations and 11 variables.
NOTE: DATA statement used (Total process time):
      real time           35:12.97
      user cpu time       7:00.68
      system cpu time     2:03.58
      Memory                            244k
      OS Memory                         9072k
      Timestamp            11/06/2014  9:43:12

2. Copy a dataset from the Oracle DB to a SAS folder with the pass-through method

292  proc sql;
293  connect to odbc as Oracle (USER=zzzzzzz PW=XXXXXXXX DSN='wwwww');
294  create table a as select * from connection to Oracle
295  (
296  select var1,.....,var11
298  from b
299  where var1 <= 20140530  and var2> 20140530 and var3>=20140530);
300  quit;

NOTE: Table a created, with 53779976 rows and 11 columns.


NOTE: PROCEDURE SQL used (Total process time):
      real time           36:14.66
      user cpu time       7:24.60
      system cpu time     2:05.03
      Memory                            238k
      OS Memory                         10864k

3. Copy a dataset from the Oracle DB to a SAS folder with PROC SQL

312  proc sql;
313  create table a as
314  select var1,...var11
316  from b
317  where var1 <= 20140530  and var2> 20140530 and var3>=20140530;
318  quit;

NOTE: Table a created, with 53779976 rows and 11 columns.

NOTE: PROCEDURE SQL used (Total process time):
      real time           35:22.28
      user cpu time       6:49.44
      system cpu time     1:57.15
      Memory                            214k
      OS Memory                         10352k

As you can see, the differences in time are minimal.

On the other hand, I have also tested sorting with PROC SORT and with a hash object.

1. PROC SORT

60 proc sort data = a out= b;
61 by var1;
62 run;

NOTE: There were 7496967 observations read from the data set a.
NOTE: The data set b has 7496967 observations and 44 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           4:27.56
      cpu time            30.10 seconds

2. Hash object

73 data _null_;
74 if _n_=1 then do;
75 declare hash test (dataset:"sastemp.ec_dim_sub_con");
76 test.definekey ('a');
77 test.definedata ('a');
78 test.definedone();
79 call missing (a);
80 end;
81 rc = test.output (dataset:'sastemp.dummy');
82 run;

NOTE: There were 7496967 observations read from the data set a.
NOTE: The data set b has 4676257 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           1:41.72
      cpu time            10.56 seconds

It looks very promising, but when I put in all the variables that I need to perform the join with the hash....

ERROR: Hash object added 2359280 items when memory failure occurred.
FATAL: Insufficient memory to execute DATA step program. Aborted during the EXECUTION phase.
ERROR: The SAS System stopped processing this step because of insufficient memory.

And that is with half the variables I ran through PROC SORT.

Finally, I still have to test rachel's code, but today the FTP server isn't working very well and I haven't been able to download the files I need. Also, I don't think I understand the code very well. I believe it is a join which is split in two and afterwards unioned into one dataset, but I'm not very sure.

By the way, what is the purpose of the CALL MISSING in the hash sort? It's not clear to me.

It is frustrating to have so much good advice and not be able to put it into practice, because my PC is a low-end computer and I can't get access to create tables on the Oracle server either.

I don't see a good alternative for reducing the time the join consumes, short of taking my home PC to work :) :)

best regards

AhmedAl_Attar
Ammonite | Level 13

Sergio,

First, I would recommend you check the amount of memory allocated to the SAS session running on your PC.

- Open a SAS session and submit the following statement:

     proc options group=memory; run;

Second, once you have checked the output of the above in the log, you can increase the allocated memory by specifying the following options on the SAS command (probably within the Windows shortcut):

-memsize xxM -sortsize xxM

Note: -sortsize should not exceed 75% of the -memsize amount.

Third, as I mentioned in an earlier reply (June 5), change the value of:

  - the BUFFSIZE= option within your SQL pass-through connection clause;

  - the READBUFF= option within your Oracle LIBNAME statement.

Fourth, re-run your test and compare results.
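As a concrete sketch of these steps (the sizes, credentials, and path below are placeholder values for illustration, not tuned recommendations for Sergio's machine):

```
/* Start SAS with a larger memory ceiling, e.g. in the Windows shortcut target:
   sas.exe -memsize 2048M -sortsize 1024M                                     */

/* Check what the session actually received */
proc options group=memory; run;

/* Fetch more rows per network round trip from Oracle
   (requires SAS/ACCESS to Oracle; user/password/path are placeholders) */
libname ora oracle user=myuser password=mypass path='mydb' readbuff=5000;
```

Compare the log timings before and after changing these values to see whether the bottleneck is memory or the database transfer.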

Hope this helps,

Ahmed

rachel
Calcite | Level 5

Sergio,

Delete all data from your permanent work directory. Then download the datasets you want to merge, using a SAS-to-Oracle pass-through query, into the permanent work directory you just cleaned.

In your hash code, omit the CALL MISSING statement, and remember to load the smaller data set into the hash and read the larger one with the SET statement. Try the code I provided for you below.

data merged;  /* name the output data set; _null_ would discard the matches */

  /* bring the hash data set's variables into the PDV without reading rows */
  if 0 then set sastemp.ec_dim_sub_con;

  if _n_=1 then do;
    declare hash test (dataset:"sastemp.ec_dim_sub_con");
    test.definekey ('a');
    test.definedata (all:'y');
    test.definedone ();
  end;

  set sastemp.dummy;

  /* FIND returns 0 when the current key value exists in the hash */
  rc = test.find ();
  if rc=0 then output;

  drop rc;
run;

SergioSanchez
Calcite | Level 5

Hi rachel

Thank you so much for your help.

Regards
