Yes Linus your are correct the biggest problem for Sergio is getting the support of it staff as they are now blocking him.
Needing someone to get this obstructive approach being solved
@rachel, your example may be excellent. As Sergio laptop is very limited on resources it will not help him.
It could be this would be run at the sas server side. In all cases he is needing good technical support on that side.
When that is missing we can give a lot of suggestions without the chance being usefull.
Im reposting because im not sure you guys saw my latest reply.
Ok...but if Sergio changes his mind about the pass through query, here is what I suggest given my limited experience with it.
I never merged over 10 million records in a pass through, however I merged upto 15 tables from a relational db.
1. I suggest for the first iteration to avoid macros, they can cause errors in a pass through, at least test the macro on a small data set to make sure it works.
2. If your date variable is of a datetime type then in your pass through query specify the date like this, just give it your date var name, for the example i used star and end date.
WHERE "t1"start_date >= TO_DATE('01-JAN-2013 00:00:00','DD-Mon-YYYY HH24:MI:SS') AND
"t1".end_date < TO_DATE('31-MAY-2014 00:00:00','DD-Mon-YYYY HH24:MI:SS') AND
The pass through query will run considerably faster than if you pull the data into sas and try to manipulate it. So if your 50 mill join ran for five hours in sas, then the pass through may take half the time...which is still long. You can also run a scheduled query at night, this way you'll have the data in the morning.
You can at least try running the pass through at night, and look at the log in the morning to see how long it ran.
But make sure you test your code on a limited number of obs, before setting it to run on the full data.
Hi all
Well, I have done some test and here are the results
1.Copy a dataset from oracle db to sas folder with a data-set statement
278 data a;
279 set b (keep = var1....var11);
280 IF var1 <= 20140530 and var2> 20140530 and var3>=20140530; run;
NOTE: There were 56443158 observations read from the data set b.
NOTE: The data set a has 53779976 observations and 11 variables.
NOTE: DATA statement used (Total process time):
real time 35:12.97
user cpu time 7:00.68
system cpu time 2:03.58
Memory 244k
OS Memory 9072k
Timestamp 11/06/2014 9:43:12
1.Copy a dataset from oracle db to sas folder with a the pass-thru method
292 proc sql;
293 connect to odbc as Oracle (USER=zzzzzzz PW=XXXXXXXX DSN='wwwww');
294 create table a as select * from connection to Oracle
295 (
296 select var1,.....,var11
298 from b
299 where var1 <= 20140530 and var2> 20140530 and var3>=20140530;
300 quit;
NOTE: Table a created, with 53779976 rows and 11 columns.
NOTE: PROCEDURE SQL used (Total process time):
real time 36:14.66
user cpu time 7:24.60
system cpu time 2:05.03
Memory 238k
OS Memory 10864k
3 Copy a dataset from oracle db to sas folder with proc sql
312 proc sql;
313 create table a as
314 select var1,...var11
316 from b
317 where var1 <= 20140530 and var2> 20140530 and var3>=20140530;
318 quit;
NOTE: Table a created, with 53779976 rows and 11 columns.
NOTE: PROCEDURE SQL used (Total process time):
real time 35:22.28
user cpu time 6:49.44
system cpu time 1:57.15
Memory 214k
OS Memory 10352k
As you can see the differences in time are minimal.
On the other hand I have also tested the to sort with proc sort and with a hash object
1 proc sort
60 proc sort data = a out= b;
61 by var1;
62 run;
NOTE: There were 7496967 observations read from the data set a.
NOTE: The data set b has 7496967 observations and 44 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 4:27.56
cpu time 30.10 seconds
2. Hash object
73 data _null_;
74 if _n_=1 then do;
75 declare hash test (dataset:"sastemp.ec_dim_sub_con");
76 test.definekey ('a');
77 test.definedata ('a');
78 test.definedone();
79 call missing (a);
80 end;
81 rc = test.output (dataset:'sastemp.dummy');
82 run;
NOTE: There were 7496967 observations read from the data set a.
NOTE: The data set b has 4676257 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 1:41.72
cpu time 10.56 seconds
It looks very promising but when I put all the variables that
I need to perform the join with the hash....
ERROR: Hash object added 2359280 items when memory failure occurred.
FATAL: Insufficient memory to execute DATA step program. Aborted during the EXECUTION phase.
ERROR: The SAS System stopped processing this step because of insufficient memory.
And there are half the variables that I run with the proc sort.
Finally I have to test the rachel's codes but Today the ftp
server doens't work very well and I haven't download the files I need to and also because I think that I don't understand the code very well. I believe that is a join wich split in two and after union in one dataset, but I not very sure.
By the way, what is the purpose of the call missing in the sort hash?. It's not clear for me.
It is frustrating to have so much good advice and not be able to put into practice because my pc is a low end computer and I can't have access to create tables in the Oracle server too.
I don't see a good alternative to reduce the time the join consume without take my home pc to the work
best regards
Sergio,
First, I would recommend you to check out the amount of Memory allocated to the SAS session running on your PC.
- Open SAS Session, and submit the following statement
Proc oprions group=memory; run;
Second, Once you checked the output of the above in the log, you can increase the allocated Memory amount by specifying the following options in the SAS Command (Probably within the Windows Short-cut)
-memsize xxM -sortSize xxM
Note: -sortSize should not exceed 75% of the -memSize amount.
Third, As I mentioned this before in earlier reply (June 5), Change the value of the
- buffsize= option within your SQL Pass-thru connection clause.
- readbuff= option within your ORACLE Libname statement
Forth, Re-Run your test and compare results.
Hope this helps,
Ahmed
Sergio,
Delete all data from your permanent work directory. Then download the datasets you want to merge using a sas to oracle pass through query into your permanent work directory that you just cleaned.
In your hash code omit the call missing statement, remember to place the smaller data set on top, and the larger on the bottom. Try the code I provided for you below.
data _null_;
if _n_=1 then do;
declare hash test (dataset:"sastemp.ec_dim_sub_con");
test.definekey ('a');
test.definedata (all:'Y'');
test.definedone();
end;
set sastemp.dummy
rc = test.output (key:a);
if rc=0 then output;
drop rc;
run;
82 run;
Hi rachel
Thank you so much for your help.
Regards
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.