06-30-2013 12:03 AM
I'm running the following code on two datasets that are very long but not wide: both contain over 1 million observations. The SAS step takes forever and reports no errors. What is going wrong? The series variables are numeric values approaching 1 billion, and they are the key for merging the datasets. Changing the merge to an inner join does not improve things either. How should I deal with this problem?
proc sql;
  create table WANT as
  select a.*, b.*
  from HAVE1 as a left join HAVE2 as b
    on b.series_beg <= a.series <= b.series_end;
quit;
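A non-equi join like this cannot use an index or hash, so PROC SQL ends up comparing every row of HAVE1 against every range in HAVE2, which is why the run time explodes. One common workaround is to turn the ranges into a numeric format and do the lookup with PUT(). A minimal sketch, assuming the ranges in HAVE2 do not overlap; the names rangef and have2_row are made up for illustration:

proc sql noprint; select count(*) into :n2 from HAVE2; quit;

data ctrl;
  set HAVE2 end=last;
  retain fmtname 'rangef' type 'n';
  start = series_beg;          /* lower bound of the range           */
  end   = series_end;          /* upper bound of the range           */
  label = put(_n_, best12.);   /* remember which HAVE2 row matched   */
  output;
  if last then do;             /* catch-all for values in no range   */
    hlo = 'O';
    label = '.';
    output;
  end;
run;

proc format cntlin=ctrl;
run;

data WANT;
  set HAVE1;
  /* a single formatted lookup per row instead of a join */
  have2_row = input(put(series, rangef.), best12.);
run;

Each HAVE1 row now costs one binary-search format lookup instead of a scan over all of HAVE2, so the step scales roughly linearly with the size of HAVE1.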
06-30-2013 05:36 AM
1. I would first run on a small number of observations using the inobs= or outobs= option to check that your logic is correct. Do you really need all the variables? Try limiting the number of variables as well.
2. Check with proc setinit which SAS engines you have licensed. Using a SAS/ACCESS engine enables in-database processing, which will improve performance if your data live in a database.
3. Try the BUFSIZE and BUFNO options, as these can help.
4. Hash tables are good as long as you have enough resources (i.e., memory).
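Point 1 above can be sketched as follows, assuming the original PROC SQL step; WANT_TEST and the 1000-row cap are arbitrary choices for the test run:

/* read at most 1000 rows from each input, just to validate the logic */
proc sql inobs=1000;
  create table WANT_TEST as
  select a.series, b.series_beg, b.series_end
  from HAVE1 as a left join HAVE2 as b
    on b.series_beg <= a.series <= b.series_end;
quit;

If WANT_TEST looks right, the logic is fine and the problem is purely one of scale.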
06-30-2013 04:45 PM
In addition to these tips, you might want to consider these options:
07-01-2013 01:14 AM
Thank you, guys. I think it is simply that the join product is too big: when I subset my data to 1,000 observations, it completed within 5 minutes. I'm using my school's server, so I don't think I can set system options such as BUFSIZE and BUFNO, can I?
The server has plenty of RAM. How can I make use of it on a Linux server? Again, MEMSIZE and SORTSIZE both seem to be system options that I cannot change as a user, but is there a way to set them in my code?
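For what it's worth, some of these can normally be changed within a session via an OPTIONS statement even without admin rights; MEMSIZE, by contrast, is fixed at SAS start-up and would need the administrator (or a -memsize invocation option). A sketch; the values are arbitrary examples, not recommendations:

/* SORTSIZE and BUFNO can usually be set per-session;
   FULLSTIMER writes detailed resource usage to the log */
options sortsize=2G bufno=10 fullstimer;

/* verify what is actually in effect on the server */
proc options option=(memsize sortsize bufno);
run;

Checking the log output of PROC OPTIONS tells you whether the site configuration actually accepted your settings.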