Hi,
I have two tables, each with about 4-5 million rows of data.
Table A has about 50 columns and Table B about 2,500.
The key is a combination of 4 columns.
Joining them with SQL takes about half an hour.
Any suggestions on how to speed things up would be greatly appreciated.
Thanks,
Eli
Can you post your code please?
If you have a one-to-many or one-to-one relationship, consider sorting and a data step merge.
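For data of this size, a sort followed by a DATA step MERGE is often faster than PROC SQL. A minimal sketch, assuming hypothetical dataset names tableA/tableB and key columns k1–k4 (swap in your actual names):

```sas
/* Sort both tables by the 4-column composite key */
proc sort data=tableA; by k1 k2 k3 k4; run;
proc sort data=tableB; by k1 k2 k3 k4; run;

/* Merge; the IN= flags let you choose the join type */
data joined;
    merge tableA(in=inA) tableB(in=inB);
    by k1 k2 k3 k4;
    if inA and inB;   /* keep only matches, i.e. an inner join */
run;
```

Dropping the `if inA and inB;` line gives you the equivalent of a full outer join instead.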
2,500 columns makes me suspect a design issue; depending on your needs, a long (longitudinal) dataset structure might be a better fit.
Depending on your observation size, you might simply run into the limits of your storage infrastructure.
Please supply more details: your SQL step code and the observation counts of your datasets, especially the one with just 50 variables.
There are things you can do, but half an hour is not that long for a join like that.
If your data is in a DBMS, take a look at the READBUFF= libname or dataset option (assuming that you extract all or most of the rows). If the output also goes to a DBMS, the INSERTBUFF= option can help. If reading from and writing to native SAS datasets, the BUFNO= and BUFSIZE= options may be of value.
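As a sketch of those options (the Oracle engine, connection details, and buffer values here are placeholders for illustration, not recommendations for your environment):

```sas
/* DBMS source: fetch and insert rows in larger batches per network round trip */
libname dblib oracle path=mydb user=myuser password="&dbpw"
        readbuff=5000 insertbuff=5000;

/* Native SAS libraries: raise the I/O buffering instead */
options bufno=10 bufsize=64k;
```

The right buffer values depend on your row length and available memory, so benchmark a few settings.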
If your data is sorted by the keys, try using a data step instead of SQL.
If you are only accessing a small portion of the tables, indexes may help.
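If you filter on the key columns, a composite index lets SAS read only the matching rows instead of scanning the whole table. A sketch with hypothetical names (tableB and keys k1–k4):

```sas
/* Build a composite index over the 4-column key */
proc datasets library=work nolist;
    modify tableB;
    index create keyidx = (k1 k2 k3 k4);
quit;
```

Note that building and maintaining the index has its own cost, so this only pays off when your WHERE clause is selective.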
Hi @eliber
Try the technique described in the paper "Hash + Point = Key". It will give you options for handling your large tables.
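That paper combines a key hash with POINT= access; as a simpler starting point, a plain hash-object lookup that loads the narrower table into memory can already replace the SQL join without any sorting. A sketch with hypothetical names (tableA, tableB, keys k1–k4), assuming tableA fits in memory:

```sas
data joined;
    if _n_ = 1 then do;
        if 0 then set tableA;              /* add tableA's variables to the PDV */
        declare hash h(dataset: "tableA"); /* load the narrow table into memory */
        h.defineKey("k1", "k2", "k3", "k4");
        h.defineData(all: "yes");
        h.defineDone();
    end;
    set tableB;                            /* single sequential pass over the wide table */
    if h.find() = 0;                       /* keep only rows with a match in tableA */
run;
```

This assumes the key is unique in tableA; for one-to-many keys on that side you would need the hash object's multidata option instead.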
Hope this helps,
Ahmed