01-27-2018 02:45 AM
I am currently trying to merge 2 large data sets using SQL. I haven't had to merge data of this size before, and I need to find an efficient way to merge the data.
The first data set, home._2004, contains 45 variables and 67 million observations.
The second data set, home.c2final, contains 73 variables and 24 million observations.
Here are some example observations and variables to show that the data sets have some, but not all variables in common.
Year State MSA
2004 NE 00561
2004 AL 15646
2004 AL 45646
Year State Loan_Amount
1999 NE 55000
2004 KS 165000
2005 AK 200000
I have been using:
DATA home._2004; SET home._2004data (obs=10); RUN; DATA home.c2final; SET home.c2 (obs=50); RUN; PROC SQL; CREATE TABLE home.merge as SELECT * FROM home._2004 OUTER UNION ALL SELECT * FROM home.c2final; QUIT;
This works when I am only using a few observations from each, but they are too large to do all together. When I used 10 observations from home._2004 and all observations from home.c2final it takes 10 minutes. I am trying to find a way to make this more efficient. My thought was to find a way to merge a number of observations from home._2004 with all of home.c2final, and have the program do this with the next group of observations until all of the data is merged.
For example if I used 50 observations from home._2004:
Observations 1-50 merge with c2final
Observations 51-100 merge with c2final
I am new to SAS, and I am not sure if this is even a possibility.
01-27-2018 03:08 AM
Step 1: know your data (also Maxim 3).
Determine their relationship with regard to the join keys (one-to-one, one-to-many, many-to-many).
See if other variables appear in both datasets, and determine what to do with them.
From that information, we can help in finding the best approach.
01-27-2018 10:49 PM
OUTER UNION is for concatenating datasets, not merging And ALL is not allowed with OUTER UNION in SAS/SQL. So I suspect that it is not the operation you really want.