I am currently trying to merge 2 large data sets using SQL. I haven't had to merge data of this size before, and I need to find an efficient way to merge the data.
The first data set, home._2004, contains 45 variables and 67 million observations.
The second data set, home.c2final, contains 73 variables and 24 million observations.
Here are some example observations and variables to show that the data sets have some, but not all variables in common.
Sample Data
home._2004
Year State MSA
2004 NE 00561
2004 AL 15646
2004 AL 45646
home.c2final
Year State Loan_Amount
1999 NE 55000
2004 KS 165000
2005 AK 200000
I have been using:
DATA home._2004;
SET home._2004data (obs=10);
RUN;
DATA home.c2final;
SET home.c2 (obs=50);
RUN;
PROC SQL;
CREATE TABLE home.merge as
SELECT *
FROM home._2004
OUTER UNION ALL
SELECT *
FROM home.c2final;
QUIT;
This works when I am only using a few observations from each, but they are too large to do all together. When I used 10 observations from home._2004 and all observations from home.c2final it takes 10 minutes. I am trying to find a way to make this more efficient. My thought was to find a way to merge a number of observations from home._2004 with all of home.c2final, and have the program do this with the next group of observations until all of the data is merged.
For example if I used 50 observations from home._2004:
Observations 1-50 merge with c2final
Followed by
Observations 51-100 merge with c2final
etc.
I am new to SAS, and I am not sure if this is even a possibility.
Step 1: know your data (also Maxim 3).
Determine their relationship with regard to the join keys (one-to-one, one-to-many, many-to-many).
See if other variables appear in both datasets, and determine what to do with them.
From that information, we can help in finding the best approach.
OUTER UNION is for concatenating datasets, not merging And ALL is not allowed with OUTER UNION in SAS/SQL. So I suspect that it is not the operation you really want.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.