Outer Union Merge of 2 Large Data Sets Using SQL (SAS 9.4)

Reply
Occasional Contributor
Posts: 6

Outer Union Merge of 2 Large Data Sets Using SQL (SAS 9.4)

I am currently trying to merge 2 large data sets using SQL. I haven't had to merge data of this size before, and I need to find an efficient way to merge the data.

 

The first data set, home._2004, contains 45 variables and 67 million observations.

The second data set, home.c2final, contains 73 variables and 24 million observations.

 

Here are some example observations and variables to show that the data sets have some, but not all variables in common.

 

Sample Data 

home._2004

Year      State          MSA    

2004      NE             00561

2004      AL              15646

2004      AL              45646

 

home.c2final

Year      State           Loan_Amount

1999     NE                55000

2004     KS               165000

2005     AK                200000

 

I have been using:

DATA home._2004;
SET home._2004data (obs=10);
RUN;

DATA home.c2final;
SET home.c2 (obs=50);
RUN;

PROC SQL;
      CREATE TABLE home.merge as
      SELECT *
           FROM home._2004
       OUTER UNION ALL
       SELECT *
           FROM home.c2final;
QUIT;

This works when I am only using a few observations from each, but they are too large to do all together. When I used 10 observations from home._2004 and all observations from home.c2final it takes 10 minutes. I am trying to find a way to make this more efficient. My thought was to find a way to merge a number of observations from home._2004 with all of home.c2final, and have the program do this with the next group of observations until all of the data is merged. 

 

For example if I used 50 observations from home._2004:

Observations 1-50 merge with c2final

Followed by

Observations 51-100 merge with c2final

etc.

 

I am new to SAS, and I am not sure if this is even a possibility. 

Super User
Posts: 8,590

Re: Outer Union Merge of 2 Large Data Sets Using SQL (SAS 9.4)

Posted in reply to Bennettr99

Step 1: know your data (also Maxim 3).

Determine their relationship with regard to the join keys (one-to-one, one-to-many, many-to-many).

See if other variables appear in both datasets, and determine what to do with them.

 

From that information, we can help in finding the best approach.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Esteemed Advisor
Posts: 5,131

Re: Outer Union Merge of 2 Large Data Sets Using SQL (SAS 9.4)

Posted in reply to Bennettr99

OUTER UNION is for concatenating datasets, not merging And ALL is not allowed with OUTER UNION in SAS/SQL. So I suspect that it is not the operation you really want.

PG
Ask a Question
Discussion stats
  • 2 replies
  • 148 views
  • 0 likes
  • 3 in conversation