BookmarkSubscribeRSS Feed
Bennettr99
Fluorite | Level 6

I am currently trying to merge 2 large data sets using SQL. I haven't had to merge data of this size before, and I need to find an efficient way to merge the data.

 

The first data set, home._2004, contains 45 variables and 67 million observations.

The second data set, home.c2final, contains 73 variables and 24 million observations.

 

Here are some example observations and variables to show that the data sets have some, but not all variables in common.

 

Sample Data 

home._2004

Year      State          MSA    

2004      NE             00561

2004      AL              15646

2004      AL              45646

 

home.c2final

Year      State           Loan_Amount

1999     NE                55000

2004     KS               165000

2005     AK                200000

 

I have been using:

DATA home._2004;
SET home._2004data (obs=10);
RUN;

DATA home.c2final;
SET home.c2 (obs=50);
RUN;

PROC SQL;
      CREATE TABLE home.merge as
      SELECT *
           FROM home._2004
       OUTER UNION ALL
       SELECT *
           FROM home.c2final;
QUIT;

This works when I am only using a few observations from each, but they are too large to do all together. When I used 10 observations from home._2004 and all observations from home.c2final it takes 10 minutes. I am trying to find a way to make this more efficient. My thought was to find a way to merge a number of observations from home._2004 with all of home.c2final, and have the program do this with the next group of observations until all of the data is merged. 

 

For example if I used 50 observations from home._2004:

Observations 1-50 merge with c2final

Followed by

Observations 51-100 merge with c2final

etc.

 

I am new to SAS, and I am not sure if this is even a possibility. 

2 REPLIES 2
Kurt_Bremser
Super User

Step 1: know your data (also Maxim 3).

Determine their relationship with regard to the join keys (one-to-one, one-to-many, many-to-many).

See if other variables appear in both datasets, and determine what to do with them.

 

From that information, we can help in finding the best approach.

PGStats
Opal | Level 21

OUTER UNION is for concatenating datasets, not merging And ALL is not allowed with OUTER UNION in SAS/SQL. So I suspect that it is not the operation you really want.

PG

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 725 views
  • 0 likes
  • 3 in conversation