BookmarkSubscribeRSS Feed
babu-in
Calcite | Level 5

I have several large datasets(30GB), and try to merge only few variables from second table to first table based on key variable. But since they are too big, it will take a long time. which method is most efficient to do this.

4 REPLIES 4
Kurt_Bremser
Super User

Are both tables of this size, or do you have one large table and a considerably smaller one used for a lookup?

Do you have a many-to-one or a many-to-many relationship?

 

In any case, you need to strive to "travel light", meaning you only keep the variables you need when sorting in preparation for a merge.

babu-in
Calcite | Level 5

yes, tables are same size and keys are unique in the both tables.

Kurt_Bremser
Super User

Then take a closer look at your data. See if the datasets are compressed, and see if you need all columns in your output or if you can drop some.

If in doubt, run proc contents on your datasets and post the output.

 

Also, please post the log of your merge code, using the {i} button (available when posting Rich Text).

Patrick
Opal | Level 21

@babu-in 

On this level the best answer I can give is "it depends".

Assuming both data sets are SAS tables what normally takes the most time is sorting and writing data to disk. That's either a Proc Sort with a data step merge or then a SQL Join. But also the SQL join will sort and write data to disk as intermediary temporary files in UTILLOC.

 

"and try to merge only few variables from second table"

IF you can fit these few variables into memory then most likely a data step hash lookup would perform best as this doesn't require any sort operations and reduces write operations to disk.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 3023 views
  • 0 likes
  • 3 in conversation