Allaluiah
Quartz | Level 8

How do I join two very large datasets when both have duplicate ID keys?

 

For example, I have the following datasets:

 

table1:

acct_no    a     b     c     d
a1         4     7     2     8
a3         5     32    8     12
a5         42    12    54    65
a1         5     2     17    23

 

acct_no is the key with duplicates because it is a transaction dataset.

 

table2:

acct_no    card_no
a4         54
a3         31
a4         43
a1         24
a8         12
a7         23
a8         45

 

acct_no in table2 also has duplicates, because a single customer can have more than one card: a primary card and a secondary card.

 

Can anybody help me with the logic for the join?

 

The datasets contain 400 million records and are both wide and long. However, I only need to fetch card_no from table2, so table1 would be my left table. Sorting will not help either.
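
To make the result I am after concrete, here is a minimal sketch of the join logic in plain Python, using only the sample rows from the two tables above. It is not SAS code and not a recommendation for 400 million records; it only illustrates the intended semantics. The function name left_join_cards is hypothetical, and I am assuming that a table1 row should be repeated once for every card_no found for its acct_no, and kept with a missing card_no when there is no match. The lookup is built once from table2, so neither table has to be sorted.

```python
from collections import defaultdict

# Sample rows copied from table1 above: (acct_no, a, b, c, d)
table1 = [
    ("a1", 4, 7, 2, 8),
    ("a3", 5, 32, 8, 12),
    ("a5", 42, 12, 54, 65),
    ("a1", 5, 2, 17, 23),
]

# Sample rows copied from table2 above: (acct_no, card_no)
table2 = [
    ("a4", 54), ("a3", 31), ("a4", 43), ("a1", 24),
    ("a8", 12), ("a7", 23), ("a8", 45),
]


def left_join_cards(left_rows, right_rows):
    """Left join on acct_no, keeping only card_no from the right side.

    Assumption: duplicates on both sides produce one output row per
    (left row, matching card_no) pair.
    """
    # Build a lookup: acct_no -> list of every card_no for that account.
    cards = defaultdict(list)
    for acct_no, card_no in right_rows:
        cards[acct_no].append(card_no)

    # Stream the left table in its original order; no sorting is needed.
    for row in left_rows:
        matches = cards.get(row[0])
        if matches:
            for card_no in matches:
                yield row + (card_no,)
        else:
            # Left join: keep unmatched left rows with a missing card_no.
            yield row + (None,)


result = list(left_join_cards(table1, table2))
for row in result:
    print(row)
```

With the sample data, both a1 transactions pick up the single a1 card, a3 picks up its card, and a5 is kept with no card_no. An account with two cards in table2 would produce two output rows per transaction under the assumption stated above.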

 

LinusH
Tourmaline | Level 20

Contents of this post seem like a duplicate.

But the title is totally different from the question?

Data never sleeps
