BookmarkSubscribeRSS Feed
jpm2478
Calcite | Level 5

Hi,

 

I need to do Hash joins to get the following result by joining B to A

 

Dataset: A   
IDNameSex 
1JayM 
2JeeM 
3MinF 
    
Dataset: B   
IDSal  
155  
265  
375  
155  
265  
375  
155  
265  
375  
    
    
Result   
IDNameSexSal
1JayM55
2JeeM65
3MinF75
    
6 REPLIES 6
SASKiwi
PROC Star

Why do you have to do a hash join? It is a lot easier using SQL or a DATA step merge.

jpm2478
Calcite | Level 5
I have to combine 7M records and it cannot be done using SQl and Data step due to processing constraints.
Patrick
Opal | Level 21

@jpm2478

It shouldn't be hard to propose a solution once we understand what you really have and need. Because your narrative (left join) doesn't match the desired result you've posted, it's a bit hard to propose anything.

Can you please answer @PGStats question and then "amend" your question so we can understand what needs to be done?

SASKiwi
PROC Star

I don't believe that an SQL join or a DATA step merge is too "process constrained" to handle 7M records. I've got DATA step merges that handle close to 200M records in a reasonable time frame. My approach is to use the simplest method that still processes efficiently. Hash joins have their place and I use them myself, but only for special cases where they offer a major advantage.

 

If you have tried other techniques please post the results including run times so we help you choose the best approach.

mkeintz
PROC Star

You posted result is not what a left join would produce.  It would match every A record with each B record with the same ID, produce a new record for each match.  Result would be 9 records.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
PGStats
Opal | Level 21

As @mkeintz said, the example you gave is not a left join. Given the following data, what would be the result? Knowing what your operational constraints are would also help us to propose an appropriate solution.

 

Dataset: A      
ID Name Sex  
1 Jay M  
2 Jee M  
3 Min F

 

4 Xxx M  
Dataset: B      
ID Sal    
1 55    
2 65    
3 75    
1 155    
2 165    
3 175    
1 255    
2 365    
3 375    
5 500    
       
Result ?      
ID Name Sex  
       
       
       
       

 

PG

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 1579 views
  • 0 likes
  • 5 in conversation