SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Hash Left Join

Reply
Occasional Contributor
Posts: 16

Hash Left Join

Hi,

 

I need to do Hash joins to get the following result by joining B to A

 

Dataset: A   
IDNameSex 
1JayM 
2JeeM 
3MinF 
    
Dataset: B   
IDSal  
155  
265  
375  
155  
265  
375  
155  
265  
375  
    
    
Result   
IDNameSexSal
1JayM55
2JeeM65
3MinF75
    
Super User
Posts: 4,022

Re: Hash Left Join

Why do you have to do a hash join? It is a lot easier using SQL or a DATA step merge.

Occasional Contributor
Posts: 16

Re: Hash Left Join

I have to combine 7M records and it cannot be done using SQl and Data step due to processing constraints.
Respected Advisor
Posts: 4,794

Re: Hash Left Join

@jpm2478

It shouldn't be hard to propose a solution once we understand what you really have and need. Because your narrative (left join) doesn't match the desired result you've posted, it's a bit hard to propose anything.

Can you please answer @PGStats question and then "amend" your question so we can understand what needs to be done?

Super User
Posts: 4,022

Re: Hash Left Join

I don't believe that an SQL join or a DATA step merge is too "process constrained" to handle 7M records. I've got DATA step merges that handle close to 200M records in a reasonable time frame. My approach is to use the simplest method that still processes efficiently. Hash joins have their place and I use them myself, but only for special cases where they offer a major advantage.

 

If you have tried other techniques please post the results including run times so we help you choose the best approach.

Trusted Advisor
Posts: 1,391

Re: Hash Left Join

You posted result is not what a left join would produce.  It would match every A record with each B record with the same ID, produce a new record for each match.  Result would be 9 records.

Esteemed Advisor
Posts: 5,624

Re: Hash Left Join


As @mkeintz said, the example you gave is not a left join. Given the following data, what would be the result? Knowing what your operational constraints are would also help us to propose an appropriate solution.

 

Dataset: A      
ID Name Sex  
1 Jay M  
2 Jee M  
3 Min F

 

4 Xxx M  
Dataset: B      
ID Sal    
1 55    
2 65    
3 75    
1 155    
2 165    
3 175    
1 255    
2 365    
3 375    
5 500    
       
Result ?      
ID Name Sex  
       
       
       
       

 

PG
Ask a Question
Discussion stats
  • 6 replies
  • 91 views
  • 0 likes
  • 5 in conversation