BookmarkSubscribeRSS Feed
SAS_inquisitive
Lapis Lazuli | Level 10

I want merge three tables. In order to avoid sort procedure,I am inclined to use hash method. I have sucessfully used it for joining two tables. I wonder if hash table can be used to join more than two tables.

12 REPLIES 12
LinusH
Tourmaline | Level 20
Yes, I'm almost 100% sure.
In DI Studio the Lookup transformation uses hash tables, and are able to do lookup on several tables simultaneously.
Data never sleeps
SAS_inquisitive
Lapis Lazuli | Level 10

@LinusH I don't  have access to DI studio.  Is it doable in Base SAS?

Haikuo
Onyx | Level 15

Yes.

LinusH
Tourmaline | Level 20
Absolutely. I was just bringing up the code generation from DI Studio as an example. Almost all code generated is in Base SAS code.
Data never sleeps
SAS_inquisitive
Lapis Lazuli | Level 10

@Haikuo and @LinusH  Can you give an example (or DI generated code) so that I can apply in my cases?  Thanks !

Patrick
Opal | Level 21

@SAS_inquisitive

If you know how to define and use one hash table then what holds you back to simply define and use a second hash table as well?

You can load as much data into as many hash tables as you wish. The only restriction is memory available.

SAS_inquisitive
Lapis Lazuli | Level 10

@Patrick  I want to make sure I am doing correctly. I looked some papers (Paul Dorfman's), most of them have only two tables.

Haikuo
Onyx | Level 15

@SAS_inquisitive: Have you tried yet? If you have and it did not work the way you want, show us what you have tried (code) and what didn't work (log). If you haven't tried, now it is the time for you to do so.

SAS_inquisitive
Lapis Lazuli | Level 10

@Haikuo I have not tried on more than two tables yet.  I will try to ceate some mock data sets (rather than real data sets) and will post the code. Here is one mock example.

This does not give desired result.

data a;
	input id x y;
	cards;
1 3 4
2 5 6
3 5 6
4 7 9
5 3 2
;

data b;
	input id x y;
	cards;
1 7 8
2 3 4
3 1 3
7 8 4
8 1 6
;

data c;
	input id x y;
	cards;
1 7 8
2 3 4
3 1 3
9 8 4
7 1 6
;

data test;
	if 0 then
		set a;
	if 0 then 
	    set b;

	if _n_=1 then
		do;
			dcl hash h1(dataset:'a');
			h1.definekey('id');
			h1.definedata('x','y');
				h1.definedone();

			dcl hash h2(dataset:'b');
			h2.definekey('id');
			h2.definedata('x','y');
				h2.definedone();
		end;

	set c;
	if h1.find() and h2.find() then output;
run;

 

Haikuo
Onyx | Level 15

Ok, that was nice try :). Couple of comments before SAS code:

It is hard for me to guess the purpose of your code without seeing your expected outcome. So here is my try:

1. From your code, you seem trying to do a SQL equivalent of 'inner join' , however, your code is doing it exclusively instead. the commented part of the code will do a inclusive inner join.

2. Since 3 of your incoming tables share exact variable names, so what you wound up doing is pdv is first overidden by h1 then by h2. What you want is probably to rename some of the variables to see some obvious outcome.

 

data a;
	input id x_A y_A;
	cards;
1 3 4
2 5 6
3 5 6
4 7 9
5 3 2
;

data b;
	input id x_B y_B;
	cards;
1 7 8
2 3 4
3 1 3
7 8 4
8 1 6
;

data c;
	input id x_C y_C;
	cards;
1 7 8
2 3 4
3 1 3
9 8 4
7 1 6
;

data test;
	if 0 then
		set a B;

	/*	if 0 then */
	/*	    set b;*/
	if _n_=1 then
		do;
			dcl hash h1(dataset:'a');
			h1.definekey('id');
	h1.definedata(ALL:
			'Y');
			h1.definedone();
			dcl hash h2(dataset:'b');
			h2.definekey('id');
	h2.definedata(ALL:
			'y');
			h2.definedone();
		end;

	set c;

	if h1.find() and h2.find() then
		output;

	/*		 if h1.find()=0 and h2.find()=0 then output;*/
run;

Going bed right now, if you have further questions, I may find some time tomorrow.

SAS_inquisitive
Lapis Lazuli | Level 10

@Haikuo Thank you.  Have a good night.

Patrick
Opal | Level 21

@Haikuo states that the expected result should be the equivalent to a SQL inner join but then changes the selection logic so that it's no more an inner join. Must have been late... 

 

I would code as below.

data test;
  set c;
  if _n_=1 then
    do;
      if 0 then set a;
      dcl hash h1(dataset:'a');
      h1.definekey('id');
      h1.definedata(ALL:'Y');
      h1.definedone();

      if 0 then set b;
      dcl hash h2(dataset:'b');
      h2.definekey('id');
      h2.definedata(ALL:'y');
      h2.definedone();
    end;
  if h1.find()=0 and h2.find()=0 then output;
run;

The reason for having "set c;" first is that in case there are same named variables in the source datasets the definition from base table "c" will get used for the output dataset.

 

In case the relationship between the tables can be 0:N and not only 0:1 then you would also have to use multidata:'y' for the hash definition and code for some looping over the hash as well in case there is a match where there are multiple entries for the same key in the hash.

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1294 views
  • 0 likes
  • 4 in conversation