I want merge three tables. In order to avoid sort procedure,I am inclined to use hash method. I have sucessfully used it for joining two tables. I wonder if hash table can be used to join more than two tables.
@LinusH I don't have access to DI studio. Is it doable in Base SAS?
Yes.
If you know how to define and use one hash table then what holds you back to simply define and use a second hash table as well?
You can load as much data into as many hash tables as you wish. The only restriction is memory available.
@Patrick I want to make sure I am doing correctly. I looked some papers (Paul Dorfman's), most of them have only two tables.
@SAS_inquisitive: Have you tried yet? If you have and it did not work the way you want, show us what you have tried (code) and what didn't work (log). If you haven't tried, now it is the time for you to do so.
@Haikuo I have not tried on more than two tables yet. I will try to ceate some mock data sets (rather than real data sets) and will post the code. Here is one mock example.
This does not give desired result.
data a; input id x y; cards; 1 3 4 2 5 6 3 5 6 4 7 9 5 3 2 ; data b; input id x y; cards; 1 7 8 2 3 4 3 1 3 7 8 4 8 1 6 ; data c; input id x y; cards; 1 7 8 2 3 4 3 1 3 9 8 4 7 1 6 ; data test; if 0 then set a; if 0 then set b; if _n_=1 then do; dcl hash h1(dataset:'a'); h1.definekey('id'); h1.definedata('x','y'); h1.definedone(); dcl hash h2(dataset:'b'); h2.definekey('id'); h2.definedata('x','y'); h2.definedone(); end; set c; if h1.find() and h2.find() then output; run;
Ok, that was nice try :). Couple of comments before SAS code:
It is hard for me to guess the purpose of your code without seeing your expected outcome. So here is my try:
1. From your code, you seem trying to do a SQL equivalent of 'inner join' , however, your code is doing it exclusively instead. the commented part of the code will do a inclusive inner join.
2. Since 3 of your incoming tables share exact variable names, so what you wound up doing is pdv is first overidden by h1 then by h2. What you want is probably to rename some of the variables to see some obvious outcome.
data a;
input id x_A y_A;
cards;
1 3 4
2 5 6
3 5 6
4 7 9
5 3 2
;
data b;
input id x_B y_B;
cards;
1 7 8
2 3 4
3 1 3
7 8 4
8 1 6
;
data c;
input id x_C y_C;
cards;
1 7 8
2 3 4
3 1 3
9 8 4
7 1 6
;
data test;
if 0 then
set a B;
/* if 0 then */
/* set b;*/
if _n_=1 then
do;
dcl hash h1(dataset:'a');
h1.definekey('id');
h1.definedata(ALL:
'Y');
h1.definedone();
dcl hash h2(dataset:'b');
h2.definekey('id');
h2.definedata(ALL:
'y');
h2.definedone();
end;
set c;
if h1.find() and h2.find() then
output;
/* if h1.find()=0 and h2.find()=0 then output;*/
run;
Going bed right now, if you have further questions, I may find some time tomorrow.
@Haikuo Thank you. Have a good night.
@Haikuo states that the expected result should be the equivalent to a SQL inner join but then changes the selection logic so that it's no more an inner join. Must have been late...
I would code as below.
data test;
set c;
if _n_=1 then
do;
if 0 then set a;
dcl hash h1(dataset:'a');
h1.definekey('id');
h1.definedata(ALL:'Y');
h1.definedone();
if 0 then set b;
dcl hash h2(dataset:'b');
h2.definekey('id');
h2.definedata(ALL:'y');
h2.definedone();
end;
if h1.find()=0 and h2.find()=0 then output;
run;
The reason for having "set c;" first is that in case there are same named variables in the source datasets the definition from base table "c" will get used for the output dataset.
In case the relationship between the tables can be 0:N and not only 0:1 then you would also have to use multidata:'y' for the hash definition and code for some looping over the hash as well in case there is a match where there are multiple entries for the same key in the hash.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.