BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

Hello,

I want to make a join between a small table and a big one.

The hash method seems to me the right solution.

i develop this program, but the result  whith hash method is different whith sql

In the case where I have to join with data many to many, how can I adapt this solution?

thank you

 

data  a ;
input key adata adata2 ;
cards ;
1 1 2
1 1 3
2 2 5
3 3 4
4 4 6
5 5 5
6 6 6
7 7 5
;
run ;
data b ;
input key bdata ;
cards ;
1 11
1 12
3 31
4 4
6 61
6 62
6 63
7 7
;
run ;


proc sql;
create table aa as select a.key,adata,bdata,adata2
from a left join b on a.key=b.key;
quit;

data c;
 if _n_ = 1 then do ;
 if 0 then set a ;
 dcl hash b (dataset: "a", multidata: "y",ordered:'y') ;
 b.definekey ("key") ;
 b.definedata (all:'y') ;
 b.definedone () ;
 end;

  do until(eof);
   set b end=eof;
    if b.find()=0 then output;
    else do;
    call missing(adata ,bdata, adata2);
    output;
    end;
 end;
stop;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20
data  a ;
input key adata adata2 ;
cards ;
1 1 2
1 1 3
2 2 5
3 3 4
4 4 6
5 5 5
6 6 6
7 7 5
;
run ;
data b ;
input key bdata ;
cards ;
1 11
1 12
3 31
4 4
6 61
6 62
6 63
7 7
;
run ;


proc sql;
create table aa as select a.key,adata,bdata,adata2
from a left join b on a.key=b.key;
quit;

data want_hash;
 if 0 then set a b ;
 dcl hash b (dataset: "b", multidata: "y",ordered:'y') ;
 b.definekey ("key") ;
 b.definedata (all:'y') ;
 b.definedone () ;
  do until(eof);
   set a end=eof;
   rc=b.find();
   if rc ne 0 then do;
	call missing(bdata);
	output;
	end;
   else 
    do  while(rc eq 0);
     output;
	 rc=b.find_next();
    end;
 end;
stop;
drop rc;
run;

View solution in original post

11 REPLIES 11
novinosrin
Tourmaline | Level 20
data  a ;
input key adata adata2 ;
cards ;
1 1 2
1 1 3
2 2 5
3 3 4
4 4 6
5 5 5
6 6 6
7 7 5
;
run ;
data b ;
input key bdata ;
cards ;
1 11
1 12
3 31
4 4
6 61
6 62
6 63
7 7
;
run ;


proc sql;
create table aa as select a.key,adata,bdata,adata2
from a left join b on a.key=b.key;
quit;

data want_hash;
 if 0 then set a b ;
 dcl hash b (dataset: "b", multidata: "y",ordered:'y') ;
 b.definekey ("key") ;
 b.definedata (all:'y') ;
 b.definedone () ;
  do until(eof);
   set a end=eof;
   rc=b.find();
   if rc ne 0 then do;
	call missing(bdata);
	output;
	end;
   else 
    do  while(rc eq 0);
     output;
	 rc=b.find_next();
    end;
 end;
stop;
drop rc;
run;
mansour_ib_sas
Pyrite | Level 9

Thank you

novinosrin
Tourmaline | Level 20

Some Notes:

1. No need for if _n_=1 when going for a full pass of dataset while _n_=1, since there is gonna be only one hash instance at run time

2. do until pass akin to dow is apparently not any different in performance if you used if _n_=1 ,creating a hash instance and a traditional data-step as portrayed in sas docs.

3. If you are using PD hash guru's style of coding, great but you have to be very careful knowing how instances  value vary and differ.

 

Well after all, it's all about diligence

mansour_ib_sas
Pyrite | Level 9

 

We can  add a filtre on the date column in the big table?

 

novinosrin
Tourmaline | Level 20

I don't see any date values in your samples. I assume you can

mansour_ib_sas
Pyrite | Level 9

Hello,

 

Why the 8 and 9 did not come in result whith Hash method?

thank you 

data  a ;
input key adata adata2 ;
cards ;
1 1 2
1 1 3
2 2 5
8 7 5
9 5 4
;
run ;

data b ;
input key bdata ;
cards ;
1 11
1 12
3 31
4 4
6 61
6 62
6 63
7 7
;
run ;


proc sql;
create table yes as select a.*,b.bdata
from a left join b on a.key=b.key;
quit;


data want_hash;
 if 0 then set a ;
 dcl hash b (dataset: "a", multidata: "y",ordered:'y') ;
 b.definekey ("key") ;
 b.definedata (all:'y') ;
 b.definedone () ;
  do until(eof);
   set b end=eof;
   rc=b.find();
   if rc ne 0 then do;
	call missing(bdata);
	output;
	end;
   else 
    do  while(rc eq 0);
     output;
	 rc=b.find_next();
    end;
 end;
stop;

run;
novinosrin
Tourmaline | Level 20

Your left table is a , but in your code

 

  do until(eof);
   set b end=eof;

 so change the above to

  do until(eof);
   set a end=eof;

 

mansour_ib_sas
Pyrite | Level 9

Yes, the table A is the left table.

It's the small table that is loaded in memory through declare the object hash

In the statement set; the big table is B. 

novinosrin
Tourmaline | Level 20

to match the sql results, the change i suggested works

mansour_ib_sas
Pyrite | Level 9

The table B have 6 millions obseravation. loaded this in memory ?

novinosrin
Tourmaline | Level 20

Ok, please clarify and elaborate your requirement with your HAVE samples and WANT sample clearly. Let us work on the solution. 

 

6 million is not big in my opinion. Use hashexp:20 just in case

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1864 views
  • 1 like
  • 2 in conversation