Re: How to perform a right join on a hash to keep all values from look...

beacon · Posted 11-07-2017 02:19 PM

Hi,

I have the following datasets:

/* Dataset 1 */
DATA table1;
	LENGTH pcode $ 8
	       xcode $ 2
	       bcode $ 2;
	INFILE DATALINES DLM='|' DSD;
	INPUT pcode $ xcode $ bcode $;
	DATALINES;
	00000001|03|A
	00000002|03|B
	00000003|04|C1
	00000004|05|D2
	00000005|05|D3    /* This value is not in table2, but needs to be added */
RUN;

/* Dataset 2 */
DATA table2;
	LENGTH pcode $ 8
	       xcode $ 2
	       xcode_key $ 2
	       bcode $ 2
	       bcode_key $ 2;
	INFILE DATALINES DLM='|' DSD;
	INPUT pcode $ xcode $ xcode_key $ bcode $ bcode_key;
	DATALINES;
	00000001|03|10|A|5
	00000002|03|10|B|10
	00000003|04|20|C1|15
	00000004|05|30|D2|20
RUN;

I would like to use table1 as my hash lookup table so I can compare it to table2, but I want to keep all of the values from table1 and only those that match from table2. Is this possible?

If I were doing a PROC SQL, it would look like this:

PROC SQL;
    SELECT
        t1.pcode, t2.xcode, t2.bcode
    FROM
        table1 t1
        LEFT JOIN
        table2 t2 ON
            t1.pcode = t2.pcode
            AND t1.xcode = t2.xcode
            AND t1.bcode = t2.bcode
QUIT;

I was able to implement a LEFT JOIN from table2 to table1 using the code below, but that's the opposite of what I want.

DATA want(DROP=xcode_key bcode_key);
	LENGTH CODE_FOUND $ 1;
	
	IF 0 THEN SET table1;
	
	DECLARE HASH codes(DATASET: 'WORK.table1');
	codes.DefineKey('pcode', 'xcode', 'bcode');
	codes.DefineData(ALL: 'YES');
	codes.DefineDone();
	
	DO UNTIL (EOF);
		SET table2 END = EOF;
		
		IF codes.Find() = 0 THEN DO;
			CODE_FOUND = 'T';
			OUTPUT;
		END;
		ELSE DO;
			CODE_FOUND = 'F';
			CALL MISSING(OF pcode--bcode);
			OUTPUT;
		END;
	END;
RUN;

I was hoping to perform the lookup and output this way because I'm trying to determine which values from table1 need to be added/removed from table2.

Here's a sample of the expected output:

pcode        xcode        bcode        CODE_FOUND
00000001     03           A            T
00000002     03           B            T
00000003     04           C1           T
00000004     05           D2           T
00000005     05           D3           F

Thanks.

error_prone · Posted 11-07-2017 02:42 PM

Interesting problem. The proc sql you posted has errors. Why do you want to use a hash-object? A merge should be ok:

data want;
   merge table2(in=two) table1(in=one);
   by pcode xcode bcode;
   
   code_found = ifc(two and one, 'T', 'F');
   
   keep pcode xcode bcode code_found;
run;

beacon · Posted 11-07-2017 03:08 PM

@error_prone wrote:
Interesting problem. The proc sql you posted has errors. Why do you want to use a hash-object? A merge should be ok

I'm familiar with the merge way, but was wanting to use a hash object because everything I've read indicates that it is super fast and I've been wanting to learn more about hash objects.

While my post shows small data sets, I do have instances where I'm working with lookup tables that could be much larger and comparing them to huge data sets.

Also, what are the errors in the SQL code?

Thanks.

error_prone · Posted 11-07-2017 03:20 PM

Error-message:

ERROR: Ambiguous reference, column xcode is in more than one table.
ERROR: Ambiguous reference, column pcode is in more than one table.
ERROR: Ambiguous reference, column bcode is in more than one table.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.

beacon · Posted 11-07-2017 03:28 PM

I forgot the aliases. Edited the original post.

error_prone · Posted 11-07-2017 03:40 PM

Could not get it to work with table1 as lookup.

/* Dataset 1 */
DATA table1;
	LENGTH pcode $ 8
	       xcode $ 2
	       bcode $ 2;
	INFILE DATALINES DLM='|' DSD;
	INPUT pcode $ xcode $ bcode $;
	DATALINES;
00000001|03|A
00000002|03|B
00000003|04|C1
00000004|05|D2
00000005|05|D3    /* This value is not in table2, but needs to be added */
	;
RUN;
 
/* Dataset 2 */
DATA table2;
	LENGTH pcode $ 8
	       xcode $ 2
	       xcode_key $ 2
	       bcode $ 2
	       bcode_key $ 2;
	INFILE DATALINES DLM='|' DSD;
	INPUT pcode $ xcode $ xcode_key $ bcode $ bcode_key;
	DATALINES;
00000001|03|10|A|5
00000002|03|10|B|10
00000003|04|20|C1|15
00000004|05|30|D2|20
00000007|05|42|XX|42 /* should not be in want-dataset */
;
RUN;

data want;
   if 0 then set table2;
   
   if _n_ = 1 then do;
      declare hash h(dataset: "table2");
      h.defineKey("pcode", "xcode", "bcode");
      /*h.defineData(all: "yes"); no need to define data-elements */
      h.defineDone();
   end;
   
   set table1;
   
   CODE_FOUND = ifc(h.check() = 0, "T", "F");
run;

beacon · Posted 11-07-2017 03:50 PM

I did something similar in my testing, but I'm concerned that I won't be able to use table2 in most situations due to the size of table2 being too large.

KachiM · Posted 11-08-2017 01:29 AM

Firstly, what are your key part and data part of the hash table. My guess is that pcode and xcode are the key part. If this is true, then read further.

A way out for using Table1 as part of hash table is to add a FLAG to the data part of the hash table with value of 1. As your Table2

matches the key in the hash table, mark the flag as zero and replace the hash item. This way, we can identify which hash item is not matched to keys of Table2 when the FLAG having value of ONE.

Hope this solution meets your requirements.

data want;
   if _n_ = 1 then do;
      flag = 1;
      declare hash h();
      h.definekey('pcode','xcode');
      h.definedata('pcode','xcode','bcode','flag');
      h.definedone();
      declare hiter hi('h');
      do until(last);
         set table1 end = last;
         h.add();
      end;
   end;
   do until(eof);
      set table2 end = eof;
      if h.find() = 0 then do;
         output;
         flag = 0;
         h.replace();
      end;
   end;
   do while(hi.next() = 0);
      call missing(xcode_key,bcode_key);
      if flag = 1 then output;
   end;
drop flag;
run;

Obs	pcode	xcode	bcode	xcode_key	bcode_key
1	00000001	03	A	10	5
2	00000002	03	B	10	10
3	00000003	04	C1	20	15
4	00000004	05	D2	30	20
5	00000005	05	D3

Ksharp · Posted 11-08-2017 08:00 AM

Here is the code I wrote a couple of years ago to mock JOIN by hash table.

Hope could help you a little bit.

Use hash table to simulate JOIN is a real tough task. You need consider many scenarios .

data  tab1;
input a_var1  a_value;
datalines;
1  11
1  11
2  22
3  33
9  88
run;

data tab2;
input b_var1  b_value;
datalines;
1  11
2  22
3  99
4  999
run;
data want;
 if _n_ eq 1 then do;
  if 0 then set tab2;
  declare hash h(dataset:'tab2',multidata:'y');
  declare hiter hi('h');
  h.definekey('b_var1');
  h.definedata('b_var1','b_value');
  h.definedone();
 end;
set tab1 end=last;
by a_var1;
if first.a_var1  then call missing(b_var1,b_value);
if h.find(key:a_var1)=0 then h.removedup(key:a_var1);

output;
if last.a_var1  then do;
 rc=h.find(key:a_var1);
 do while(rc=0); 
  output;
  rc=h.find_next(key:a_var1);
 end;
end;

if last then do;
call missing(of _all_);
 do while(hi.next()=0); 
  output;
 end;
end;
drop rc;
run;
proc print noobs;run;

How to perform a right join on a hash to keep all values from lookup table?

Re: How to perform a right join on a hash to keep all values from lookup table?

Re: How to perform a right join on a hash to keep all values from lookup table?

Re: How to perform a right join on a hash to keep all values from lookup table?

Re: How to perform a right join on a hash to keep all values from lookup table?

Re: How to perform a right join on a hash to keep all values from lookup table?

Re: How to perform a right join on a hash to keep all values from lookup table?

Re: How to perform a right join on a hash to keep all values from lookup table?

Re: How to perform a right join on a hash to keep all values from lookup table?

SAS Innovate 2025: Call for Content

Classroom Training Available!