BookmarkSubscribeRSS Feed
beacon
Obsidian | Level 7

Hi,

I have the following datasets:

/* Dataset 1 */
DATA table1;
	LENGTH pcode $ 8
	       xcode $ 2
	       bcode $ 2;
	INFILE DATALINES DLM='|' DSD;
	INPUT pcode $ xcode $ bcode $;
	DATALINES;
	00000001|03|A
	00000002|03|B
	00000003|04|C1
	00000004|05|D2
	00000005|05|D3    /* This value is not in table2, but needs to be added */
RUN;

 

/* Dataset 2 */
DATA table2;
	LENGTH pcode $ 8
	       xcode $ 2
	       xcode_key $ 2
	       bcode $ 2
	       bcode_key $ 2;
	INFILE DATALINES DLM='|' DSD;
	INPUT pcode $ xcode $ xcode_key $ bcode $ bcode_key;
	DATALINES;
	00000001|03|10|A|5
	00000002|03|10|B|10
	00000003|04|20|C1|15
	00000004|05|30|D2|20
RUN;


I would like to use table1 as my hash lookup table so I can compare it to table2, but I want to keep all of the values from table1 and only those that match from table2. Is this possible?

If I were doing a PROC SQL, it would look like this:

PROC SQL;
    SELECT
        t1.pcode, t2.xcode, t2.bcode
    FROM
        table1 t1
        LEFT JOIN
        table2 t2 ON
            t1.pcode = t2.pcode
            AND t1.xcode = t2.xcode
            AND t1.bcode = t2.bcode
QUIT;


I was able to implement a LEFT JOIN from table2 to table1 using the code below, but that's the opposite of what I want.

DATA want(DROP=xcode_key bcode_key);
	LENGTH CODE_FOUND $ 1;
	
	IF 0 THEN SET table1;
	
	DECLARE HASH codes(DATASET: 'WORK.table1');
	codes.DefineKey('pcode', 'xcode', 'bcode');
	codes.DefineData(ALL: 'YES');
	codes.DefineDone();
	
	DO UNTIL (EOF);
		SET table2 END = EOF;
		
		IF codes.Find() = 0 THEN DO;
			CODE_FOUND = 'T';
			OUTPUT;
		END;
		ELSE DO;
			CODE_FOUND = 'F';
			CALL MISSING(OF pcode--bcode);
			OUTPUT;
		END;
	END;
RUN;

I was hoping to perform the lookup and output this way because I'm trying to determine which values from table1 need to be added/removed from table2.

Here's a sample of the expected output:

pcode        xcode        bcode        CODE_FOUND
00000001     03           A            T
00000002     03           B            T
00000003     04           C1           T
00000004     05           D2           T
00000005     05           D3           F


Thanks.

8 REPLIES 8
error_prone
Barite | Level 11

Interesting problem. The proc sql you posted has errors. Why do you want to use a hash-object? A merge should be ok:

 

data want;
   merge table2(in=two) table1(in=one);
   by pcode xcode bcode;
   
   code_found = ifc(two and one, 'T', 'F');
   
   keep pcode xcode bcode code_found;
run;
beacon
Obsidian | Level 7

@error_prone wrote:

Interesting problem. The proc sql you posted has errors. Why do you want to use a hash-object? A merge should be ok


 

I'm familiar with the merge way, but was wanting to use a hash object because everything I've read indicates that it is super fast and I've been wanting to learn more about hash objects.

 

While my post shows small data sets, I do have instances where I'm working with lookup tables that could be much larger and comparing them to huge data sets.

 

Also, what are the errors in the SQL code?

 

Thanks.

error_prone
Barite | Level 11

Error-message:

ERROR: Ambiguous reference, column xcode is in more than one table.
ERROR: Ambiguous reference, column pcode is in more than one table.
ERROR: Ambiguous reference, column bcode is in more than one table.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
beacon
Obsidian | Level 7
I forgot the aliases. Edited the original post.
error_prone
Barite | Level 11

Could not get it to work with table1 as lookup.

/* Dataset 1 */
DATA table1;
	LENGTH pcode $ 8
	       xcode $ 2
	       bcode $ 2;
	INFILE DATALINES DLM='|' DSD;
	INPUT pcode $ xcode $ bcode $;
	DATALINES;
00000001|03|A
00000002|03|B
00000003|04|C1
00000004|05|D2
00000005|05|D3    /* This value is not in table2, but needs to be added */
	;
RUN;
 
/* Dataset 2 */
DATA table2;
	LENGTH pcode $ 8
	       xcode $ 2
	       xcode_key $ 2
	       bcode $ 2
	       bcode_key $ 2;
	INFILE DATALINES DLM='|' DSD;
	INPUT pcode $ xcode $ xcode_key $ bcode $ bcode_key;
	DATALINES;
00000001|03|10|A|5
00000002|03|10|B|10
00000003|04|20|C1|15
00000004|05|30|D2|20
00000007|05|42|XX|42 /* should not be in want-dataset */
;
RUN;

data want;
   if 0 then set table2;
   
   if _n_ = 1 then do;
      declare hash h(dataset: "table2");
      h.defineKey("pcode", "xcode", "bcode");
      /*h.defineData(all: "yes"); no need to define data-elements */
      h.defineDone();
   end;
   
   set table1;
   
   CODE_FOUND = ifc(h.check() = 0, "T", "F");
run;
beacon
Obsidian | Level 7
I did something similar in my testing, but I'm concerned that I won't be able to use table2 in most situations due to the size of table2 being too large.
KachiM
Rhodochrosite | Level 12

Firstly, what are your key part and data part of the hash table. My guess is that pcode and xcode are the key part. If this is true, then read further.

A way out for using Table1 as part of hash table is to add a FLAG to the data part of the hash table with value of 1. As your Table2

matches the key in the hash table, mark the flag as zero and replace the hash item. This way, we can identify which hash item is not matched to keys of Table2 when the FLAG having value of ONE. 

Hope this solution meets your requirements.

data want;
   if _n_ = 1 then do;
      flag = 1;
      declare hash h();
      h.definekey('pcode','xcode');
      h.definedata('pcode','xcode','bcode','flag');
      h.definedone();
      declare hiter hi('h');
      do until(last);
         set table1 end = last;
         h.add();
      end;
   end;
   do until(eof);
      set table2 end = eof;
      if h.find() = 0 then do;
         output;
         flag = 0;
         h.replace();
      end;
   end;
   do while(hi.next() = 0);
      call missing(xcode_key,bcode_key);
      if flag = 1 then output;
   end;
drop flag;
run;

Obs	pcode	xcode	bcode	xcode_key	bcode_key
1	00000001	03	A	10	5
2	00000002	03	B	10	10
3	00000003	04	C1	20	15
4	00000004	05	D2	30	20
5	00000005	05	D3	 	 
Ksharp
Super User

Here is the code I wrote a couple of years ago to mock JOIN by hash table.

Hope could help you a little bit.

Use hash table to simulate JOIN is a real tough task. You need consider many scenarios .

 

data  tab1;
input a_var1  a_value;
datalines;
1  11
1  11
2  22
3  33
9  88
run;

data tab2;
input b_var1  b_value;
datalines;
1  11
2  22
3  99
4  999
run;
data want;
 if _n_ eq 1 then do;
  if 0 then set tab2;
  declare hash h(dataset:'tab2',multidata:'y');
  declare hiter hi('h');
  h.definekey('b_var1');
  h.definedata('b_var1','b_value');
  h.definedone();
 end;
set tab1 end=last;
by a_var1;
if first.a_var1  then call missing(b_var1,b_value);
if h.find(key:a_var1)=0 then h.removedup(key:a_var1);

output;
if last.a_var1  then do;
 rc=h.find(key:a_var1);
 do while(rc=0); 
  output;
  rc=h.find_next(key:a_var1);
 end;
end;

if last then do;
call missing(of _all_);
 do while(hi.next()=0); 
  output;
 end;
end;
drop rc;
run;
proc print noobs;run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1872 views
  • 0 likes
  • 4 in conversation