Solved: Re: hash tables

fre · Posted 05-20-2016 07:17 AM

Hi Community,

I've searched for a solution but I didn't find it. Hopefully someone can give me some advice.

I have an input_table which contains a KEY variable (unique value) with 1 data variable "variable_to_be_corrected".

Based on another table, tableX, (which will be the hash table) I want to lookup the KEY and correct the variable "variable_to_be_corrected" based on the value for the variable 'correction' in the hash table tableX.

How do I manage this correctly? I thought the find method could be used here, but I can't make it work.

The last line of code doesn't work, which I indicated in the code below:

data want;

if _n_=1 then do; 
	
  if 0 then set tableX;
  declare hash o(dataset:'tableX');
  o.definekey('KEY');
  o.defineData('correction');
  o.definedone();

end;

set input_table;

if o.check(key:KEY)=0 then variable_to_be_corrected = o.find('correction') ; /* <-- ERROR HERE */
run;

RW9 · Posted 05-20-2016 09:10 AM

So:

data tablea;
  input key $ variable_a;
datalines;
001          1 
002          1
003          2
004          5
;
run;
 
data tableb;
  input key $ variable_a;
datalines;
003          3
;
run;

data want;
  update tablea tableb;
  by key;
run;

View solution in original post

Loko · Posted 05-20-2016 07:28 AM

Hello,

The hash does not work like this. I think you shall start by reading some useful papers, for example:

http://support.sas.com/resources/papers/proceedings10/151-2010.pdf

http://www2.sas.com/proceedings/forum2008/029-2008.pdf

http://www2.sas.com/proceedings/sugi30/236-30.pdf

RW9 · Posted 05-20-2016 07:48 AM

You can do such updating in a couple of different ways:

- Datastep with update statement

- Datastep with if then construct

Or SQL (just an example as no test datstep provided):

proc sql;
  update DATA_TO_BE_UPDATED
  set VARIABLE=xyz
  where  <condition>;
quit;

fre · Posted 05-20-2016 08:09 AM

Thank you both for your feedback.

What I wanted to realise is to correct some observations in table A based on observations in table B in a datastep.

table A:

KEY

variable_A

table B:

KEY

variable_A

Table A contains mostly correct values for variable A, but sometimes not.

For each value in table A, my program has to look if there is an occurance in table B (based on KEY), and if so, it has to update the value from variable_A in table A with the value from variable_A in table B.

As I understand, hash tables is not the best approach.

I was looking for a solution using a data step, but maybe with a proc SQL, it will be easier.

RW9 · Posted 05-20-2016 08:32 AM

If you post test data (form of a datastep) for each and what the output should look like we can provide code. What happens for instance if there are multiple examples of KEY? What is a "correct" value? Its sounds like the best solution, based on what you provided is the datastep with update setup.

fre · Posted 05-20-2016 08:43 AM

okay, see here some testdata:

Given are the following tables:

- table A:

KEY Variable_A

001 1

002 1

003 2

004 5

- table B:

KEY Variable_A

003 3

What code would I need if I want this output:

- table C:

KEY Variable_A

001 1

002 1

003 3

004 5

Table C is the same as Table A, except for the observatons that occur in table B.

How could this be done in a datastep?

RW9 · Posted 05-20-2016 09:10 AM

So:

data tablea;
  input key $ variable_a;
datalines;
001          1 
002          1
003          2
004          5
;
run;
 
data tableb;
  input key $ variable_a;
datalines;
003          3
;
run;

data want;
  update tablea tableb;
  by key;
run;

fre · Posted 05-20-2016 09:43 AM

Thank you very much. This is indeed a very simple and compact way.

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away