Using hash tables to replace multiple variables

EmilyAV · Posted 10-01-2021 09:01 AM

Hello,

I have a dataset with thousands of observations, and need to reclassify hundreds of observations in three different variables, including the lookup/key variable. I'm trying to use a hash table but my current code simply adds the hash table variables to the old dataset and (mostly) codes them as missing. What am I doing wrong?

*Hash table;

data reclass;
input key new_key new_var1 new_var2;
datalines;
2 1 . 9
17 1 . 17
32 1 . 3
33 1 . 14
41 1 . 15
;
run;

*Applying hash table to dataset old;

data output;
length key new_key new_var1 new_var2 8.;

if 0 then set old reclass;

if _N_ = 1 then do;
Declare hash w (dataset:'reclass', ordered:'Y');
w.defineKey ("key");
w.defineData("key","new_key", "new_var1", "new_var2");
w.defineDone();
w.replace(key:17, data: 17, data:1, data:., data:7);
w.replace(key:2, data: 2, data:1, data:., data:9);
w.replace(key:32, data:32, data:1, data:., data:3);
w.replace(key:33, data:33, data:1, data:., data:14);
w.replace(key:41, data:41, data:1, data:., data:15);
w.output (dataset:'test');
w.delete ();
end;

set old;
run;

Tom · Posted 10-01-2021 10:15 AM

Sounds like your want to do:

Load a small table into a hash object. You seem to have provided an example of that table. Then you have a large table that you want to modify by checking if the current value matches an entry in the hash object. You have not provided any example of this table.

So your pattern should be something like:

Load hash object
SET main table
use FIND() method to check if variable value is in the hash
potentially copy the values from the variables defined in the hash back into the variables in the main table(if the names are different)

Patrick · Posted 10-01-2021 11:29 AM

If I understand right what you're trying to achieve then a merge is may-be simpler. I've used for your reclass table the same variable names than in the have table. That's not a must but makes things even simpler. IF in the step merging the tables there is a match with the reclass table then the values from this table will overwrite the ones in the have table. If the variable names would be different then you'd have to code for it.

Also consider to not replace the key but to introduce some additional group_key or the like.

data reclass;
  input key group_key var1 var2;
  datalines;
2 1 . 9
17 1 . 17
32 1 . 3
33 1 . 14
41 1 . 15
;

data have;;
  set reclass end=last;
  var1=888;
  var2=999;
  output;
  if last then
    do;
      key=10000;
      output;
    end;
run;

proc sort data=reclass;
  by key;
run;
proc sort data=have;
  by key;
run;

data want;
  merge have(in=ina) reclass;
  by key;
  if ina;
run;

Using hash tables to replace multiple variables

Re: Using hash tables to replace multiple variables

Re: Using hash tables to replace multiple variables

Using hash tables to replace multiple variables

Re: Using hash tables to replace multiple variables

Re: Using hash tables to replace multiple variables

SAS Innovate 2025: Call for Content