Solved: Re: HASH Duplicate Error - NO DUPS!

JW2702 · Posted 09-15-2017 03:54 AM

I keep getting this error when I try to use a hash table:

ERROR: Duplicate key.

I have other tables that run fine, so presumably the code is fine, but this particular table will not work. I have done a proc sort nodupkey on exactly the same keys I am using and there are no duplicates in the table at all. So I don't understand why it says this particular table has a duplicate key.

This is the code:

data workjw.base2;
set workjw.base;

if _n_ = 0 then do;

set workjw.acct_tpe_bin;

end;

if _n_ = 1 then do;

declare hash b(dataset: "workjw.acct_tpe_bin", duplicate: "r");
rc2=b.definekey("CH_ACCT_NO_BIN", "ACCT_TPE_CDE");
rc2=b.definedata("ACC_DESC_BIN");
rc2=b.definedone();

b.add();

end;

rc2=b.find();

if rc2 ne 0 then do;
ACC_DESC_BIN="";
end;

run;

Patrick · Posted 09-15-2017 06:22 AM

@KachiM

As soon as you remove b.add() things will work.

libname workjw (work);

data workjw.acct_tpe_bin;
  CH_ACCT_NO_BIN=1;
  ACCT_TPE_CDE=1;
  ACC_DESC_BIN=1;
  output;
  CH_ACCT_NO_BIN=1;
  ACCT_TPE_CDE=1;
  ACC_DESC_BIN=1;
  output;
run;

data workjw.base;
  CH_ACCT_NO_BIN=1;
  ACCT_TPE_CDE=1;
  output;

  CH_ACCT_NO_BIN=2;
  ACCT_TPE_CDE=2;
  output;

run;



data workjw.base2;
  set workjw.base;

  if _n_ = 0 then
    do;
      set workjw.acct_tpe_bin;
    end;

  if _n_ = 1 then
    do;
      declare hash b(dataset: "workjw.acct_tpe_bin");
      rc2=b.definekey("CH_ACCT_NO_BIN", "ACCT_TPE_CDE");
      rc2=b.definedata("ACC_DESC_BIN");
      rc2=b.definedone();
/*      b.add();*/
    end;

  rc2=b.find();

  if rc2 ne 0 then
    do;
      ACC_DESC_BIN="";
    end;
run;

View solution in original post

JW2702 · Posted 09-15-2017 04:01 AM

SAS 9.4, running on a Windows Server.

I have also tried closing and starting in a new session, but that made no difference at all.

KachiM · Posted 09-15-2017 05:20 AM

Using the following statement,

declare hash b(dataset: "workjw.acct_tpe_bin", duplicate: "r");

you are asking to add the last value, replacing the previous values. This means that if there are duplicate records, they will be happily replaced and no ERROR message will come.

Your Duplicate Error message is strange!

Place your LOG to see any clues.

ChrisBrooks · Posted 09-15-2017 05:37 AM

I believe the 'duplicate: "r"' statement only refers to the initial load of the data set. The error is coming on the add() method. If you supply a return value for the error code on that e.g. rc=b.add() you won't get the error. Although you appear to be loading the same information twice anyway so I'm not sure why you need to call b.add() at all.

JW2702 · Posted 09-15-2017 05:37 AM

It just has this:

NOTE: There were 147 observations read from the data set workjw.acct_tpe_bin
ERROR: Duplicate key.

If I try sorting with nodupkey, it gives no errors:

2590 proc sort data=workjw.acct_tpe_bin force nodupkey; by ACCT_TPE_CDE CH_ACCT_NO_BIN ; run;

NOTE: There were 147 observations read from the data set WORKJW.ACCT_TPE_BIN.
NOTE: 0 observations with duplicate key values were deleted.
NOTE: The data set WORKJW.ACCT_TPE_BIN has 147 observations and 3 variables.

I checked that the field I am using as a key is the same length and format on both tables. They are.

ChrisBrooks · Posted 09-15-2017 05:43 AM

As I said above - the whole data set is loaded into the hash table when it is declared - you don't need to exclude the duplicate key records. You are then trying to add the first record into the hash again with b.add() which is where the error comes

Patrick · Posted 09-15-2017 06:13 AM

@JW2702

I'm repeating here what @ChrisBrooks already wrote but may be posting a bit of code will make the issue clearer to you.

Below statement...

declare hash b(dataset: "workjw.acct_tpe_bin");

...loads the whole source table into the hash. The default for a source table having duplicate keys is: Only the first occurence of a key combination will get loaded into the hash. Any later duplicate key records will be ignored. The process will run without error or warning.

The problematic part in your code is: b.add();

  if _n_ = 1 then
    do;
      declare hash b(dataset: "workjw.acct_tpe_bin");
      .....
      b.add();
    end;

You've already loaded the whole table deduped into the hash. Now with this b.add() you're instructing SAS to load observation 1 (if _n_=1) once more into the hash. ...but you've loaded the table already via the dataset:<dataset> directive in the declare statement so b.add() is now trying to load a duplicate. That's what causes the error.

The default setting for a hash table is to not accept duplicate keys:

- Remove command b.add() and everything will work as expected.

- No need to pre-sort and dedupe the source table for the hash prior to loading. Statement Declare Hash will take care of this.

And here a bit a shorter and working code version:


data workjw.base2;
  set workjw.base;

  if _n_ = 1 then
    do;
      if 0 then set workjw.acct_tpe_bin(keep=CH_ACCT_NO_BIN ACCT_TPE_CDE ACC_DESC_BIN);
      declare hash b(dataset: "workjw.acct_tpe_bin");
      b.definekey("CH_ACCT_NO_BIN", "ACCT_TPE_CDE");
      b.definedata("ACC_DESC_BIN");
      b.definedone();
    end;

  if b.find() then call missing(ACC_DESC_BIN);

run;

KachiM · Posted 09-15-2017 06:20 AM

In addition to the observations made by others, why not you try the following stripped code and see whether you still get the ERROR.

data workjw.base2;
 
if _n_ = 0 then do;
     set workjw.acct_tpe_bin;
end;
 
if _n_ = 1 then do;
     declare hash b(dataset: "workjw.acct_tpe_bin");
     rc2=b.definekey("CH_ACCT_NO_BIN", "ACCT_TPE_CDE");
     rc2=b.definedata("ACC_DESC_BIN");
     rc2=b.definedone();
end;
 
rc = b.output(dataset:'out_01');
 
run;

Patrick · Posted 09-15-2017 06:22 AM

@KachiM

As soon as you remove b.add() things will work.

libname workjw (work);

data workjw.acct_tpe_bin;
  CH_ACCT_NO_BIN=1;
  ACCT_TPE_CDE=1;
  ACC_DESC_BIN=1;
  output;
  CH_ACCT_NO_BIN=1;
  ACCT_TPE_CDE=1;
  ACC_DESC_BIN=1;
  output;
run;

data workjw.base;
  CH_ACCT_NO_BIN=1;
  ACCT_TPE_CDE=1;
  output;

  CH_ACCT_NO_BIN=2;
  ACCT_TPE_CDE=2;
  output;

run;



data workjw.base2;
  set workjw.base;

  if _n_ = 0 then
    do;
      set workjw.acct_tpe_bin;
    end;

  if _n_ = 1 then
    do;
      declare hash b(dataset: "workjw.acct_tpe_bin");
      rc2=b.definekey("CH_ACCT_NO_BIN", "ACCT_TPE_CDE");
      rc2=b.definedata("ACC_DESC_BIN");
      rc2=b.definedone();
/*      b.add();*/
    end;

  rc2=b.find();

  if rc2 ne 0 then
    do;
      ACC_DESC_BIN="";
    end;
run;

JW2702 · Posted 09-15-2017 06:54 AM

Thanks. I was playing around and changed the add to say "rc2=b.add();" and that worked.

Registration is open

SAS Training: Just a Click Away