DATA Step, Macro, Functions and more

HASH Duplicate Error - NO DUPS!

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

HASH Duplicate Error - NO DUPS!

I keep getting this error when I try to use a hash table:

ERROR: Duplicate key.

 

I have other tables that run fine, so presumably the code is fine, but this particular table will not work. I have done a proc sort nodupkey on exactly the same keys I am using and there are no duplicates in the table at all. So I don't understand why it says this particular table has a duplicate key.

 

This is the code:

 

data workjw.base2;
set workjw.base;

 

if _n_ = 0 then do;

     set workjw.acct_tpe_bin;

end;

 

if _n_ = 1 then do;

     declare hash b(dataset: "workjw.acct_tpe_bin", duplicate: "r");
     rc2=b.definekey("CH_ACCT_NO_BIN", "ACCT_TPE_CDE");
     rc2=b.definedata("ACC_DESC_BIN");
     rc2=b.definedone();

     b.add();

end;

 

rc2=b.find();

 

if rc2 ne 0 then do;
ACC_DESC_BIN="";
end;

 

run;

 


Accepted Solutions
Solution
‎09-15-2017 06:56 AM
Respected Advisor
Posts: 4,173

Re: HASH Duplicate Error - NO DUPS!

@datasp

As soon as you remove b.add() things will work.

libname workjw (work);

data workjw.acct_tpe_bin;
  CH_ACCT_NO_BIN=1;
  ACCT_TPE_CDE=1;
  ACC_DESC_BIN=1;
  output;
  CH_ACCT_NO_BIN=1;
  ACCT_TPE_CDE=1;
  ACC_DESC_BIN=1;
  output;
run;

data workjw.base;
  CH_ACCT_NO_BIN=1;
  ACCT_TPE_CDE=1;
  output;

  CH_ACCT_NO_BIN=2;
  ACCT_TPE_CDE=2;
  output;

run;



data workjw.base2;
  set workjw.base;

  if _n_ = 0 then
    do;
      set workjw.acct_tpe_bin;
    end;

  if _n_ = 1 then
    do;
      declare hash b(dataset: "workjw.acct_tpe_bin");
      rc2=b.definekey("CH_ACCT_NO_BIN", "ACCT_TPE_CDE");
      rc2=b.definedata("ACC_DESC_BIN");
      rc2=b.definedone();
/*      b.add();*/
    end;

  rc2=b.find();

  if rc2 ne 0 then
    do;
      ACC_DESC_BIN="";
    end;
run;

View solution in original post


All Replies
New Contributor
Posts: 4

Re: HASH Duplicate Error - NO DUPS!

SAS 9.4, running on a Windows Server.

I have also tried closing and starting in a new session, but that made no difference at all.

Super Contributor
Posts: 298

Re: HASH Duplicate Error - NO DUPS!

Using the following statement,

 

 declare hash b(dataset: "workjw.acct_tpe_bin", duplicate: "r");

 

you are asking to add the last value, replacing the previous values. This means that if there are duplicate records, they will be happily replaced and no ERROR message will come. 

 

Your Duplicate Error message is strange!

 

Place your LOG to see any clues.

Super Contributor
Posts: 440

Re: HASH Duplicate Error - NO DUPS!

I believe the 'duplicate: "r"' statement only refers to the initial load of the data set. The error is coming on the add() method. If you supply a return value for the error code on that e.g. rc=b.add() you won't get the error. Although you appear to be loading the same information twice anyway so I'm not sure why you need to call b.add() at all.

New Contributor
Posts: 4

Re: HASH Duplicate Error - NO DUPS!

It just has this:

 

NOTE: There were 147 observations read from the data set workjw.acct_tpe_bin
ERROR: Duplicate key.

 

If I try sorting with nodupkey, it gives no errors:

2590 proc sort data=workjw.acct_tpe_bin force nodupkey; by ACCT_TPE_CDE CH_ACCT_NO_BIN ; run;

NOTE: There were 147 observations read from the data set WORKJW.ACCT_TPE_BIN.
NOTE: 0 observations with duplicate key values were deleted.
NOTE: The data set WORKJW.ACCT_TPE_BIN has 147 observations and 3 variables.

 

I checked that the field I am using as a key is the same length and format on both tables. They are.

Super Contributor
Posts: 440

Re: HASH Duplicate Error - NO DUPS!

As I said above - the whole data set is loaded into the hash table when it is declared - you don't need to exclude the duplicate key records. You are then trying to add the first record into the hash again with b.add() which is where the error comes

Respected Advisor
Posts: 4,173

Re: HASH Duplicate Error - NO DUPS!

[ Edited ]

@JW2702

I'm repeating here what @ChrisBrooks already wrote but may be posting a bit of code will make the issue clearer to you.

 

Below statement...

declare hash b(dataset: "workjw.acct_tpe_bin");

...loads the whole source table into the hash. The default for a source table having duplicate keys is: Only the first occurence of a key combination will get loaded into the hash. Any later duplicate key records will be ignored. The process will run without error or warning.

 

The problematic part in your code is: b.add();

  if _n_ = 1 then
    do;
      declare hash b(dataset: "workjw.acct_tpe_bin");
      .....
      b.add();
    end;

You've already loaded the whole table deduped into the hash. Now with this b.add() you're instructing SAS to load observation 1 (if _n_=1) once more into the hash. ...but you've loaded the table already via the dataset:<dataset> directive in the declare statement so b.add() is now trying to load a duplicate. That's what causes the error.

 

The default setting for a hash table is to not accept duplicate keys:

- Remove command b.add() and everything will work as expected.

- No need to pre-sort and dedupe the source table for the hash prior to loading. Statement Declare Hash will take care of this.

 

And here a bit a shorter and working code version:


data workjw.base2;
  set workjw.base;

  if _n_ = 1 then
    do;
      if 0 then set workjw.acct_tpe_bin(keep=CH_ACCT_NO_BIN ACCT_TPE_CDE ACC_DESC_BIN);
      declare hash b(dataset: "workjw.acct_tpe_bin");
      b.definekey("CH_ACCT_NO_BIN", "ACCT_TPE_CDE");
      b.definedata("ACC_DESC_BIN");
      b.definedone();
    end;

  if b.find() then call missing(ACC_DESC_BIN);

run;

 

Super Contributor
Posts: 298

Re: HASH Duplicate Error - NO DUPS!

In addition to the observations made by others, why not you try the following stripped code and see whether you still get the ERROR.

 

data workjw.base2;
 
if _n_ = 0 then do;
     set workjw.acct_tpe_bin;
end;
 
if _n_ = 1 then do;
     declare hash b(dataset: "workjw.acct_tpe_bin");
     rc2=b.definekey("CH_ACCT_NO_BIN", "ACCT_TPE_CDE");
     rc2=b.definedata("ACC_DESC_BIN");
     rc2=b.definedone();
end;
 
rc = b.output(dataset:'out_01');
 
run;
Solution
‎09-15-2017 06:56 AM
Respected Advisor
Posts: 4,173

Re: HASH Duplicate Error - NO DUPS!

@datasp

As soon as you remove b.add() things will work.

libname workjw (work);

data workjw.acct_tpe_bin;
  CH_ACCT_NO_BIN=1;
  ACCT_TPE_CDE=1;
  ACC_DESC_BIN=1;
  output;
  CH_ACCT_NO_BIN=1;
  ACCT_TPE_CDE=1;
  ACC_DESC_BIN=1;
  output;
run;

data workjw.base;
  CH_ACCT_NO_BIN=1;
  ACCT_TPE_CDE=1;
  output;

  CH_ACCT_NO_BIN=2;
  ACCT_TPE_CDE=2;
  output;

run;



data workjw.base2;
  set workjw.base;

  if _n_ = 0 then
    do;
      set workjw.acct_tpe_bin;
    end;

  if _n_ = 1 then
    do;
      declare hash b(dataset: "workjw.acct_tpe_bin");
      rc2=b.definekey("CH_ACCT_NO_BIN", "ACCT_TPE_CDE");
      rc2=b.definedata("ACC_DESC_BIN");
      rc2=b.definedone();
/*      b.add();*/
    end;

  rc2=b.find();

  if rc2 ne 0 then
    do;
      ACC_DESC_BIN="";
    end;
run;

New Contributor
Posts: 4

Re: HASH Duplicate Error - NO DUPS!

Thanks. I was playing around and changed the add to say "rc2=b.add();" and that worked.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 157 views
  • 0 likes
  • 4 in conversation