DATA Step, Macro, Functions and more

HASH Issue

Accepted Solution Solved
Reply
Contributor
Posts: 69
Accepted Solution

HASH Issue

Dear SAS Users:


I have one dataset x, has n=4,684,804 and 1 variable Receipt with Length $13..

I have another dataset y, has n=1.5404E8 records with 37 variables. Receipt has Minimum Length=3, and Maximum Length=13.

 

I am using HASH technique two merge x and y on Receipt column.

 

DATA WORK.FILE_CDIM00; 
IF 0 THEN SET X; 
if _N_ = 1 then do; 
declare hash HASH_NAME(dataset: "Y", multidata: 'y'); 
HASH_NAME.defineKEY("RECEIPT"); 
HASH_NAME.defineData (ALL:'YES'); 
HASH_NAME.defineDone();
END;

set Y(where= (13<=length(RECEIPT)<=13) ) ; 
IF HASH_NAME.FIND(KEY:RECEIPT) = 0 THEN OUTPUT; 
RUN;

 

I am getting Error as Shown below:
WARNING: Multiple lengths were specified for the variable RECEIPT by input data set(s). This can cause truncation of data.
ERROR: Hash object added 2293744 items when memory failure occurred.
FATAL: Insufficient memory to execute DATA step program. Aborted during the EXECUTION phase.

 

Any suggestions , idea, clue ?


Accepted Solutions
Solution
‎04-27-2017 01:17 PM
Super User
Posts: 10,041

Re: HASH Issue

If I was right , the length of RECEIPT in Y is longer than in X.

 

So either you define it by hand

length RECEIPT $ 200;

 

or make RECEIPT of Y initialize before X.

 

if 0 then set Y;

if 0 then set X;

 

 

and your memory is too small to hold the data from your LOG.

View solution in original post


All Replies
Super User
Super User
Posts: 7,977

Re: HASH Issue

What does the data structure look like (this is why posting test data is always a good idea).  As from your text you are almost saying that Receipt is length 13 in both cases - however the Warning is telling you this is not the case.  Make sure you fix the length in your datasets, no good having a structure which keeps changing as it can cause you all kinds of problems.

On your Has Error, thats a large amount of data to be putting into memory, and you seem to have run out.  Is there a reason you need to use hash for what seems like a simple data merge task?

data file_cdim00;
  merge x (in=a) y (in=b);
  by receipt;
  if b then output;
run;

Or you might be able to let SQL do it by:

proc sql;
  create table FILE_CDIM00 as
  select  *
  from    X
  where  RECEIPT in (select RECEIPT from Y);
quit;

It will of course take a long time to run, with just under 5mil rows you can expect that.  It may be that by working differently however you could avoid this totally, maybe you could use ranges of receipts rather than lists (and I can't see your data here), somethin glike:
receipt

000000001

000000010

...

range=1 to 10, so if receipt between input(min(receipt,best.)) and input(max(receipt),best.) then output.

Super User
Posts: 7,831

Re: HASH Issue

Don't use a hash for that dataset, it is simply too large.

Use the common technique of sorting and merging.

 

The way I read your code, you are reading Y (with all variables) into the hash, and then merge it with itself. Didn't you want to read X into the hash, for lookup?

 

Before the operation, you should make sure that your datasets have identical attributes for variable receipt (type character, length 13). That will prevent the WARNING.

 

And your condition

where= (13<=length(RECEIPT)<=13)

is equal to

where= (length(RECEIPT)=13)
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 10,041

Re: HASH Issue

Try 

IF 0 THEN SET X; 

-->
IF 0 THEN SET Y; 
IF 0 THEN SET X; 

Contributor
Posts: 69

Re: HASH Issue

Not clear, Sharp.
Can you please assist.
Solution
‎04-27-2017 01:17 PM
Super User
Posts: 10,041

Re: HASH Issue

If I was right , the length of RECEIPT in Y is longer than in X.

 

So either you define it by hand

length RECEIPT $ 200;

 

or make RECEIPT of Y initialize before X.

 

if 0 then set Y;

if 0 then set X;

 

 

and your memory is too small to hold the data from your LOG.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 126 views
  • 0 likes
  • 4 in conversation