BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
GPatel
Pyrite | Level 9

Dear SAS Users:


I have one dataset x, has n=4,684,804 and 1 variable Receipt with Length $13..

I have another dataset y, has n=1.5404E8 records with 37 variables. Receipt has Minimum Length=3, and Maximum Length=13.

 

I am using HASH technique two merge x and y on Receipt column.

 

DATA WORK.FILE_CDIM00; 
IF 0 THEN SET X; 
if _N_ = 1 then do; 
declare hash HASH_NAME(dataset: "Y", multidata: 'y'); 
HASH_NAME.defineKEY("RECEIPT"); 
HASH_NAME.defineData (ALL:'YES'); 
HASH_NAME.defineDone();
END;

set Y(where= (13<=length(RECEIPT)<=13) ) ; 
IF HASH_NAME.FIND(KEY:RECEIPT) = 0 THEN OUTPUT; 
RUN;

 

I am getting Error as Shown below:
WARNING: Multiple lengths were specified for the variable RECEIPT by input data set(s). This can cause truncation of data.
ERROR: Hash object added 2293744 items when memory failure occurred.
FATAL: Insufficient memory to execute DATA step program. Aborted during the EXECUTION phase.

 

Any suggestions , idea, clue ?

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

If I was right , the length of RECEIPT in Y is longer than in X.

 

So either you define it by hand

length RECEIPT $ 200;

 

or make RECEIPT of Y initialize before X.

 

if 0 then set Y;

if 0 then set X;

 

 

and your memory is too small to hold the data from your LOG.

View solution in original post

5 REPLIES 5
RW9
Diamond | Level 26 RW9
Diamond | Level 26

What does the data structure look like (this is why posting test data is always a good idea).  As from your text you are almost saying that Receipt is length 13 in both cases - however the Warning is telling you this is not the case.  Make sure you fix the length in your datasets, no good having a structure which keeps changing as it can cause you all kinds of problems.

On your Has Error, thats a large amount of data to be putting into memory, and you seem to have run out.  Is there a reason you need to use hash for what seems like a simple data merge task?

data file_cdim00;
  merge x (in=a) y (in=b);
  by receipt;
  if b then output;
run;

Or you might be able to let SQL do it by:

proc sql;
  create table FILE_CDIM00 as
  select  *
  from    X
  where  RECEIPT in (select RECEIPT from Y);
quit;

It will of course take a long time to run, with just under 5mil rows you can expect that.  It may be that by working differently however you could avoid this totally, maybe you could use ranges of receipts rather than lists (and I can't see your data here), somethin glike:
receipt

000000001

000000010

...

range=1 to 10, so if receipt between input(min(receipt,best.)) and input(max(receipt),best.) then output.

Kurt_Bremser
Super User

Don't use a hash for that dataset, it is simply too large.

Use the common technique of sorting and merging.

 

The way I read your code, you are reading Y (with all variables) into the hash, and then merge it with itself. Didn't you want to read X into the hash, for lookup?

 

Before the operation, you should make sure that your datasets have identical attributes for variable receipt (type character, length 13). That will prevent the WARNING.

 

And your condition

where= (13<=length(RECEIPT)<=13)

is equal to

where= (length(RECEIPT)=13)
Ksharp
Super User
Try 

IF 0 THEN SET X; 

-->
IF 0 THEN SET Y; 
IF 0 THEN SET X; 

GPatel
Pyrite | Level 9
Not clear, Sharp.
Can you please assist.
Ksharp
Super User

If I was right , the length of RECEIPT in Y is longer than in X.

 

So either you define it by hand

length RECEIPT $ 200;

 

or make RECEIPT of Y initialize before X.

 

if 0 then set Y;

if 0 then set X;

 

 

and your memory is too small to hold the data from your LOG.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1241 views
  • 0 likes
  • 4 in conversation