BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
donspaul
Calcite | Level 5

Hello,

 

My ultimate goal is to create a cartesian product of 2 tables with each table containing over 40k rows.  Below is the code I have been testing to eventually apply to the 40k+ rows dataset.  When I run this code with ZIP and HASH_ZIP variables as numeric, it works fine.  However, the 40k+ dataset that I will be applying this code to has ZIP in character format.  But when I run this code with ZIP and HASH_ZIP as character variables, I get an error message.  I have provided the error message below the code.  I would greatly appreciate any assistance in helping me fix this code.

 

Thank You!!!

 

DATA ZIP;
INPUT ZIP $ POP;
CARDS;
10 100
20 200
30 300
;
RUN;

 

DATA HASH_ZIP;
INPUT HASH_ZIP $;
CARDS;
10
20
30
;
RUN;


DATA ZIP_BASE;
IF _N_ = 1 THEN DO;
DCL HASH H(DATASET : "HASH_ZIP", ordered: "A", multidata: "Y");
H.DEFINEKEY("HASH_ZIP");
H.DEFINEDATA(ALL : "HASH_ZIP");
H.DEFINEDONE();
DCL HITER I("H");
END;

SET ZIP;
HASH_ZIP = .;

DO WHILE (I.NEXT() = 0);
OUTPUT;
END;
RUN;

 

Below is the error message I am receiving when ZIP and HASH_ZIP are set as character variables:

 

WARNING: Invalid value for hash object argument tag at line 51 column 7. Assuming the value NO.
ERROR: Type mismatch for key variable HASH_ZIP at line 52 column 7.
ERROR: Hash data set load failed at line 52 column 7.
ERROR: DATA STEP Component Object failure. Aborted during the EXECUTION phase.

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

This would have been easier to diagnose if you had copied the log (including the SAS statements with line numbers).  

 

But I infer that the warning "invalid value for hash object argument tag at line 51 column 7. Assuming the value NO." means line 51 is the H.definedata(all:'"HASHZIP"); statement.  The argument should, in your case by "Y", as in H.definedata(all:"Y");

 

The "type mismatch" is because, while you told the hash declaration to use the HASH_ZIP dataset, you didn't tell the sas compiler, so it does not know that the variable hash_zip is  character.   Add the statement:

 

if 0 then set hash_zip;

prior to the "IF _N_=1" do block (or at the start of the block).  It takes no input action, but it obligates the SAS compiler to get the needed metadata for variables in the hash_zip dataset.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

2 REPLIES 2
mkeintz
PROC Star

This would have been easier to diagnose if you had copied the log (including the SAS statements with line numbers).  

 

But I infer that the warning "invalid value for hash object argument tag at line 51 column 7. Assuming the value NO." means line 51 is the H.definedata(all:'"HASHZIP"); statement.  The argument should, in your case by "Y", as in H.definedata(all:"Y");

 

The "type mismatch" is because, while you told the hash declaration to use the HASH_ZIP dataset, you didn't tell the sas compiler, so it does not know that the variable hash_zip is  character.   Add the statement:

 

if 0 then set hash_zip;

prior to the "IF _N_=1" do block (or at the start of the block).  It takes no input action, but it obligates the SAS compiler to get the needed metadata for variables in the hash_zip dataset.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
donspaul
Calcite | Level 5

mkeintz, that WORKED!  Thank YOU!!!!!!!

I took the easy way out initially and set the zip codes to numeric in the source table (because US zips don't contain characters) and it seemed to run faster than when they were set as a character variable.  Not sure if there is a rational explanation for this or if I was just imagining things :-).  Regardless, this ran at an incredibly faster rate than without using HASH tables.  Thanks once again. 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 279 views
  • 0 likes
  • 2 in conversation