Hello,
My ultimate goal is to create a cartesian product of 2 tables with each table containing over 40k rows. Below is the code I have been testing to eventually apply to the 40k+ rows dataset. When I run this code with ZIP and HASH_ZIP variables as numeric, it works fine. However, the 40k+ dataset that I will be applying this code to has ZIP in character format. But when I run this code with ZIP and HASH_ZIP as character variables, I get an error message. I have provided the error message below the code. I would greatly appreciate any assistance in helping me fix this code.
Thank You!!!
DATA ZIP;
INPUT ZIP $ POP;
CARDS;
10 100
20 200
30 300
;
RUN;
DATA HASH_ZIP;
INPUT HASH_ZIP $;
CARDS;
10
20
30
;
RUN;
DATA ZIP_BASE;
IF _N_ = 1 THEN DO;
DCL HASH H(DATASET : "HASH_ZIP", ordered: "A", multidata: "Y");
H.DEFINEKEY("HASH_ZIP");
H.DEFINEDATA(ALL : "HASH_ZIP");
H.DEFINEDONE();
DCL HITER I("H");
END;
SET ZIP;
HASH_ZIP = .;
DO WHILE (I.NEXT() = 0);
OUTPUT;
END;
RUN;
Below is the error message I am receiving when ZIP and HASH_ZIP are set as character variables:
WARNING: Invalid value for hash object argument tag at line 51 column 7. Assuming the value NO.
ERROR: Type mismatch for key variable HASH_ZIP at line 52 column 7.
ERROR: Hash data set load failed at line 52 column 7.
ERROR: DATA STEP Component Object failure. Aborted during the EXECUTION phase.
This would have been easier to diagnose if you had copied the log (including the SAS statements with line numbers).
But I infer that the warning "invalid value for hash object argument tag at line 51 column 7. Assuming the value NO." means line 51 is the H.definedata(all:'"HASHZIP"); statement. The argument should, in your case by "Y", as in H.definedata(all:"Y");
The "type mismatch" is because, while you told the hash declaration to use the HASH_ZIP dataset, you didn't tell the sas compiler, so it does not know that the variable hash_zip is character. Add the statement:
if 0 then set hash_zip;prior to the "IF _N_=1" do block (or at the start of the block). It takes no input action, but it obligates the SAS compiler to get the needed metadata for variables in the hash_zip dataset.
This would have been easier to diagnose if you had copied the log (including the SAS statements with line numbers).
But I infer that the warning "invalid value for hash object argument tag at line 51 column 7. Assuming the value NO." means line 51 is the H.definedata(all:'"HASHZIP"); statement. The argument should, in your case by "Y", as in H.definedata(all:"Y");
The "type mismatch" is because, while you told the hash declaration to use the HASH_ZIP dataset, you didn't tell the sas compiler, so it does not know that the variable hash_zip is character. Add the statement:
if 0 then set hash_zip;prior to the "IF _N_=1" do block (or at the start of the block). It takes no input action, but it obligates the SAS compiler to get the needed metadata for variables in the hash_zip dataset.
mkeintz, that WORKED! Thank YOU!!!!!!!
I took the easy way out initially and set the zip codes to numeric in the source table (because US zips don't contain characters) and it seemed to run faster than when they were set as a character variable. Not sure if there is a rational explanation for this or if I was just imagining things :-). Regardless, this ran at an incredibly faster rate than without using HASH tables. Thanks once again.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
