Hello, I am new to hash objects and am doing most of my learning while troubleshooting old code. The following code has been in use for a while; it is a small segment of a much larger program. If I run the code more than once, the variables that start with 'PRE' are not consistent, meaning the code doesn't produce the same result two times in a row. If anyone can look at this and see what the issue is, I will be very appreciative. I'm going to keep looking through SAS papers and try to find the issue myself as well. I wasn't able to attach this as a Word document, so I attached it as a PDF; if you would like it in another format, let me know. I'll also paste it below:
Thanks,
********************************************************************************;
*** MERGE MIGRATION DATA WITH DISCO AND SEGMENT TABLES ***;
********************************************************************************;
DATA WORK.&FILE_TYPE._&START_DATE._2;
LENGTH ABB_CORE_SEGMENT ABB_CORE_SEGMENT_2 ABB_CORE_SEGMENT_3 $16 PSU_VIDEO PSU_HSD PSU_DP MDS 3. VIDEO HSD DP ABB_LVL_1 $8;
DECLARE HASH SEG();
RC = SEG.DEFINEKEY ('ANLY_SEG_KEY');
RC = SEG.DEFINEDATA ('ABB_CORE_SEGMENT', 'ABB_CORE_SEGMENT_2', 'ABB_CORE_SEGMENT_3', 'ABB_LVL_1',
'PSU_VIDEO', 'PSU_HSD', 'PSU_DP', 'MDS', 'VIDEO', 'HSD', 'DP');
RC = SEG.DEFINEDONE();
DO UNTIL (EOF_SEG);
SET TWR0TXV.ANALYSIS_SEGMENT END = EOF_SEG;
RC = SEG.REPLACE();
END;
DO UNTIL (EOF_UNIVERSE);
SET WORK.&FILE_TYPE._&START_DATE._1 END=EOF_UNIVERSE;
*PRODUCT HOLDING DETAILS*;
ABB_CORE_SEGMENT=''; ABB_CORE_SEGMENT_2=''; ABB_CORE_SEGMENT_3=''; ABB_LVL_1=''; PSU_VIDEO=.; PSU_HSD=.; PSU_DP=.; MDS=.; VIDEO=''; HSD=''; DP='';
RC=SEG.FIND();
/* POST IS USED IN 1ST PASS FOR RC-AD, PRE IS USED FOR RD-DD */
IF RC=0 THEN POST_SGMNT_MATCH = 'Y'; ELSE POST_SGMNT_MATCH = 'N';
RENAME ABB_CORE_SEGMENT = POST_ABB_CORE_SEGMENT
ABB_CORE_SEGMENT_2 = POST_ABB_CORE_SEGMENT_2
ABB_CORE_SEGMENT_3 = POST_ABB_CORE_SEGMENT_3
ABB_LVL_1 = POST_ABB_LVL_1
PSU_VIDEO = POST_PSU_VIDEO
PSU_HSD = POST_PSU_HSD
PSU_DP = POST_PSU_DP
MDS = POST_MDS
VIDEO = POST_VIDEO
HSD = POST_HSD
DP = POST_DP;
OUTPUT; END; STOP; RUN;
DATA WORK.&FILE_TYPE._POST_TRANSACTIONS_2_&END_DATE.;
LENGTH ABB_CORE_SEGMENT ABB_CORE_SEGMENT_2 ABB_CORE_SEGMENT_3 $16 PSU_VIDEO PSU_HSD PSU_DP MDS 3. VIDEO HSD DP ABB_LVL_1 $8;
DECLARE HASH SEG();
RC = SEG.DEFINEKEY ('ANLY_SEG_KEY');
RC = SEG.DEFINEDATA ('ABB_CORE_SEGMENT', 'ABB_CORE_SEGMENT_2', 'ABB_CORE_SEGMENT_3', 'ABB_LVL_1', 'PSU_VIDEO', 'PSU_HSD', 'PSU_DP', 'MDS', 'VIDEO', 'HSD', 'DP');
RC = SEG.DEFINEDONE();
DO UNTIL (EOF_SEG);
SET TWR0TXV.ANALYSIS_SEGMENT END = EOF_SEG;
RC = SEG.REPLACE();
END;
DO UNTIL (EOF_UNIVERSE);
SET WORK.&FILE_TYPE._PRE_POST_TRANSACTIONS_&END_DATE. END=EOF_UNIVERSE;
*PRODUCT HOLDING DETAILS*;
ABB_CORE_SEGMENT=''; ABB_CORE_SEGMENT_2=''; ABB_CORE_SEGMENT_3=''; ABB_LVL_1=''; PSU_VIDEO=.; PSU_HSD=.; PSU_DP=.; MDS=.; VIDEO=''; HSD=''; DP='';
RC=SEG.FIND();
/* PRE IS USED IN 2ND PASS FOR RC-AD, POST IS USED FOR RD-DD */
IF RC=0 THEN PRE_SGMNT_MATCH = 'Y'; ELSE PRE_SGMNT_MATCH = 'N';
RENAME ABB_CORE_SEGMENT = PRE_ABB_CORE_SEGMENT
ABB_CORE_SEGMENT_2 = PRE_ABB_CORE_SEGMENT_2
ABB_CORE_SEGMENT_3 = PRE_ABB_CORE_SEGMENT_3
ABB_LVL_1 = PRE_ABB_LVL_1
PSU_VIDEO = PRE_PSU_VIDEO
PSU_HSD = PRE_PSU_HSD
PSU_DP = PRE_PSU_DP
MDS = PRE_MDS
VIDEO = PRE_VIDEO
HSD = PRE_HSD
DP = PRE_DP;
OUTPUT; END; STOP;
RUN;
Despite also being new to hashing, I have a reasonable guess.
You mention that this is part of a longer program. Could it be that the longer program later changes the order of the observations in the source of the hash table (TWR0TXV.ANALYSIS_SEGMENT)? That would change the contents of the hash table the second time around, since it is being loaded with the REPLACE method.
Would the hash itself sort that dataset? I'm not sure. Is there another method I can use instead of REPLACE to test your theory?
I thought the point of using a hash was that sort order doesn't matter, so unless something is actually modifying the table, it shouldn't make a difference.
The hash doesn't affect the data set. (There are cases where hashing is used to create an output data set, but that is not its function here.) But sorting would affect the results IF you have multiple observations for the same ANLY_SEG_KEY. With the REPLACE method, the look-up information comes from the LAST observation loaded for each ANLY_SEG_KEY, so a different order can leave a different observation in the table. That is a check you could make: verify whether there are any cases of multiple observations for the same ANLY_SEG_KEY. If so, then look for sorting later on in the program.
I checked, and all of the ANLY_SEG_KEY values are unique.
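The duplicate-key check described above could be sketched in two ways. This is only an illustration, not the original program: the first query lists any ANLY_SEG_KEY that occurs more than once, and the second step loads a throwaway hash with ADD instead of REPLACE. ADD returns a nonzero code when the key is already in the table and keeps the first entry, so counting those return codes counts the duplicates. The counter N_DUP and the single data variable in DEFINEDATA are assumptions made to keep the sketch short.

```sas
/* Option 1: list every ANLY_SEG_KEY that appears on more than one observation. */
proc sql;
  select ANLY_SEG_KEY, count(*) as N_OBS
    from TWR0TXV.ANALYSIS_SEGMENT
    group by ANLY_SEG_KEY
    having count(*) > 1;
quit;

/* Option 2: load with ADD instead of REPLACE and count rejected keys. */
/* N_DUP is a hypothetical counter; one data variable is enough for this test. */
data _null_;
  declare hash SEG();
  RC = SEG.DEFINEKEY('ANLY_SEG_KEY');
  RC = SEG.DEFINEDATA('ABB_CORE_SEGMENT');
  RC = SEG.DEFINEDONE();
  do until (EOF_SEG);
    set TWR0TXV.ANALYSIS_SEGMENT end=EOF_SEG;
    /* ADD does not overwrite: a nonzero return code means the key already exists. */
    if SEG.ADD() ne 0 then do;
      N_DUP + 1;
      put 'NOTE: duplicate key found: ' ANLY_SEG_KEY=;
    end;
  end;
  put 'NOTE: observations rejected by ADD: ' N_DUP=;
  stop;
run;
```

If both checks come back empty, the load order of TWR0TXV.ANALYSIS_SEGMENT cannot change what the look-up returns.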
OK, that's out. Other changes could still alter the contents of TWR0TXV.ANALYSIS_SEGMENT: later processing might add records, delete records, or change data values. Other than that, I'm out of ideas though.
Good luck.
I looked at that dataset, and it hasn't been changed for years, while this code is run weekly. I'm going to replace the hash with a data step merge to verify whether the hash is causing the issue. I can't find anything wrong with it, and I suppose it could be something else; only these variables change from one run to the next, so I wanted to start here. Thanks so much for taking a look.
Thanks for the help last week. I thought I'd follow up and let you know what I found. Unfortunately, I had you on a wild goose chase: there is nothing wrong with the hash code. The issue was elsewhere, in code that I did not post for proprietary reasons.
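A merge-based cross-check along those lines might look like the sketch below. It is not the poster's actual replacement code. The data set names WORK.SEG_SORTED, WORK.UNIVERSE_SORTED, WORK.MERGE_CHECK, WORK.RUN1_COPY and WORK.RUN2_COPY are hypothetical, and the sketch assumes ANLY_SEG_KEY is unique in the segment table. Unlike the hash step, it does not blank out same-named variables already present in the transaction data, and its output is ordered by ANLY_SEG_KEY rather than in the original input order.

```sas
/* Sort the look-up table, keeping only the key and the hash data variables. */
proc sort data=TWR0TXV.ANALYSIS_SEGMENT
            (keep=ANLY_SEG_KEY ABB_CORE_SEGMENT ABB_CORE_SEGMENT_2 ABB_CORE_SEGMENT_3
                  ABB_LVL_1 PSU_VIDEO PSU_HSD PSU_DP MDS VIDEO HSD DP)
          out=WORK.SEG_SORTED;
  by ANLY_SEG_KEY;
run;

/* Sort a copy of the transaction data by the same key. */
proc sort data=WORK.&FILE_TYPE._PRE_POST_TRANSACTIONS_&END_DATE.
          out=WORK.UNIVERSE_SORTED;
  by ANLY_SEG_KEY;
run;

/* Left join by key; the renames mirror the RENAME statement in the hash step. */
data WORK.MERGE_CHECK;
  merge WORK.UNIVERSE_SORTED (in=IN_UNIVERSE)
        WORK.SEG_SORTED (in=IN_SEG
          rename=(ABB_CORE_SEGMENT   = PRE_ABB_CORE_SEGMENT
                  ABB_CORE_SEGMENT_2 = PRE_ABB_CORE_SEGMENT_2
                  ABB_CORE_SEGMENT_3 = PRE_ABB_CORE_SEGMENT_3
                  ABB_LVL_1 = PRE_ABB_LVL_1
                  PSU_VIDEO = PRE_PSU_VIDEO
                  PSU_HSD   = PRE_PSU_HSD
                  PSU_DP    = PRE_PSU_DP
                  MDS       = PRE_MDS
                  VIDEO     = PRE_VIDEO
                  HSD       = PRE_HSD
                  DP        = PRE_DP));
  by ANLY_SEG_KEY;
  if IN_UNIVERSE;                      /* keep every transaction row */
  length PRE_SGMNT_MATCH $1;
  if IN_SEG then PRE_SGMNT_MATCH = 'Y';
  else PRE_SGMNT_MATCH = 'N';
run;

/* Compare the PRE_ variables from two saved runs of the same step.      */
/* RUN1_COPY and RUN2_COPY are hypothetical copies of the step's output. */
proc compare base=WORK.RUN1_COPY compare=WORK.RUN2_COPY listall;
  var PRE_: ;
run;
```

PROC COMPARE without an ID statement matches observations by position, so the two copies need to be in the same row order for the comparison to be meaningful.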
Good to know. Thanks.