We've got several large tables where the primary key consists of multiple char variables. These are used in multiple heavy joins. My example table uses these 5 chars as PK: CUSTOMER_ID, CUSTOMER_SYSTEM_ID, PRODUCT_ID, COVERAGE_ID, OBJECT_ID.
I'd like to create 1 numeric composite key instead. Primarily to make the joins run faster. But also because our reusable script (used to process multiple tables) can be simplified by always having 1 PK per table.
My plan is to append the char variables together, and then use MD5 & HEX to create the numeric composite key value. Like this:
data want;
set have;
attrib pk_char format=$200.;
attrib pk_nr format=20.0;
pk_char = CUSTOMER_ID || '_' || CUSTOMER_SYSTEM_ID || '_' || PRODUCT_ID || '_' || COVERAGE_ID || '_' || OBJECT_ID;
pk_nr = input(put(md5(pk_char),$hex13.),hex13.);
run;
1) Is this a sensible approach, or do you have better suggestions?
2) I'm satisfied with format=$200 for the concatinated character string. This value should never be exceeded for any of the tables. An example value it might get today (from the example table) is "123684525218_562448_V48_26457_3". But I'm clueless about what hex format to go for. Would hex13 be fine for creating a uniqe numerical value for a string consisting of up to 200 symbols?
The MD5() function returns 128 bits. So at 8 bits per byte that is 16 bytes. That is waaaaaay too many to convert into a single number. So leave it as character and use the $HEX32 format to display it.
data want;
set have;
attrib pk_char length=$200;
attrib pk length=$16 format=$hex32.;
pk_char = catx('_',of CUSTOMER_ID CUSTOMER_SYSTEM_ID PRODUCT_ID COVERAGE_ID OBJECT_ID);
pk = md5(pk_char);
run;
Also there is no need to attach the $200 format to PK_CHAR. SAS does not need any special instructions for how to display display character strings.
The MD5() function returns 128 bits. So at 8 bits per byte that is 16 bytes. That is waaaaaay too many to convert into a single number. So leave it as character and use the $HEX32 format to display it.
data want;
set have;
attrib pk_char length=$200;
attrib pk length=$16 format=$hex32.;
pk_char = catx('_',of CUSTOMER_ID CUSTOMER_SYSTEM_ID PRODUCT_ID COVERAGE_ID OBJECT_ID);
pk = md5(pk_char);
run;
Also there is no need to attach the $200 format to PK_CHAR. SAS does not need any special instructions for how to display display character strings.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.