Here is an excerpt of a case I opened with SAS TS:
I’ve got an ETL process where I’m doing change record detection, and modifying data (close old record, add new record) if a change has occurred. It’s somewhat analogous to SCD2 processing.
I’ve ran into a problem where data was copied into the wrong column. This error was corrected, but due to the way CATX works, a change is not detected.
An example will help to illustrate the point:
data test;
length var1 var2 var3 $1 var4 var5 8 buffer1 buffer2 $32767 digest1 digest2 $32;
input var1-var5;
buffer1=catx("|", of var1-var5);
digest1=put(md5(buffer1),hex32.);
buffer2=strip(var1) || "|" || strip(var2) || "|" || strip(var3) || "|" || strip(var4) || "|" || strip(var5);
digest2=put(md5(buffer2),hex32.);
datalines;
A B . 1 1
A . B 1 1
;
run;
I'd prefer to use the CATX function - I can use a variable list, I don't have to think about data type conversion, and don’t have to jump through hoops in my code generation. My actual code has about 40 variables I need to check for changes.
The solution, of course, is a flag for the CATX function that would mimic the concatenation operator. IOW, print the delimiter regardless of whether the data value is missing.
I think (???) the CATX function could be changed where it would not break existing code. Example invocations:
buffer1=catx("|", of var1-var5, “Y”);
buffer1=catx("|", var1, var2, var3, var4, var5, “Y”);
The last parameter does not have to be “Y”, it just needs to be some character string, that could not be confused with a variable, that would trigger CATX to print the delimiter regardless of whether the variable is missing or not. There would be a delimiter for every variable – 1.
Questions:
Can you think of an ingenious way to address this problem, other than the concatenation operator as above? Some ideas:
* Perhaps build the buffer one variable at a time, translating missing to “00”x, to “trick” CATX into printing the delimiter. But spinning over the variable list would be problematic since it is a mix of char and num.
* Or, perhaps I can export the dataset as a delimited file, read that file back in without delimiters, and use _INFILE_ as the buffer for the MD5 function.
* Or, could I use PROC FCMP to roll my own CATX function that would give me the desired results? (BTW, the desired results is a buffer with one delimiter per variable - 1)
Thanks...
Try using the CATQ function:
data test;
length var1 var2 var3 $1 var4 var5 8 buffer1 buffer2 buffer3 $32767 digest1 digest2 digest3 $32;
input var1-var5;
buffer1=catx("|", of var1-var5);
digest1=put(md5(buffer1),hex32.);
buffer2=strip(var1) || "|" || strip(var2) || "|" || strip(var3) || "|" || strip(var4) || "|" || strip(var5);
digest2=put(md5(buffer2),hex32.);
buffer3 = catq("MTD", "|", of var1-var5);
digest3=put(md5(buffer3),hex32.);
datalines;
A B . 1 1
A . B 1 1
;
proc print data=test noobs; run;
PG
Hi All,
I think I've solved my own problem. I remembered a technique I first saw proposed by (a really smart guy, you all should follow him ).
I get one extra delimiter at the beginning of the buffer string, but for my purposes of creating a message digest, it meets my needs for change detection.
If you can think of a better approach, please let me know.
Thanks...
* test data, should be a different digest for each observation ;
data test;
length var1 var2 var3 $1 var4 var5 8;
input var1-var5;
datalines;
A B . 1 .
A . B 1 .
A B . . 1
A . B . 1
;
run;
* define the variables of interest. i've purposely dropped off var3 ;
%let vars=var1 var2 var4 var5;
* first test with default option for missing numerics ;
options missing=".";
data test2;
* use a "naked format" to define the PDV order ;
* the _dummy_ variable in the PDV is an end-of-processing marker ;
format &vars _dummy_;
* prevent unitialized variable notes ;
retain _dummy_ "";
set test;
length _name $32 _value buffer1 $32767 digest1 $32 buffer2 $32767 digest2 $32;
buffer1=catx("|", of var1-var5);
digest1=put(md5(buffer1),hex32.);
do while (1);
* get next variable in the PDV ;
call vnext(_name);
* bail if variable name = _dummy_ ;
if (_name="_dummy_") then leave;
* get the variable value. no notes on numeric to character conversion, which is nice ;
_value=vvaluex(_name);
buffer2=cats(buffer2,"|",_value);
end;
digest2=put(md5(buffer2),hex32.);
run;
* then do the same test as above with blank for missing numerics ;
options missing=" ";
data test3;
format &vars _dummy_;
retain _dummy_ "";
set test;
length _name $32 _value buffer1 $32767 digest1 $32 buffer2 $32767 digest2 $32;
buffer1=catx("|", of var1-var5);
digest1=put(md5(buffer1),hex32.);
do while (1);
call vnext(_name);
if (_name="_dummy_") then leave;
_value=vvaluex(_name);
buffer2=cats(buffer2,"|",_value);
end;
digest2=put(md5(buffer2),hex32.);
run;
* and if the PDV order is really critical to you, grab Richard Devenezia's (another really smart guy) %seplist macro ... (Google for his website) ;
data test4;
set test;
retain _dummy_ "";
length _name $32 _value buffer1 $32767 digest1 $32 buffer2 $32767 digest2 $32;
buffer1=catx("|", of var1-var5);
digest1=put(md5(buffer1),hex32.);
do while (1);
call vnext(_name);
if (_name="_dummy_") then leave;
if (_name in (%seplist(&vars,nest=QQ))) then do;
_value=vvaluex(_name);
buffer2=cats(buffer2,"|",_value);
end;
end;
digest2=put(md5(buffer2),hex32.);
run;
Try using the CATQ function:
data test;
length var1 var2 var3 $1 var4 var5 8 buffer1 buffer2 buffer3 $32767 digest1 digest2 digest3 $32;
input var1-var5;
buffer1=catx("|", of var1-var5);
digest1=put(md5(buffer1),hex32.);
buffer2=strip(var1) || "|" || strip(var2) || "|" || strip(var3) || "|" || strip(var4) || "|" || strip(var5);
digest2=put(md5(buffer2),hex32.);
buffer3 = catq("MTD", "|", of var1-var5);
digest3=put(md5(buffer3),hex32.);
datalines;
A B . 1 1
A . B 1 1
;
proc print data=test noobs; run;
PG
Thanks PG. I thought I knew all the CAT* functions, apparently CATQ escaped my reading of the doc :smileyblush:. Besides being less code, I'm sure it's more performant than my approach.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.