Here is an excerpt of a case I opened with SAS TS:
I’ve got an ETL process where I’m doing change record detection, and modifying data (close old record, add new record) if a change has occurred. It’s somewhat analogous to SCD2 processing.
I’ve ran into a problem where data was copied into the wrong column. This error was corrected, but due to the way CATX works, a change is not detected.
An example will help to illustrate the point:
data test;
length var1 var2 var3 $1 var4 var5 8 buffer1 buffer2 $32767 digest1 digest2 $32;
input var1-var5;
buffer1=catx("|", of var1-var5);
digest1=put(md5(buffer1),hex32.);
buffer2=strip(var1) || "|" || strip(var2) || "|" || strip(var3) || "|" || strip(var4) || "|" || strip(var5);
digest2=put(md5(buffer2),hex32.);
datalines;
A B . 1 1
A . B 1 1
;
run;
I'd prefer to use the CATX function - I can use a variable list, I don't have to think about data type conversion, and don’t have to jump through hoops in my code generation. My actual code has about 40 variables I need to check for changes.
The solution, of course, is a flag for the CATX function that would mimic the concatenation operator. IOW, print the delimiter regardless of whether the data value is missing.
I think (???) the CATX function could be changed where it would not break existing code. Example invocations:
buffer1=catx("|", of var1-var5, “Y”);
buffer1=catx("|", var1, var2, var3, var4, var5, “Y”);
The last parameter does not have to be “Y”, it just needs to be some character string, that could not be confused with a variable, that would trigger CATX to print the delimiter regardless of whether the variable is missing or not. There would be a delimiter for every variable – 1.
Questions:
Can you think of an ingenious way to address this problem, other than the concatenation operator as above? Some ideas:
* Perhaps build the buffer one variable at a time, translating missing to “00”x, to “trick” CATX into printing the delimiter. But spinning over the variable list would be problematic since it is a mix of char and num.
* Or, perhaps I can export the dataset as a delimited file, read that file back in without delimiters, and use _INFILE_ as the buffer for the MD5 function.
* Or, could I use PROC FCMP to roll my own CATX function that would give me the desired results? (BTW, the desired results is a buffer with one delimiter per variable - 1)
Thanks...
Try using the CATQ function:
data test;
length var1 var2 var3 $1 var4 var5 8 buffer1 buffer2 buffer3 $32767 digest1 digest2 digest3 $32;
input var1-var5;
buffer1=catx("|", of var1-var5);
digest1=put(md5(buffer1),hex32.);
buffer2=strip(var1) || "|" || strip(var2) || "|" || strip(var3) || "|" || strip(var4) || "|" || strip(var5);
digest2=put(md5(buffer2),hex32.);
buffer3 = catq("MTD", "|", of var1-var5);
digest3=put(md5(buffer3),hex32.);
datalines;
A B . 1 1
A . B 1 1
;
proc print data=test noobs; run;
PG
Hi All,
I think I've solved my own problem. I remembered a technique I first saw proposed by (a really smart guy, you all should follow him ).
I get one extra delimiter at the beginning of the buffer string, but for my purposes of creating a message digest, it meets my needs for change detection.
If you can think of a better approach, please let me know.
Thanks...
* test data, should be a different digest for each observation ;
data test;
length var1 var2 var3 $1 var4 var5 8;
input var1-var5;
datalines;
A B . 1 .
A . B 1 .
A B . . 1
A . B . 1
;
run;
* define the variables of interest. i've purposely dropped off var3 ;
%let vars=var1 var2 var4 var5;
* first test with default option for missing numerics ;
options missing=".";
data test2;
* use a "naked format" to define the PDV order ;
* the _dummy_ variable in the PDV is an end-of-processing marker ;
format &vars _dummy_;
* prevent unitialized variable notes ;
retain _dummy_ "";
set test;
length _name $32 _value buffer1 $32767 digest1 $32 buffer2 $32767 digest2 $32;
buffer1=catx("|", of var1-var5);
digest1=put(md5(buffer1),hex32.);
do while (1);
* get next variable in the PDV ;
call vnext(_name);
* bail if variable name = _dummy_ ;
if (_name="_dummy_") then leave;
* get the variable value. no notes on numeric to character conversion, which is nice ;
_value=vvaluex(_name);
buffer2=cats(buffer2,"|",_value);
end;
digest2=put(md5(buffer2),hex32.);
run;
* then do the same test as above with blank for missing numerics ;
options missing=" ";
data test3;
format &vars _dummy_;
retain _dummy_ "";
set test;
length _name $32 _value buffer1 $32767 digest1 $32 buffer2 $32767 digest2 $32;
buffer1=catx("|", of var1-var5);
digest1=put(md5(buffer1),hex32.);
do while (1);
call vnext(_name);
if (_name="_dummy_") then leave;
_value=vvaluex(_name);
buffer2=cats(buffer2,"|",_value);
end;
digest2=put(md5(buffer2),hex32.);
run;
* and if the PDV order is really critical to you, grab Richard Devenezia's (another really smart guy) %seplist macro ... (Google for his website) ;
data test4;
set test;
retain _dummy_ "";
length _name $32 _value buffer1 $32767 digest1 $32 buffer2 $32767 digest2 $32;
buffer1=catx("|", of var1-var5);
digest1=put(md5(buffer1),hex32.);
do while (1);
call vnext(_name);
if (_name="_dummy_") then leave;
if (_name in (%seplist(&vars,nest=QQ))) then do;
_value=vvaluex(_name);
buffer2=cats(buffer2,"|",_value);
end;
end;
digest2=put(md5(buffer2),hex32.);
run;
Try using the CATQ function:
data test;
length var1 var2 var3 $1 var4 var5 8 buffer1 buffer2 buffer3 $32767 digest1 digest2 digest3 $32;
input var1-var5;
buffer1=catx("|", of var1-var5);
digest1=put(md5(buffer1),hex32.);
buffer2=strip(var1) || "|" || strip(var2) || "|" || strip(var3) || "|" || strip(var4) || "|" || strip(var5);
digest2=put(md5(buffer2),hex32.);
buffer3 = catq("MTD", "|", of var1-var5);
digest3=put(md5(buffer3),hex32.);
datalines;
A B . 1 1
A . B 1 1
;
proc print data=test noobs; run;
PG
Thanks PG. I thought I knew all the CAT* functions, apparently CATQ escaped my reading of the doc :smileyblush:. Besides being less code, I'm sure it's more performant than my approach.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.