SAS Programming

ScottBass · Posted 03-10-2014 07:56 PM

Here is an excerpt of a case I opened with SAS TS:

I’ve got an ETL process where I’m doing change record detection, and modifying data (close old record, add new record) if a change has occurred. It’s somewhat analogous to SCD2 processing.

I’ve ran into a problem where data was copied into the wrong column. This error was corrected, but due to the way CATX works, a change is not detected.

An example will help to illustrate the point:

data test;

length var1 var2 var3 $1 var4 var5 8 buffer1 buffer2 $32767 digest1 digest2 $32;

input var1-var5;

buffer1=catx("|", of var1-var5);

digest1=put(md5(buffer1),hex32.);

buffer2=strip(var1) || "|" || strip(var2) || "|" || strip(var3) || "|" || strip(var4) || "|" || strip(var5);

digest2=put(md5(buffer2),hex32.);

datalines;

A B . 1 1

A . B 1 1

;

run;

I'd prefer to use the CATX function - I can use a variable list, I don't have to think about data type conversion, and don’t have to jump through hoops in my code generation. My actual code has about 40 variables I need to check for changes.

The solution, of course, is a flag for the CATX function that would mimic the concatenation operator. IOW, print the delimiter regardless of whether the data value is missing.

I think (???) the CATX function could be changed where it would not break existing code. Example invocations:

buffer1=catx("|", of var1-var5, “Y”);

buffer1=catx("|", var1, var2, var3, var4, var5, “Y”);

The last parameter does not have to be “Y”, it just needs to be some character string, that could not be confused with a variable, that would trigger CATX to print the delimiter regardless of whether the variable is missing or not. There would be a delimiter for every variable – 1.

Questions:

Can you think of an ingenious way to address this problem, other than the concatenation operator as above? Some ideas:

* Perhaps build the buffer one variable at a time, translating missing to “00”x, to “trick” CATX into printing the delimiter. But spinning over the variable list would be problematic since it is a mix of char and num.

* Or, perhaps I can export the dataset as a delimited file, read that file back in without delimiters, and use _INFILE_ as the buffer for the MD5 function.

* Or, could I use PROC FCMP to roll my own CATX function that would give me the desired results? (BTW, the desired results is a buffer with one delimiter per variable - 1)

Thanks...

Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

PGStats · Posted 03-10-2014 09:57 PM

Try using the CATQ function:

data test;

length var1 var2 var3 $1 var4 var5 8 buffer1 buffer2 buffer3 $32767 digest1 digest2 digest3 $32;

input var1-var5;

buffer1=catx("|", of var1-var5);

digest1=put(md5(buffer1),hex32.);

buffer2=strip(var1) || "|" || strip(var2) || "|" || strip(var3) || "|" || strip(var4) || "|" || strip(var5);

digest2=put(md5(buffer2),hex32.);

buffer3 = catq("MTD", "|", of var1-var5);

digest3=put(md5(buffer3),hex32.);

datalines;

A B . 1 1

A . B 1 1

;

proc print data=test noobs; run;

PG

View solution in original post

ScottBass · Posted 03-10-2014 09:49 PM

Hi All,

I think I've solved my own problem. I remembered a technique I first saw proposed by (a really smart guy, you all should follow him ).

I get one extra delimiter at the beginning of the buffer string, but for my purposes of creating a message digest, it meets my needs for change detection.

If you can think of a better approach, please let me know.

Thanks...

* test data, should be a different digest for each observation ;

data test;

length var1 var2 var3 $1 var4 var5 8;

input var1-var5;

datalines;

A B . 1 .

A . B 1 .

A B . . 1

A . B . 1

;

run;

* define the variables of interest. i've purposely dropped off var3 ;

%let vars=var1 var2 var4 var5;

* first test with default option for missing numerics ;

options missing=".";

data test2;

* use a "naked format" to define the PDV order ;

* the _dummy_ variable in the PDV is an end-of-processing marker ;

format &vars _dummy_;

* prevent unitialized variable notes ;

retain _dummy_ "";

set test;

length _name $32 _value buffer1 $32767 digest1 $32 buffer2 $32767 digest2 $32;

buffer1=catx("|", of var1-var5);

digest1=put(md5(buffer1),hex32.);

do while (1);

* get next variable in the PDV ;

call vnext(_name);

* bail if variable name = _dummy_ ;

if (_name="_dummy_") then leave;

* get the variable value. no notes on numeric to character conversion, which is nice ;

_value=vvaluex(_name);

buffer2=cats(buffer2,"|",_value);

end;

digest2=put(md5(buffer2),hex32.);

run;

* then do the same test as above with blank for missing numerics ;

options missing=" ";

data test3;

format &vars _dummy_;

retain _dummy_ "";

set test;

length _name $32 _value buffer1 $32767 digest1 $32 buffer2 $32767 digest2 $32;

buffer1=catx("|", of var1-var5);

digest1=put(md5(buffer1),hex32.);

do while (1);

call vnext(_name);

if (_name="_dummy_") then leave;

_value=vvaluex(_name);

buffer2=cats(buffer2,"|",_value);

end;

digest2=put(md5(buffer2),hex32.);

run;

* and if the PDV order is really critical to you, grab Richard Devenezia's (another really smart guy) %seplist macro ... (Google for his website) ;

data test4;

set test;

retain _dummy_ "";

length _name $32 _value buffer1 $32767 digest1 $32 buffer2 $32767 digest2 $32;

buffer1=catx("|", of var1-var5);

digest1=put(md5(buffer1),hex32.);

do while (1);

call vnext(_name);

if (_name="_dummy_") then leave;

if (_name in (%seplist(&vars,nest=QQ))) then do;

_value=vvaluex(_name);

buffer2=cats(buffer2,"|",_value);

end;

digest2=put(md5(buffer2),hex32.);

run;

Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

PGStats · Posted 03-10-2014 09:57 PM

Try using the CATQ function:

data test;

length var1 var2 var3 $1 var4 var5 8 buffer1 buffer2 buffer3 $32767 digest1 digest2 digest3 $32;

input var1-var5;

buffer1=catx("|", of var1-var5);

digest1=put(md5(buffer1),hex32.);

buffer2=strip(var1) || "|" || strip(var2) || "|" || strip(var3) || "|" || strip(var4) || "|" || strip(var5);

digest2=put(md5(buffer2),hex32.);

buffer3 = catq("MTD", "|", of var1-var5);

digest3=put(md5(buffer3),hex32.);

datalines;

A B . 1 1

A . B 1 1

;

proc print data=test noobs; run;

PG

ScottBass · Posted 03-10-2014 11:10 PM

Thanks PG. I thought I knew all the CAT* functions, apparently CATQ escaped my reading of the doc :smileyblush:. Besides being less code, I'm sure it's more performant than my approach.

Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

SAS Programming

Enhancement Request for CATX, plus some questions

Re: Enhancement Request for CATX, plus some questions

Re: Enhancement Request for CATX, plus some questions

Re: Enhancement Request for CATX, plus some questions

Re: Enhancement Request for CATX, plus some questions

Enhancing Rowing Performance Through Data: Insights from Okeanos' SAS ...

Introducing SAS Compute Server Enhancements

VHV - Enhanced risk modeling

Enhancing PowerPoint Presentations with SAS for Microsoft 365

Using SAS with SingleStore – Enhancing Performance with Aggregate Push...

Follow Us

What is...

SAS Programming

Join us for our biggest event of the year!

SAS Training: Just a Click Away

Follow Us

What is...