BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ScottBass
Rhodochrosite | Level 12

Here is an excerpt of a case I opened with SAS TS:

I’ve got an ETL process where I’m doing change record detection, and modifying data (close old record, add new record) if a change has occurred.  It’s somewhat analogous to SCD2 processing.

I’ve ran into a problem where data was copied into the wrong column.  This error was corrected, but due to the way CATX works, a change is not detected.

An example will help to illustrate the point:

data test;

  length var1 var2 var3 $1 var4 var5 8 buffer1 buffer2 $32767 digest1 digest2 $32;

  input var1-var5;

  buffer1=catx("|", of var1-var5);

  digest1=put(md5(buffer1),hex32.);

  buffer2=strip(var1) || "|" || strip(var2) || "|" || strip(var3) || "|" || strip(var4) || "|" || strip(var5);

  digest2=put(md5(buffer2),hex32.);

   datalines;

A B . 1 1

A . B 1 1

;

run;

I'd prefer to use the CATX function - I can use a variable list, I don't have to think about data type conversion, and don’t have to jump through hoops in my code generation.  My actual code has about 40 variables I need to check for changes.

The solution, of course, is a flag for the CATX function that would mimic the concatenation operator.  IOW, print the delimiter regardless of whether the data value is missing.

I think (???) the CATX function could be changed where it would not break existing code.  Example invocations:

buffer1=catx("|", of var1-var5, “Y”);

buffer1=catx("|", var1, var2, var3, var4, var5, “Y”);

The last parameter does not have to be “Y”, it just needs to be some character string, that could not be confused with a variable, that would trigger CATX to print the delimiter regardless of whether the variable is missing or not.  There would be a delimiter for every variable – 1.

Questions:

Can you think of an ingenious way to address this problem, other than the concatenation operator as above?  Some ideas:

* Perhaps build the buffer one variable at a time, translating missing to “00”x, to “trick” CATX into printing the delimiter.  But spinning over the variable list would be problematic since it is a mix of char and num.

* Or, perhaps I can export the dataset as a delimited file, read that file back in without delimiters, and use _INFILE_ as the buffer for the MD5 function.

* Or, could I use PROC FCMP to roll my own CATX function that would give me the desired results?  (BTW, the desired results is a buffer with one delimiter per variable - 1)

Thanks...


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Try using the CATQ function:

data test;

  length var1 var2 var3 $1 var4 var5 8 buffer1 buffer2 buffer3 $32767 digest1 digest2 digest3 $32;

  input var1-var5;

  buffer1=catx("|", of var1-var5);

  digest1=put(md5(buffer1),hex32.);

  buffer2=strip(var1) || "|" || strip(var2) || "|" || strip(var3) || "|" || strip(var4) || "|" || strip(var5);

  digest2=put(md5(buffer2),hex32.);

  buffer3 = catq("MTD", "|", of var1-var5);

  digest3=put(md5(buffer3),hex32.);

datalines;

A B . 1 1

A . B 1 1

;

proc print data=test noobs; run;

PG

PG

View solution in original post

3 REPLIES 3
ScottBass
Rhodochrosite | Level 12

Hi All,

I think I've solved my own problem.  I remembered a technique I first saw proposed by (a really smart guy, you all should follow him Smiley Happy).

I get one extra delimiter at the beginning of the buffer string, but for my purposes of creating a message digest, it meets my needs for change detection.

If you can think of a better approach, please let me know.

Thanks...

* test data, should be a different digest for each observation ;

data test;

  length var1 var2 var3 $1 var4 var5 8;

  input var1-var5;

  datalines;

A B . 1 .

A . B 1 .

A B . . 1

A . B . 1

;

run;

* define the variables of interest.  i've purposely dropped off var3 ;

%let vars=var1 var2 var4 var5;

* first test with default option for missing numerics ;

options missing=".";

data test2;

  * use a "naked format" to define the PDV order ;

  * the _dummy_ variable in the PDV is an end-of-processing marker ;

  format &vars _dummy_;

  * prevent unitialized variable notes ;

  retain _dummy_ "";

  set test;

  length _name $32 _value buffer1 $32767 digest1 $32 buffer2 $32767 digest2 $32;

  buffer1=catx("|", of var1-var5);

digest1=put(md5(buffer1),hex32.);

  do while (1);

    * get next variable in the PDV ;

    call vnext(_name);

    * bail if variable name = _dummy_ ;

    if (_name="_dummy_") then leave;

    * get the variable value.  no notes on numeric to character conversion, which is nice ;

_value=vvaluex(_name);

buffer2=cats(buffer2,"|",_value);

  end;

digest2=put(md5(buffer2),hex32.);

run;

* then do the same test as above with blank for missing numerics ;

options missing=" ";

data test3;

  format &vars _dummy_;

  retain _dummy_ "";

  set test;

  length _name $32 _value buffer1 $32767 digest1 $32 buffer2 $32767 digest2 $32;

buffer1=catx("|", of var1-var5);

digest1=put(md5(buffer1),hex32.);

  do while (1);

    call vnext(_name);

    if (_name="_dummy_") then leave;

_value=vvaluex(_name);

buffer2=cats(buffer2,"|",_value);

  end;

digest2=put(md5(buffer2),hex32.);

run;

* and if the PDV order is really critical to you, grab Richard Devenezia's (another really smart guy) %seplist macro ... (Google for his website) ;

data test4;

  set test;

  retain _dummy_ "";

  length _name $32 _value buffer1 $32767 digest1 $32 buffer2 $32767 digest2 $32;

buffer1=catx("|", of var1-var5);

digest1=put(md5(buffer1),hex32.);

  do while (1);

    call vnext(_name);

    if (_name="_dummy_") then leave;

    if (_name in (%seplist(&vars,nest=QQ))) then do;

_value=vvaluex(_name);

buffer2=cats(buffer2,"|",_value);

    end;

  end;

digest2=put(md5(buffer2),hex32.);

run;


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
PGStats
Opal | Level 21

Try using the CATQ function:

data test;

  length var1 var2 var3 $1 var4 var5 8 buffer1 buffer2 buffer3 $32767 digest1 digest2 digest3 $32;

  input var1-var5;

  buffer1=catx("|", of var1-var5);

  digest1=put(md5(buffer1),hex32.);

  buffer2=strip(var1) || "|" || strip(var2) || "|" || strip(var3) || "|" || strip(var4) || "|" || strip(var5);

  digest2=put(md5(buffer2),hex32.);

  buffer3 = catq("MTD", "|", of var1-var5);

  digest3=put(md5(buffer3),hex32.);

datalines;

A B . 1 1

A . B 1 1

;

proc print data=test noobs; run;

PG

PG
ScottBass
Rhodochrosite | Level 12

Thanks PG.  I thought I knew all the CAT* functions, apparently CATQ escaped my reading of the doc :smileyblush:.  Besides being less code, I'm sure it's more performant than my approach.


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 922 views
  • 2 likes
  • 2 in conversation