data test2;
set cust;
string=COLLATE(1,256);
high4=substr(string,55);
cust_id=translate(cust_id, high4, string);
run;
we did some sort of translation, can anyone let me know how I can reverse it and get cust_id back to original form ?
Translate is swapping characters, so I would think:
string=COLLATE(1,256);
high4=substr(string,55);
cust_id=translate(cust_id, string, high4);
?
@HeatherNewton wrote:
data test2; set cust; string=COLLATE(1,256); high4=substr(string,55); cust_id=translate(cust_id, high4, string); run;
we did some sort of translation, can anyone let me know how I can reverse it and get cust_id back to original form ?
Just swap the two strings in the TRANSLATE() function call.
Example:
1 data test; 2 cust_id='1234567890'; 3 string=COLLATE(1,256); 4 high4=substr(string,55); 5 put 'BEFORE: ' cust_id $char10. +1 cust_id $hex20.; 6 cust_id=translate(cust_id, high4, string); 7 put 'AFTER : ' cust_id $char10. +1 cust_id $hex20.; 8 cust_id=translate(cust_id, string, high4); 9 put 'UNDO : ' cust_id $char10. +1 cust_id $hex20.; 10 run; BEFORE: 1234567890 31323334353637383930 AFTER : ghijklmnof 6768696A6B6C6D6E6F66 UNDO : 1234567890 31323334353637383930
But if CUST_ID had any characters in those last 55 characters of STRING then you cannot undo the change.
Let's simulate that with our little 10 character test.
1 data test; 2 cust_id='1234567890'; 3 string=cust_id; 4 high4=substr(string,5); 5 put 'BEFORE: ' cust_id $char10. +1 cust_id $hex20.; 6 cust_id=translate(cust_id, high4, string); 7 put 'AFTER : ' cust_id $char10. +1 cust_id $hex20.; 8 cust_id=translate(cust_id, string, high4); 9 put 'UNDO : ' cust_id $char10. +1 cust_id $hex20.; 10 run; BEFORE: 1234567890 31323334353637383930 AFTER : 567890 35363738393020202020 UNDO : 1234560000 31323334353630303030
So '1' is translated to '5' and '6' to '0', but '7' thru '0' are all translated to space so there is no way to tell those spaces apart. To fix that add the skipped values back to the end of HIGH4 in the original step. Then the translation is completely reversible.
1 data test; 2 cust_id='1234567890'; 3 string=cust_id; 4 high4=substr(string,5)||substr(string,1,4); 5 put 'BEFORE: ' cust_id $char10. +1 cust_id $hex20.; 6 cust_id=translate(cust_id, high4, string); 7 put 'AFTER : ' cust_id $char10. +1 cust_id $hex20.; 8 cust_id=translate(cust_id, string, high4); 9 put 'UNDO : ' cust_id $char10. +1 cust_id $hex20.; 10 run; BEFORE: 1234567890 31323334353637383930 AFTER : 5678901234 35363738393031323334 UNDO : 1234567890 31323334353637383930
IF any of the original transforms changed multiple characters to a single character, such as
Translate (somevar,'111','abc')
which would map a, b and c characters to 1 you cannot be guaranteed to recover the original.
data example; longstring = 'asdqlfkjbca'; otherstring=translate(longstring,'111','abc'); trystring= translate(otherstring,'abc','111'); run;
Note that all the abc translated to 1 in Otherstring become c in Trystring.
So without explicit values of your translation strings I would say the general answer to the question is indeterminate.
Recover from back up data.
If you replace the original data set with the modified versions then reread the data or start from a data set prior to the translate.
Additionally, if any of your original strings contained duplicate characters then it is double hard to get back as the same multiple assigned "to" characters in translate would have been ignored and grouped into a single output character.
Try it.
Should work as long as none of the characters fall into the set of 54 left out of the target values for the TRANSLATE() function.
Which variables are excluded depend on whether you set a length for the variable you stored COLLATE() into since SAS will default to $200 instead of $256 if the variable's wasn't defined first.
96 data _null_; 97 98 string=collate(1,256); 99 vlen=vlength(string); 100 put vlen=; 101 end = substr(string,vlen-54); 102 put end=; 103 run; vlen=200 end=’“”•–—˜™š›œžŸ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈ NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds 104 data _null_; 105 length string $256; 106 string=collate(1,256); 107 vlen=vlength(string); 108 put vlen=; 109 end = substr(string,vlen-54); 110 put end=; 111 run; vlen=256 end=ÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
string=COLLATE(1,256);
high4=substr(string,55);
cust_id=translate(cust_id, string, high4);
Actually there is really just three lines in the original code and there is no specification to length or format or data type to string or high4 and cust_id is an alphabet follow by 7 digits no of 0-9, would a one liner cust_id=translate(cust_id, high4, string) be adequate or I have to add string=collate(1,256); high4=substr((string,55); just before that one liner?
My converted cust_id does look more strange than what we see here sometimes longer with say a bunch of VVVVVVVVVVVVVVVVVV to end some usually about some 40 characters long..
Why not just test it?
We could make a string that has digits and letters and see what it does to them and whether it can be reversed.
1 data _null_; 2 custid1='09AZaz'; 3 string=COLLATE(1,256); 4 high4=substr(string,55); 5 custid2=translate(custid1, high4,string); 6 custid3=translate(custid2, string, high4); 7 put (custid:) (=/); 8 put (custid:) (=$hex./); 9 run; custid1=09AZaz custid2=fow—° custid3=09AZaz custid1=3039415A617A custid2=666F779097B0 custid3=3039415A617A
So it works when running on the same SAS session.
But you might have trouble if you use the translated string with a different encoding. Notice how the lowercase letters and even the uppercase Z were mapped to characters beyond the range of normal ASCII characters ('7F'x). A was mapped to '77'x which is a lowercase w, but Zaz was mapped to '9097B0'x, which in my encoding shows as a graphic line drawing character and degree symbol, but in this forum it looks like something else.
So if the transformed characters were transcoded into some different encoding they might be mapped to some other characters. So before trying to reverse the original logic you would need to first undo the effect of the transcoding.
@HeatherNewton wrote:
I cannot have the translation to and reverse running on same session. What extra do i need to reverse?
Just make sure you are using the same encoding. Preferable the same SAS version.
You might want to read the file in with the ENCODING=ANY dataset option just be sure that SAS does not transcode the values before you have a chance to run the TRANSLATE() function on them.
@HeatherNewton You've asked the same question in two places. Repeating my answer given here with a few additions.
To avoid that several different characters can get translated into a single character you need to ensure that the from and to strings in the translate function contain the same number of characters - and that they are all different. Below amendment to your code does this.
data custinfo;
input cust_id $8.;
datalines;
E4349567
C4127658
C6267260
E5211402
K4729617
G6646262
;
run;
data test1;
set custinfo;
length string high4 $256;
retain string high4;
if _n_=1 then
do;
string=COLLATE(0,255);
high4=cats(substr(string,56),substr(string,1,55));
end;
cust_id_out_1=translate(cust_id, high4, string);
cust_id_out_2=translate(cust_id_out_1,string,high4);
format string $hex512. high4 $hex110. cust_id_out_1 $hex16.;
run;
proc print data=test1;
var cust_id cust_id_out_1 cust_id_out_2;
run;
Btw: If you really want to create and store a list of the first 256 characters in the collating sequence then you need to explicitly define the length for variable string as else it will default to a length of only $200 in your code.
You also need to start counting at 0 and not 1.
And... if you want to ensure consistent results for running in any SAS environment then you need also to care about the Collating Sequence and define it explicitly.
I'd also consider to create a list of characters for variable string that only contains printable characters and that certainly don't contain control characters.
if I already have cust_id_out_1 created from cust_id and I just need to turn cust_id_out_1 back into cust_id
how do I put it?
I tried the given command starting from cust_id -e1234567 etc to cust_id_out_1 to cust_id_out_2 it worked
but if I already have cust_id_out_1 and I want to get back to cust_id, the translation gives wrong answer
What is missing?
@HeatherNewton wrote:
if I already have cust_id_out_1 created from cust_id and I just need to turn cust_id_out_1 back into cust_id
how do I put it?
I tried the given command starting from cust_id -e1234567 etc to cust_id_out_1 to cust_id_out_2 it worked
but if I already have cust_id_out_1 and I want to get back to cust_id, the translation gives wrong answer
What is missing?
You need to create and populate your variables high4 and string exactly the way as you've done it when you scrambled the source string.
And as other's already stated: If this initial code that "masked" the source string at any time translated more than one clear text character to the same "masked" character then the process is not reversible. Having seen your code and if you've got only alphanumeric characters in your clear text string and assuming you're running this in the same environment, I'd assume you won't have such issues.
if _n_=1 then do; string=COLLATE(0,255); high4=cats(substr(string,56),substr(string,1,55)); end;
why is this part required? also why have to minus 1 in collate paramenters?
Bytes contain characters from 0 (binary 00000000) to 255 (binary 11111111).
To build a complete ASCII collating sequence, you have to go from 0 to 255 (there is no character 256!).
By attaching the first 55 characters to the end of the masking string, you make sure that any characters in the range 200 to 255 are also masked correctly. Without that, they would all be masked with blanks.
Such characters (200 and above) appear in the ISO 8859 extended ASCII table.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.