BookmarkSubscribeRSS Feed
HeatherNewton
Quartz | Level 8
data test2;
 set cust;
 string=COLLATE(1,256);
 high4=substr(string,55);
 cust_id=translate(cust_id, high4, string);
run;

we did some sort of translation, can anyone let me know how I can reverse it and get cust_id back to original form ?

14 REPLIES 14
Quentin
Super User

Translate is swapping characters, so I would think:

 string=COLLATE(1,256);
 high4=substr(string,55);
 cust_id=translate(cust_id, string, high4);

?

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
Tom
Super User Tom
Super User

@HeatherNewton wrote:
data test2;
 set cust;
 string=COLLATE(1,256);
 high4=substr(string,55);
 cust_id=translate(cust_id, high4, string);
run;

we did some sort of translation, can anyone let me know how I can reverse it and get cust_id back to original form ?


Just swap the two strings in the TRANSLATE() function call.

Example:

 

1    data test;
2     cust_id='1234567890';
3     string=COLLATE(1,256);
4     high4=substr(string,55);
5     put 'BEFORE: ' cust_id $char10. +1 cust_id $hex20.;
6     cust_id=translate(cust_id, high4, string);
7     put 'AFTER : ' cust_id $char10. +1 cust_id $hex20.;
8     cust_id=translate(cust_id, string, high4);
9     put 'UNDO  : ' cust_id $char10. +1 cust_id $hex20.;
10   run;

BEFORE: 1234567890 31323334353637383930
AFTER : ghijklmnof 6768696A6B6C6D6E6F66
UNDO  : 1234567890 31323334353637383930

But if CUST_ID had any characters in those last 55 characters of STRING then you cannot undo the change.

Let's simulate that with our little 10 character test.

1    data test;
2     cust_id='1234567890';
3     string=cust_id;
4     high4=substr(string,5);
5     put 'BEFORE: ' cust_id $char10. +1 cust_id $hex20.;
6     cust_id=translate(cust_id, high4, string);
7     put 'AFTER : ' cust_id $char10. +1 cust_id $hex20.;
8     cust_id=translate(cust_id, string, high4);
9     put 'UNDO  : ' cust_id $char10. +1 cust_id $hex20.;
10   run;

BEFORE: 1234567890 31323334353637383930
AFTER : 567890     35363738393020202020
UNDO  : 1234560000 31323334353630303030

So '1' is translated to '5' and '6' to '0', but '7' thru '0' are all translated to space so there is no way to tell those spaces apart.  To fix that add the skipped values back to the end of HIGH4 in the original step.  Then the translation is completely reversible.

1    data test;
2     cust_id='1234567890';
3     string=cust_id;
4     high4=substr(string,5)||substr(string,1,4);
5     put 'BEFORE: ' cust_id $char10. +1 cust_id $hex20.;
6     cust_id=translate(cust_id, high4, string);
7     put 'AFTER : ' cust_id $char10. +1 cust_id $hex20.;
8     cust_id=translate(cust_id, string, high4);
9     put 'UNDO  : ' cust_id $char10. +1 cust_id $hex20.;
10   run;

BEFORE: 1234567890 31323334353637383930
AFTER : 5678901234 35363738393031323334
UNDO  : 1234567890 31323334353637383930

 

 

 

ballardw
Super User

IF any of the original transforms changed multiple characters to a single character, such as

 

Translate (somevar,'111','abc')

which would map a, b and c characters to 1 you cannot be guaranteed to recover the original.

data example;
   longstring = 'asdqlfkjbca';
   otherstring=translate(longstring,'111','abc');
   trystring= translate(otherstring,'abc','111');
run;

Note that all the abc translated to 1 in Otherstring become c in Trystring.

So without explicit values of your translation strings I would say the general answer to the question is indeterminate.

Recover from back up data.

If  you replace the original data set with the modified versions then reread the data or start from a data set prior to the translate.

 

 

Additionally, if any of your original strings contained duplicate characters then it is double hard to get back as the same multiple assigned "to" characters in translate would have been ignored and grouped into a single output character.

HeatherNewton
Quartz | Level 8
What if cust_id contains also characters
E.g.
E4349567
C4127658
C6267260
E5211402
K4729617
G6646262

Does it work?
Tom
Super User Tom
Super User

Try it.

 

Should work as long as none of the characters fall into the set of 54 left out of the target values for the TRANSLATE() function.

Which variables are excluded depend on whether you set a length for the variable you stored COLLATE() into since SAS will default to $200 instead of $256 if the variable's wasn't defined first.

96   data _null_;
97
98    string=collate(1,256);
99    vlen=vlength(string);
100   put vlen=;
101   end = substr(string,vlen-54);
102   put end=;
103  run;

vlen=200
end=’“”•–—˜™š›œžŸ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈ
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


104  data _null_;
105   length string $256;
106   string=collate(1,256);
107   vlen=vlength(string);
108   put vlen=;
109   end = substr(string,vlen-54);
110   put end=;
111  run;

vlen=256
end=ÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ

 

HeatherNewton
Quartz | Level 8
 string=COLLATE(1,256);
 high4=substr(string,55);
 cust_id=translate(cust_id, string, high4);

Actually there is really just three lines in the original code and there is no specification to length or format or data type to string or high4 and cust_id is an alphabet follow by 7 digits no of 0-9, would a one liner cust_id=translate(cust_id, high4, string) be adequate or I have to add string=collate(1,256); high4=substr((string,55); just before that one liner?
My converted cust_id does look more strange than what we see here sometimes longer with say a bunch of VVVVVVVVVVVVVVVVVV to end some usually about some 40 characters long..

Tom
Super User Tom
Super User

Why not just test it?

We could make a string that has digits and letters and see what it does to them and whether it can be reversed.

1    data _null_;
2     custid1='09AZaz';
3     string=COLLATE(1,256);
4     high4=substr(string,55);
5     custid2=translate(custid1, high4,string);
6     custid3=translate(custid2, string, high4);
7     put (custid:) (=/);
8     put (custid:) (=$hex./);
9    run;


custid1=09AZaz
custid2=fow—°
custid3=09AZaz
custid1=3039415A617A
custid2=666F779097B0
custid3=3039415A617A

So it works when running on the same SAS session. 

 

But you might have trouble if you use the translated string with a different encoding.  Notice how the lowercase letters and even the uppercase Z were mapped to characters beyond the range of normal ASCII characters ('7F'x).   A was mapped to '77'x which is a lowercase w, but Zaz was mapped to '9097B0'x, which in my encoding shows as a graphic line drawing character and degree symbol, but in this forum it looks like something else.

Tom_0-1699970111799.pngTom_1-1699970142761.png

 

 

So if the transformed characters were transcoded into some different encoding they might be mapped to some other characters.  So before trying to reverse the original logic you would need to first undo the effect of the transcoding.

HeatherNewton
Quartz | Level 8
I cannot have the translation to and reverse running on same session. What extra do i need to reverse?
Tom
Super User Tom
Super User

@HeatherNewton wrote:
I cannot have the translation to and reverse running on same session. What extra do i need to reverse?

Just make sure you are using the same encoding. Preferable the same SAS version.

 

You might want to read the file in with the ENCODING=ANY dataset option just be sure that SAS does not transcode the values before you have a chance to run the TRANSLATE() function on them.

Patrick
Opal | Level 21

@HeatherNewton You've asked the same question in two places. Repeating my answer given here with a few additions.

To avoid that several different characters can get translated into a single character you need to ensure that the from and to strings in the translate function contain the same number of characters - and that they are all different. Below amendment to your code does this.

data custinfo;
  input cust_id $8.;
  datalines;
E4349567
C4127658
C6267260
E5211402 
K4729617
G6646262
;
run;

data test1;
  set custinfo;
  length string high4 $256;
  retain string high4;
  if _n_=1 then
    do;
      string=COLLATE(0,255);
      high4=cats(substr(string,56),substr(string,1,55));
    end;
  cust_id_out_1=translate(cust_id, high4, string);
  cust_id_out_2=translate(cust_id_out_1,string,high4);
  format string $hex512. high4 $hex110. cust_id_out_1 $hex16.;
run;

proc print data=test1;
  var cust_id cust_id_out_1 cust_id_out_2;
run;

Patrick_0-1699919846801.png

 

Btw: If you really want to create and store a list of the first 256 characters in the collating sequence then you need to explicitly define the length for variable string as else it will default to a length of only $200 in your code. 

You also need to start counting at 0 and not 1.

And... if you want to ensure consistent results for running in any SAS environment then you need also to care about the Collating Sequence and define it explicitly.

I'd also consider to create a list of characters for variable string that only contains printable characters and that certainly don't contain control characters.

HeatherNewton
Quartz | Level 8

if I already have cust_id_out_1 created from cust_id and I just need to turn cust_id_out_1 back into cust_id

how do I put it?

 

I tried the given command starting from cust_id -e1234567 etc to cust_id_out_1 to cust_id_out_2 it worked

but if I already have cust_id_out_1 and I want to get back to cust_id, the translation gives wrong answer

What is missing?

Patrick
Opal | Level 21

@HeatherNewton wrote:

if I already have cust_id_out_1 created from cust_id and I just need to turn cust_id_out_1 back into cust_id

how do I put it?

 

I tried the given command starting from cust_id -e1234567 etc to cust_id_out_1 to cust_id_out_2 it worked

but if I already have cust_id_out_1 and I want to get back to cust_id, the translation gives wrong answer

What is missing?


You need to create and populate your variables high4 and string exactly the way as you've done it when you scrambled the source string.

And as other's already stated: If this initial code that "masked" the source string at any time translated more than one clear text character to the same "masked" character then the process is not reversible. Having seen your code and if you've got only alphanumeric characters in your clear text string and assuming you're running this in the same environment, I'd assume you won't have such issues.

 

 

HeatherNewton
Quartz | Level 8
 if _n_=1 then
    do;
      string=COLLATE(0,255);
      high4=cats(substr(string,56),substr(string,1,55));
    end;

why is this part required? also why have to minus 1 in collate paramenters?

 

Kurt_Bremser
Super User

Bytes contain characters from 0 (binary 00000000) to 255 (binary 11111111).

To build a complete ASCII collating sequence, you have to go from 0 to 255 (there is no character 256!).

By attaching the first 55 characters to the end of the masking string, you make sure that any characters in the range 200 to 255 are also masked correctly. Without that, they would all be masked with blanks.

Such characters (200 and above) appear in the ISO 8859 extended ASCII table.

 

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 3772 views
  • 1 like
  • 6 in conversation