TRANSLATE() works on single bytes. If you are using ENCODING=UTF-8 then some of the "characters" in your string will be multiple bytes long. That is going to cause all kinds of crazy to happen.
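A quick way to see the byte/character difference for yourself is to compare LENGTH() with KLENGTH() on one accented letter. This is only a minimal sketch: the variable names are made up for illustration, and it assumes the session really is running with ENCODING=UTF-8. In that case I would expect BYTES to be 2 and CHARS to be 1.

```sas
data _null_;
  ch = "á";             /* one character, stored as two bytes in UTF-8 */
  bytes = length(ch);   /* LENGTH() counts bytes */
  chars = klength(ch);  /* KLENGTH() counts characters */
  put ch= $hex. bytes= chars=;
run;
```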
Consider just two of those characters. Let's make a little test. Let's put the FROM and TO strings into their own variables so we can get a look at what they contain.
data test;
  String = "XáäčY";
  To = "aa" ;
  From = "áä" ;
  Want = String;  /* defines WANT with the same length as STRING */
  Want = TRANSLATE(string,to,from);
  put (string -- want) (=$quote.);  /* show the values as text */
  put (string -- want) (=$hex.);    /* show the values as hex bytes */
run;
String="XáäčY" To="aa" From="áä" Want="Xaaa čY"
String=58C3A1C3A4C48D59 To=6161 From=C3A1C3A4 Want=5861616120C48D59
NOTE: The data set WORK.TEST has 1 observations and 4 variables.
Notice that the FROM string has 4 bytes and the TO string only has 2 bytes. TRANSLATE() will pad the TO string with spaces ('20'x) to make them the same length. So you are telling TRANSLATE to perform the following replacements:
To=6161 From=C3A1C3A4
C3 -> 61
A1 -> 61
C3 -> 20
A4 -> 20
Notice that you gave conflicting instructions on how to translate the 'C3'x bytes. First you said to make it an a, and then you said to make it a space.
Let's look at the result and see which one it decided to map that byte to.
String=58C3A1C3A4C48D59
Want =5861616120C48D59
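So C3 was mapped to 61 (the letter a) both times it appeared, and A1 was also mapped to the letter a.
And A4 was mapped to a space. That is why á turned into "aa" and ä turned into "a " (an a followed by a space), while the bytes of č (C4 8D) were left alone.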
So TRANSLATE() uses the FIRST value you ask it to translate into when you have the same byte multiple times in the FROM list of bytes.
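You can check that first-one-wins behavior in isolation with plain ASCII, where every character is a single byte. This is just a sketch with made-up values: the letter a is listed twice in FROM, once mapped to X and once to Y. If TRANSLATE() behaves as it did in the test above, every a should come out as X and never as Y.

```sas
data _null_;
  /* FROM lists 'a' twice: first paired with X, then with Y */
  result = translate("banana", "XY", "aa");
  put result= $quote.;
run;
```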
If you want to translate characters instead of bytes then use the KTRANSLATE() function.
KTRANSLATE(FirstNAME, "aaccdeeillnnoorrsstuyzz", "áäčćďéěíĺľňńóôŕřšśťúýžź")
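For reference, here is the earlier test step with the function call swapped for KTRANSLATE(). Nothing else is changed, so WANT is still defined with the same byte length as STRING.

```sas
data test;
  String = "XáäčY";
  To = "aa";
  From = "áä";
  Want = String;  /* WANT gets the same length as STRING */
  Want = KTRANSLATE(String, To, From);  /* character-based, not byte-based */
  put (String -- Want) (=$quote.);
  put (String -- Want) (=$hex.);
run;
```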
If we use the same test program with KTRANSLATE() instead this is the result:
String="XáäčY" To="aa" From="áä" Want="XaačY"
String=58C3A1C3A4C48D59 To=6161 From=C3A1C3A4 Want=586161C48D592020
Notice the two extra spaces on the end of WANT. That is because WANT was defined long enough to store STRING, and after replacing two characters that used 2 bytes each with characters that need only one byte each, the resulting string is 2 bytes shorter.