04-05-2017 05:33 PM
I am importing a dataset with thousands of names, many of which contain an accented a, e, n, i, o, or u.
In a data step, I tried the following:
varname=translate(varname,"o","ó"); put varname;
and
varname=tranwrd(varname,"í","i"); put varname;
Neither worked. In the dataset the values still show up as �, and I cannot get PROC FREQ output without first exporting it to HTML or CSV, which makes analysis really clunky.
I am using SAS Enterprise Guide (EG) 7.13.
04-05-2017 07:07 PM
I would check your data set's encoding. What you are showing sounds almost like a double-byte character set.
Can you show us the result of this code for one of your problem values:
put 'Before ' varname=;
varname=translate(varname,"o","ó");
put 'After ' varname=;
This would let us see whether the TRANSLATE is even affecting the value.
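Printing the raw bytes with the $HEX. format can also show which encoding the values are in. A minimal sketch, assuming a variable named VAR4 (the name used later in the thread) and a tiny hypothetical sample data set so the step runs on its own; note that the sample's bytes reflect the encoding of the program and session, so for diagnosis you would run the second step against the imported data. An é stored as the single byte E9 suggests Latin-1/WLATIN1 data, while the two bytes C3A9 suggest UTF-8; a lone E9 byte in a UTF-8 session is typically what a client renders as �.

```sas
/* Hypothetical sample so the step runs on its own. For the real    */
/* data, use SET WORK.ORIGINAL(OBS=5) in the second step instead.   */
data work.sample;
  length var4 $40;
  var4 = "México Mérida";
run;

/* Print each value as text, then as raw bytes in hexadecimal.      */
/* é is E9 in Latin-1/WLATIN1 but C3A9 in UTF-8.                    */
data _null_;
  set work.sample;
  put var4= / var4= $hex40.;
run;
```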
04-06-2017 02:49 PM
The original database has values like México Mérida.
Data test; set original; put 'méxico' var4=; var4=translate(var4,"e","é"); put 'mexico' var4=; run;
The output is "Me xico Me rida", which is closer but still strange.
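One plausible explanation for "Me xico", assuming the session encoding is UTF-8: TRANSLATE works byte by byte, and in UTF-8 the é is stored as the two bytes C3 A9. TRANSLATE maps C3 to "e", and because the to-string is shorter than the from-string, it maps the extra byte A9 to a blank. KTRANSLATE is, as far as I understand, the character-aware variant and should treat é as one character; TRANWRD replaces the whole byte sequence, which may also be why the TRANWRD version later in this thread works. The sketch below compares the two functions; it assumes the program file and the session are both UTF-8.

```sas
/* Compare byte-based TRANSLATE with character-based KTRANSLATE.    */
/* Assumes the program file and the session both use UTF-8.         */
data _null_;
  length s t k $20;
  s = "México";
  t = translate(s, "e", "é");   /* bytes C3 A9 -> "e" plus a blank  */
  k = ktranslate(s, "e", "é");  /* the character é -> "e"           */
  put s= $hex16. / t= / k=;
run;
```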
04-06-2017 02:59 PM
I am not sure why, but this is working now. I didn't change any settings in PROC IMPORT and did not update EG 7.13, yet accented characters no longer show as � and I am able to work with the variables.
Data test; set dataset; var4=tranwrd(var4,"é","e"); var4=tranwrd(var4,"í","i"); var4=tranwrd(var4,"ã","a"); var4=tranwrd(var4,"ó","o"); var4=tranwrd(var4,"á","a"); var4=tranwrd(var4,"â","a"); var4=tranwrd(var4,"ñ","n"); var4=tranwrd(var4,"ú","u"); run;
Now every value in the column, whether it is México México City or Guatemala Cobán, is changed by the code above. I must have had a coding error in my earlier attempts, but that does not explain why the accented values showed up as � in the original dataset.
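The chain of TRANWRD calls could likely be collapsed into a single KTRANSLATE call, where each character in the third argument is replaced by the character at the same position in the second argument. A minimal sketch, assuming a UTF-8 session and program file, with a small hypothetical sample standing in for the real data set; like the original chain, it does not handle uppercase accented letters such as É or Í.

```sas
/* Hypothetical sample standing in for the imported data set.       */
data work.sample;
  length var4 $40;
  var4 = "México Mérida";   output;
  var4 = "Guatemala Cobán"; output;
run;

/* One character-aware call instead of eight TRANWRD calls.         */
/* from: é í ã ó á â ñ ú   ->   to: e i a o a a n u                 */
data work.clean;
  set work.sample;
  var4 = ktranslate(var4, "eiaoaanu", "éíãóáâñú");
run;

proc print data=work.clean;
run;
```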
04-06-2017 06:46 PM
A SAS session runs with a defined session encoding, which is either single-byte or multibyte.
Your source data also has an encoding, and it can differ from your SAS session encoding. When SAS reads your data, it needs a translation table to map the source character encoding to the target character encoding.
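To see both sides of that mapping, you can check the session encoding and the encoding recorded on a data set. A minimal sketch; SASHELP.CLASS is only a stand-in so the code runs anywhere, and you would substitute your imported data set (for example WORK.ORIGINAL).

```sas
/* Write the session encoding to the log. */
proc options option=encoding;
run;

/* The same value via the SYSENCODING automatic macro variable. */
%put &=sysencoding;

/* PROC CONTENTS shows the data set's encoding in its header.   */
/* SASHELP.CLASS is a stand-in; use your own data set here.     */
proc contents data=sashelp.class;
run;
```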
The documentation for all of this is here:
Several things can go wrong:
1. No 1:1 character mapping is possible. That can happen if you run your SAS session in single-byte mode but the source is multibyte, like UTF-8, and contains multibyte characters that simply can't be mapped 1:1 to a single-byte representation.
SAS will throw a transcoding error in such cases.
2. Your source data's encoding is "misleading" and SAS assumes the wrong encoding. I've seen this happen for UTF-8 without a BOM. If there is no transcoding error, SAS may silently use the wrong character mapping and you get garbled characters.
3. The last option is that the character mapping between source and target works, but your client uses a different encoding and therefore displays a different symbol. In this case the internally stored value is fine and it's purely a display issue.
Given all of the above, it's actually amazing that things work most of the time without us having to give SAS specific instructions (like the INENCODING= and ENCODING= options per file or table).
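When SAS guesses wrong, declaring the encoding explicitly is one way to take the guesswork out. A rough sketch with hypothetical paths and an assumed UTF-8 source: the ENCODING= option on the FILENAME statement describes the external file that PROC IMPORT reads through the fileref, and INENCODING= on a LIBNAME statement tells SAS how to read data sets stored in another encoding. Whether UTF-8 is the right value depends on how the source was actually written.

```sas
/* Hypothetical path; ENCODING= declares how the CSV was written.  */
filename src "/path/to/names.csv" encoding="utf-8";

proc import datafile=src out=work.original dbms=csv replace;
  getnames=yes;
run;

/* Hypothetical library whose data sets were written in UTF-8.     */
libname srclib "/path/to/library" inencoding="utf-8";
```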