
Changing accented letters to the 'normal' 26

Occasional Contributor
Posts: 7


I am importing a dataset with thousands of names, many of which have an accented a, e, n, i, o, or u.

In a data step, I tried the following:

varname=translate(varname,"o","ó");
put varname;

and

varname=tranwrd(varname,"í","i");
put varname;

But neither worked. In the dataset I am still getting � in place of the accented letters, and I cannot get PROC FREQ output without exporting it to HTML or CSV first. That makes analysis really clunky.

 

I am using EG 7.13

 

Any suggestions?

PROC Star
Posts: 295

Re: Changing accented letters to the 'normal' 26

Occasional Contributor
Posts: 7

Re: Changing accented letters to the 'normal' 26

I tried it, but it returns blanks instead of letters. For example, méxico comes back as m xico.

Super User
Posts: 11,105

Re: Changing accented letters to the 'normal' 26

I would check your dataset's encoding. What you are showing sounds almost like a double-byte character set.

 

Can you show us what the result of this code would be for one of your trouble values:

 

put 'Before ' varname=;

varname=translate(varname,"o","ó");
put 'After ' varname=;

 

This would let us see whether the translate is even affecting the value.
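
A minimal sketch of how the encodings could be checked, in case it helps. The names work.have and varname are placeholders, not taken from the actual data. It prints the session encoding, lets PROC CONTENTS report the data set's encoding, and dumps a few values in hex so the bytes behind each accented letter are visible.

```sas
/* Session encoding, for example wlatin1 or utf-8 */
%put Session encoding: &sysencoding;

proc options option=encoding;
run;

/* The header of the PROC CONTENTS output lists the data set's Encoding */
proc contents data=work.have;   /* work.have is a placeholder name */
run;

/* Print a few values as text and as hex bytes */
data _null_;
  set work.have(obs=5);
  put varname= / varname= $hex40.;
run;
```

In the hex output, é stored as UTF-8 appears as the two bytes C3A9, while in a single-byte Latin encoding such as wlatin1 it is the single byte E9.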

 

 

Occasional Contributor
Posts: 7

Re: Changing accented letters to the 'normal' 26

The original database has México Mérida

 

after running 

Data test; set original;
put 'méxico' var4=;
var4=translate(var4,"e","é");
put 'mexico' var4=;
run;

the output is "Me xico Me rida" which is closer but still strange.
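
One reading that is consistent with this output, assuming the session encoding is UTF-8: é is stored as two bytes, and TRANSLATE works byte by byte. The first byte is mapped to "e", and because the "to" string is shorter than the "from" string, the second byte is mapped to a blank, which gives "Me xico". A sketch using the character-aware KTRANSLATE instead follows. The variables n_bytes and n_chars are hypothetical diagnostics added only for illustration, and the list covers lowercase letters only.

```sas
data test;
  set original;
  /* LENGTH counts bytes, KLENGTH counts characters;            */
  /* the two differ when a value holds multi-byte characters    */
  n_bytes = length(var4);
  n_chars = klength(var4);
  /* KTRANSLATE(source, to, from) replaces whole characters,    */
  /* pairing each character in "from" with the one in "to"      */
  var4 = ktranslate(var4, 'eiaoaanu', 'éíãóáâñú');
  put n_bytes= n_chars= var4=;
run;
```

If n_bytes is larger than n_chars for the accented values, the data is held in a multi-byte encoding and the byte-based functions will not line up with the characters.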

Super User
Posts: 9,858

Re: Changing accented letters to the 'normal' 26

calling @Patrick

Occasional Contributor
Posts: 7

Re: Changing accented letters to the 'normal' 26

I am not sure why this is working now, but it is. I didn't change any settings during proc import and did not update EG 7.13 to a new version, but now accented characters are no longer showing as � and I am able to work with the variables.

Data test; set dataset;
var4=tranwrd(var4,"é","e");
var4=tranwrd(var4,"í","i");
var4=tranwrd(var4,"ã","a");
var4=tranwrd(var4,"ó","o");
var4=tranwrd(var4,"á","a");
var4=tranwrd(var4,"â","a");
var4=tranwrd(var4,"ñ","n");
var4=tranwrd(var4,"ú","u");
run;

Now any response in the column, whether it be México México City or Guatemala Cobán etc. changes with the code above. I must have had a coding error in previous versions, but that does not explain why in the original dataset the accented values returned as �.

Respected Advisor
Posts: 4,131

Re: Changing accented letters to the 'normal' 26

A SAS session runs with a defined session encoding, which is either single-byte or multi-byte.

 

Your source data will also have an encoding and this encoding can be different from your SAS session encoding. When SAS reads your data it needs to use a translation table to map the source character encoding to the target character encoding.

 

The documentation for all of this here:

http://support.sas.com/documentation/cdl/en/nlsref/69741/HTML/default/viewer.htm#n1au6s0oh1rp4en1nbp...

 

 

Several things can go wrong:

1. There is no 1:1 character mapping possible. That can happen if you run your SAS session in single byte mode but the source is in multibyte like UTF-8 and contains multibyte encoded characters which simply can't get mapped 1:1 to a single byte representation.

SAS will throw a transcoding error in such cases.

 

2. Your source data's encoding is "misleading" and SAS assumes the wrong encoding. I've seen this happen for UTF-8 without a BOM. If there is no transcoding error, then what can happen is that SAS uses the wrong character mapping and you get garbled characters.

 

3. The last option is that the character mapping between source and target works, but your client has a different encoding and then prints a different symbol. In this case everything is fine with your internally stored value and it's just about printing.

 

Given all of the above it's actually amazing that things work most of the time without us having to provide specific instructions to SAS (like parameter options for inencoding and encoding per file/table).
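
As a rough sketch of what such explicit instructions could look like: the file path, the libref mylib, and the data set names below are made up for illustration, and the encodings shown are assumptions about the source rather than anything known about this data.

```sas
/* Tell SAS which encoding the external file uses (hypothetical path) */
filename src '/path/to/names.csv' encoding='utf-8';

proc import datafile=src out=work.names dbms=csv replace;
  getnames=yes;
run;

/* For an existing SAS table, the ENCODING= data set option overrides  */
/* the encoding recorded in the file header (mylib.names is made up)   */
data work.names_fixed;
  set mylib.names(encoding='wlatin1');
run;
```

Stating the encoding up front mainly helps with case 2 above, where SAS would otherwise guess the source encoding.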

 

 

Thanks,

Patrick

 

 
