Converting accented letters

Occasional Contributor
Posts: 15

Hi,

Annually, I process a large dataset with 7 name fields; some of the names have accented characters, e.g., Aarón.  I posted here about 4 years ago and found the solution (thank you all!) of using the translate function to replace the accented characters with “normal” letters: 

 

 

  array seven {7} $ mfirsta mmiddlea mlasta bfirsta bmiddlea blasta dlasta;
  array eight {7} $ mfirst mmiddle mlast bfirst bmiddle blast dlast;
  do k=1 to dim(seven);
    eight{k}=upcase(translate(seven{k},"aaaaaaceeeeiiiiooooouuuun","àáâäãåçéèêëîïìíôòóõöùúûüñ"));
    end;

 

I did not run the program last year; this year it won’t work.  When SAS (this year it is SAS 9; not sure what it was before) reads in the data, it replaces the accented characters with little “boxes” before the translate code can take effect.  Searching the community help forums, I found the tranwrd function, which helped but, as a previous poster noted, replaced the accented characters with a blank space.  Per other suggestions, I converted the raw data file to utf8, and eventually found that the following works:

 

 

  array seven {7} $ mfirsta mmiddlea mlasta bfirsta bmiddlea blasta dlasta;
  array eight {7} $ mfirst mmiddle mlast bfirst bmiddle blast dlast;
  ...
  array eighteen {7} $ mfirst mmiddle mlast bfirst bmiddle blast dlast;
  array nineteen {7} $ mfirst mmiddle mlast bfirst bmiddle blast dlast;
  do k=1 to dim(seven);
    eight{k}=tranwrd(seven{k},"à","a");
    nine{k}=tranwrd(eight{k},"á","a");
    ...
    seventeen{k}=tranwrd(sixteen{k},"ù","u");
    eighteen{k}=tranwrd(seventeen{k},"ú","u");
    nineteen{k}=tranwrd(eighteen{k},"ñ","n");
    end;

 

But, this is pretty clunky, and only covers half of the 25 accented characters in the original code.  There must be a more streamlined method?

Thanks.

 


Accepted Solutions
Solution
‎06-07-2018 01:12 PM
Occasional Contributor
Posts: 15

Re: Converting accented letters

ktranslate worked!  (Well, once I switched the order of accented vs. non-accented back from the reversed order that tranwrd uses...)

I had seen ktranslate in the help documentation but couldn't "translate" all the details into making sense to me.

Thanks so much!


All Replies
Valued Guide
Posts: 597

Re: Converting accented letters

What encoding do your SAS session and your dataset use? Those little "boxes" might appear because your session and dataset have different encodings.

 

/* Find SAS session encoding */
proc options option=encoding; run;

/* or */
%PUT %SYSFUNC(getOption(ENCODING));

/* Find SAS dataset encoding (sashelp.class is an example; replace with your dataset) */
%LET DSID=%SYSFUNC(open(sashelp.class,i));
%PUT %SYSFUNC(ATTRC(&DSID,ENCODING));
%LET RC=%SYSFUNC(close(&DSID)); /* close the dataset handle when done */

Converting the dataset's encoding to match the session encoding might help TRANSLATE to work.

http://support.sas.com/kb/15/597.html
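
As a rough sketch of one way to do that conversion (the dataset names here are hypothetical, and the note linked above may describe a different method), the ENCODING= data set option on the output dataset asks SAS to write the copy in the given encoding:

```sas
/* Sketch only: mylib.names and work.names_utf8 are hypothetical dataset names. */
/* Writes a copy of the dataset whose character data is stored as UTF-8.        */
data work.names_utf8 (encoding='utf-8');
  set mylib.names;
run;
```

Character values can need more bytes after transcoding to UTF-8, so it's worth checking the copy for truncated names.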

Thanks,
Suryakiran
PROC Star
Posts: 2,370

Re: Converting accented letters

I must be missing something.

If tranwrd() works after conversion, then surely translate() works too?

Super User
Posts: 10,787

Re: Converting accented letters

/* BASECHAR (an NLS function) converts accented characters to their base characters */
data _NULL_;
x=basechar("àáâäãåçéèêëîïìíô");
put x=;
run;
PROC Star
Posts: 2,370

Re: Converting accented letters

@Ksharp Nice one, never seen that function before.

 

 

@ChrisHemedinger Any chance you could mention to someone suitable that the online doc

 

http://documentation.sas.com/?docsetId=lefunctionsref&docsetTarget=titlepage.htm&docsetVersion=9.4&l...

 

is incomplete and misses functions like qscan() or basechar() ?

 

SAS Employee
Posts: 1

Re: Converting accented letters

Hey,

   I just wanted to supply you with the links to the docs for these 2 functions.

 

%QSCAN is in the macro doc:

http://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.3&docsetId=mcrolref&docsetTarget=p0mn...

 

BASECHAR is in the National Language Support doc
http://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.3&docsetId=nlsref&docsetTarget=p078j5...

 

Here's the programming Help Center link if you'd like to start your search from here:

http://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.3&docsetId=pgmmvaov&docsetTarget=pgms...

 

And here are a couple of tips to help you find the doc.

If you're in the Functions doc, there is a topic SAS Functions and CALL Routines Documented in Other SAS Publications.

 It's at the top of the Dictionary section of the doc.

 

http://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.3&docsetId=lefunctionsref&docsetTarge...

 

And, if you haven't found it yet, the programming Help Center Syntax Quick Links can get you to the list of all of the functions.

From the programming Help Center link above, select Syntax Quick Links > SAS Language Elements by Name, Product, and Category. Scroll down to Functions.

 

But, the quickest way is to search.

 

Hope this helps!

Eliz.

Occasional Contributor
Posts: 15

Re: Converting accented letters

Thank you!
Respected Advisor
Posts: 4,737

Re: Converting accented letters

[ Edited ]

@DebbiBJ

You've got code that used to work with an older version of SAS but no longer works with a newer one.

 

@SuryaKiran's question about the session encoding likely points in the right direction. It's quite possible that your newer version of SAS now uses a different encoding, possibly UTF-8, which uses more than one byte to store certain characters (which is why it's called a multi-byte character set, MBCS). It's also rather likely that your older SAS version used a single-byte character set (SBCS).
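
To check whether this is what's happening, here is a minimal sketch (assuming the program file is stored in the same encoding as the SAS session) that compares the byte length and the character length of an accented name. LENGTH counts bytes and KLENGTH counts characters:

```sas
/* Sketch: compare byte length and character length of an accented name. */
data _null_;
  name  = "Aarón";
  bytes = length(name);   /* length in bytes */
  chars = klength(name);  /* length in characters */
  put bytes= chars=;
run;
```

In a UTF-8 session I'd expect bytes to come out one higher than chars, because ó is stored in two bytes; in a single-byte session such as wlatin1 the two numbers should match.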

 

If the above is true then yes, TRANSLATE() won't work anymore for characters encoded with more than one byte (MBCS). You need to use the KTRANSLATE() function instead.
The fact that TRANWRD() still works for you is a pointer that this is the problem, as TRANWRD() works with MBCS data.

 

The link below provides a table showing, for each string function, whether it can be used with SBCS data only or also with MBCS data.

http://documentation.sas.com/?docsetId=nlsref&docsetTarget=p1pca7vwjjwucin178l8qddjn0gi.htm&docsetVe...
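
For illustration, the original TRANSLATE loop could be adapted to KTRANSLATE roughly as follows. This is only a sketch: it assumes a UTF-8 session with the data already read in as UTF-8, the dataset names have and want are hypothetical, and the length of 40 for the new variables is a guess. KTRANSLATE takes its arguments in the same order as TRANSLATE: source, then the replacement characters, then the characters to be replaced.

```sas
/* Sketch only: assumes a UTF-8 SAS session and UTF-8 input data. */
data want;   /* hypothetical output dataset */
  set have;  /* hypothetical input dataset holding the seven accented fields */
  array seven {7} $ mfirsta mmiddlea mlasta bfirsta bmiddlea blasta dlasta;
  array eight {7} $ 40 mfirst mmiddle mlast bfirst bmiddle blast dlast; /* 40 is an assumed length */
  do k = 1 to dim(seven);
    /* KTRANSLATE(source, to, from): plain letters first, accented letters second */
    eight{k} = upcase(ktranslate(seven{k},
                                 "aaaaaaceeeeiiiiooooouuuun",
                                 "àáâäãåçéèêëîïìíôòóõöùúûüñ"));
  end;
  drop k;
run;
```

This keeps all 25 replacements in a single call, so the chain of intermediate arrays needed for TRANWRD goes away.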

 

 


☑ This topic is solved.
