Alexxxxxxx
Pyrite | Level 9

Dear all,

 

Is there any code to replace accented characters with non-accented characters in a variable? For example, 'ü' (u umlaut) becomes 'ue', or 'é' becomes 'e'.

 

Thanks in advance.


12 REPLIES
ballardw
Super User

If specific letters are the concern TRANSLATE will work:

data example;
   x='abcdé';
   y=translate(x,'e','é');
run;

If you don't know all the likely culprits in advance, then BASECHAR may work, but I'm not sure it can do the umlaut-to-2-character replacement you want.

 

 

FreelanceReinh
Jade | Level 19

Hi @Alexxxxxxx,

 

For the 2-character replacements you can use TRANWRD. Unlike TRANSLATE it doesn't allow for multiple "from-to" pairs in the same function call, so you may want to use a loop.

 

Example:

data have;
length c $20;
input c;
cards;
Ägypten
Österreich
äußerst
Übung
müßig
Römer
;

data want;
array f[7] $1 _temporary_ ('ä'  'ö'  'ü'  'ß'  'Ä'  'Ö'  'Ü' );
array t[7] $2 _temporary_ ('ae' 'oe' 'ue' 'ss' 'Ae' 'Oe' 'Ue');
set have;
do _n_=1 to dim(f);
  c=tranwrd(c, f[_n_], t[_n_]);
end;
run;

Make sure that the length of the target variable is sufficient to accommodate the strings after the replacement(s).
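
As a minimal sketch of that point: a LENGTH statement placed before SET widens the variable before any replacement runs. The $40 is an assumed value (twice the $20 used in HAVE, which covers the worst case where every character is replaced by two); adjust it to your data.

```sas
/* Sketch only: widen C so the 2-character replacements cannot be truncated. */
/* $40 is an assumption, not a value from this thread.                       */
data want;
  length c $40;               /* must appear before SET to take effect */
  set have;
  c = tranwrd(c, 'ü', 'ue');  /* one replacement shown for brevity */
run;
```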

Alexxxxxxx
Pyrite | Level 9

Dear FreelanceReinhard,

Thanks for your helpful advice.

Can I use this code for 1-character replacements as well? For example, 'à' becomes 'a'?

FreelanceReinh
Jade | Level 19

@Alexxxxxxx wrote:

Dear FreelanceReinhard,

Thanks for your helpful advice.

Can I use this code for 1-character replacements as well? For example, 'à' becomes 'a'?


Of course, the target values in TRANWRD can be single characters as well, but if you add them to the existing $2 array, you'll insert an unwanted trailing blank into the target string. Therefore, I think the code using TRANSLATE (with multiple "from-to" pairs) will be shorter, e.g. (incomplete example)

c=translate(c,'aceee','àçéêè');
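
To illustrate the trailing-blank issue, here is a small sketch (the sample string and the variable names padded and trimmed are made up for illustration). Wrapping the replacement in TRIM is one possible way to keep 1- and 2-character targets in the same $2 array.

```sas
/* Sketch only: a 1-character target stored in a $2 element is padded to 'a '. */
data _null_;
  length c $20;
  c = 'voilà tout';
  padded  = tranwrd(c, 'à', 'a ');        /* pad is copied: extra blank after "voila" */
  trimmed = tranwrd(c, 'à', trim('a '));  /* TRIM removes the pad before replacing    */
  put padded= / trimmed=;
run;
```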
Alexxxxxxx
Pyrite | Level 9

So, can I use the following code?

data want;
array f[9] $1 _temporary_ ('Ä'  'Æ'  'Ö'  'Ü'  'ß'  'ä'  'æ'  'ö'  'ü');
array t[9] $2 _temporary_ ('AE' 'AE' 'OE' 'UE' 'SS' 'ae' 'ae' 'oe' 'ue');
set have;
do _n_=1 to dim(f);
  c=tranwrd(c, f[_n_], t[_n_]);
  c=translate(c,'AAAAACEEEEIIIIDNOOOOOUUUYaaaaaceeeeiiiionooooouuuyy','ÀÁÂÃÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕØÙÚÛÝàáâãåçèéêëìíîïðñòóôõøùúûýÿ');
end;
run;
PGStats
Opal | Level 21

This would be a bit more efficient:

 

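data want;
set have;
/* 'ß' becomes 'SS' */
c = tranwrd(c, 'ß', 'SS');
/* Append E (or e) after each umlaut or ligature; \1 keeps the original letter for now */
c = prxChange("s/([ÄÆÖÜ])/\1E/o", -1, c);
c = prxChange("s/([äæöü])/\1e/o", -1, c);
*c = basechar(c);
/* Strip the accents, including those on the letters kept by the two calls above */
c = translate(c,
    'AAAAAAACEEEEIIIIDNOOOOOOUUUUYaaaaaaaceeeeiiiionoooooouuuuyy',
    'ÀÁÂÃÅÄÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕØÖÙÚÛÜÝàáâãåäæçèéêëìíîïðñòóôõøöùúûüýÿ');
run;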
PG
Alexxxxxxx
Pyrite | Level 9

Dear PG,

 

Thanks for your advice.

 

However, I get the following output when I run this code:

data have;
length c $20;
input c;
cards;
Ägypten
Österreich
äußerst
Übung
müßig
Römer
Pierre
Étirer
Jean-Pierre
Ägypten
Österreich
äußerst
Übung
müßig
Römer
;
run;

data want;
set have;
c = tranwrd(c, 'ß', 'SS');
c = prxChange("s/([ÄÆÖÜ])/\1E/o", -1, c);
c = prxChange("s/([äæöü])/\1e/o", -1, c);
*c = basechar(c);
c = translate(c,'AAAAACEEEEIIIIDNOOOOOUUUYaaaaaceeeeiiiidnooooouuuyy','ÀÁÂÃÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕØÙÚÛÝàáâãåçèéêëìíîïðñòóôõøùúûýÿ');

run;
The SAS System
 
c
ÄEgypten
ÖEsterreich
äeuSSerst
ÜEbung
müeSSig
Röemer
Pierre
Etirer
Jean-Pierre
ÄEgypten
ÖEsterreich
äeuSSerst
ÜEbung
müeSSig
Röemer
FreelanceReinh
Jade | Level 19

@Alexxxxxxx wrote:

So, can I use the following code?

data want;
array f[9] $1 _temporary_ ('Ä'  'Æ'  'Ö'  'Ü'  'ß'  'ä'  'æ'  'ö'  'ü');
array t[9] $2 _temporary_ ('AE' 'AE' 'OE' 'UE' 'SS' 'ae' 'ae' 'oe' 'ue');
set have;
do _n_=1 to dim(f);
  c=tranwrd(c, f[_n_], t[_n_]);
  c=translate(c,'AAAAACEEEEIIIIDNOOOOOUUUYaaaaaceeeeiiiionooooouuuyy','ÀÁÂÃÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕØÙÚÛÝàáâãåçèéêëìíîïðñòóôõøùúûýÿ');
end;
run;

  • The TRANSLATE call doesn't need to be repeated (nine times). It should occur outside of the DO loop.
  • The capital double S is not an ideal replacement for 'ß' except in words written in capitals (which rarely contain an 'ß').
  • Similarly, "Aegypten", "Oesterreich" etc. would be preferable to "AEgypten", "OEsterreich" etc. -- if the results are used for text output (like a report).
  • Most of the 1-character replacements could be accomplished with the elegant BASECHAR function, which ballardw and PGStats have suggested. The few exceptions might be questionable in your code anyway. For example, why should 'ð' (a kind of 'd' I think) be replaced by 'o'?
  • Of course, character variable C must be in dataset HAVE, with sufficient length.
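
Putting the points above that don't involve BASECHAR together, a corrected version might look like the sketch below. It reuses the mixed-case arrays from the earlier example and the (incomplete) TRANSLATE list shown before; it assumes a single-byte session encoding such as Latin-1, and the $40 length is an assumed value.

```sas
/* Sketch only: TRANSLATE moved out of the loop, mixed-case replacements. */
data want;
  array f[7] $1 _temporary_ ('ä'  'ö'  'ü'  'ß'  'Ä'  'Ö'  'Ü');
  array t[7] $2 _temporary_ ('ae' 'oe' 'ue' 'ss' 'Ae' 'Oe' 'Ue');
  length c $40;                 /* assumed length, set before SET so it applies */
  set have;
  do _n_ = 1 to dim(f);         /* 2-character replacements, one pair per pass */
    c = tranwrd(c, f[_n_], t[_n_]);
  end;
  /* 1-character replacements: one call per observation, after the loop */
  c = translate(c, 'aceee', 'àçéêè');
run;
```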
PGStats
Opal | Level 21

So, to summarize the comments above, reasonable code could be simplified to:

 

data want;
set have;
c = tranwrd(c, 'ß', 'ss');
c = prxChange("s/([äæöüÄÆÖÜ])/\1e/o", -1, c);
c = basechar(c);
run;

Or

 

data want;
set have;
c = basechar(prxChange("s/([äæöüÄÆÖÜ])/\1e/o", -1, tranwrd(c, 'ß', 'ss')));
run;


PG
Tom
Super User

This is probably the easiest solution to get working, although it requires that you create the list of mappings yourself.

Note: make sure the temporary array elements are long enough to hold the Unicode representations of the strings. Many could take up to 4 bytes.

data have;
  input c $40.;
cards;
Ägypten
Österreich
äußerst
Übung
müßig
Römer
Pierre
Étirer
Jean-Pierre
Ägypten
Österreich
äußerst
Übung
müßig
Römer
;

data want;
  array f[8] $4 _temporary_ ('ä'  'ö'  'ü'  'ß'  'Ä'  'Ö'  'Ü'  'É');
  array t[8] $4 _temporary_ ('ae' 'oe' 'ue' 'ss' 'Ae' 'Oe' 'Ue' 'E');
  set have;
  d=c;
  do _n_=1 to dim(f);
    d=tranwrd(d, trim(f[_n_]), trim(t[_n_]));
  end;
run;

proc print;
run;
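
On the note above about byte lengths, a quick way to see how many bytes a character occupies in your own session might look like this. It is a sketch only (the variable names s, bytes and chars are made up): LENGTH counts bytes, while KLENGTH counts characters, so in a UTF-8 session you would expect bytes to be larger than chars for an accented letter.

```sas
/* Sketch only: compare byte count and character count for one accented letter. */
%put NOTE: session encoding is &sysencoding;

data _null_;
  s = 'ä';
  bytes = length(s);    /* number of bytes up to the last non-blank */
  chars = klength(s);   /* number of characters */
  put bytes= chars=;
run;
```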


You could also put your translation pairs into a dataset instead of placing it in the code.

data translate;
  array f[8] $4 _temporary_ ('ä'  'ö'  'ü'  'ß'  'Ä'  'Ö'  'Ü'  'É');
  array t[8] $4 _temporary_ ('ae' 'oe' 'ue' 'ss' 'Ae' 'Oe' 'Ue' 'E');
  do _n_=1 to dim(f);
     from=f[_n_];
     to=t[_n_];
     output;
  end;
run;

data want;
  set have;
  d=c;
  do p=1 to nobs;
    set translate point=p nobs=nobs;
    d=tranwrd(d, trim(from), trim(to));
  end;
  drop from to;
run;
PGStats
Opal | Level 21

If you have access to NLS (I don't know whether it still requires a separate licence), you can use the BASECHAR function. Try:

 

data _null_;
input mot :$12.;
key = basechar(mot);
put key;
datalines;
Pierre
Étirer
Jean-Pierre
Ägypten
Österreich
äußerst
Übung
müßig
Römer
;
PG
Patrick
Opal | Level 21

@PGStats 

That's a nice function. I believe NLS comes as part of Foundation SAS. 

Looking at the result: it does the 1:1 translation, but it doesn't really work for a German umlaut. Correctly, an 'Ä' should get converted into 'Ae'.
