Dear all,
Is there any code to replace accented characters with non-accented characters in a variable? For example, 'ü' becomes 'ue' and 'é' becomes 'e'.
Thanks in advance.
If specific letters are the concern, TRANSLATE will work:
data example;
  x = 'abcdé';
  y = translate(x, 'e', 'é');
run;
If you don't know all the likely culprits, BASECHAR may work, but I'm not sure it can map an umlaut to two characters as you want.
Hi @Alexxxxxxx,
For the 2-character replacements you can use TRANWRD. Unlike TRANSLATE it doesn't allow for multiple "from-to" pairs in the same function call, so you may want to use a loop.
Example:
data have;
  length c $20;
  input c;
  cards;
Ägypten
Österreich
äußerst
Übung
müßig
Römer
;
data want;
  array f[7] $1 _temporary_ ('ä' 'ö' 'ü' 'ß' 'Ä' 'Ö' 'Ü' );
  array t[7] $2 _temporary_ ('ae' 'oe' 'ue' 'ss' 'Ae' 'Oe' 'Ue');
  set have;
  do _n_=1 to dim(f);
    c=tranwrd(c, f[_n_], t[_n_]);
  end;
run;
Make sure that the length of the target variable is sufficient to accommodate the strings after the replacement(s).
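To illustrate the point about length: a replacement such as 'ü' to 'ue' makes the value one character longer, so a value that already fills the variable would be truncated. A minimal sketch, assuming the HAVE dataset from the example, a single-byte session encoding, and an arbitrarily chosen new length of 40, declares the wider length before the SET statement:

```sas
data want;
  /* assumption: 40 is wide enough; LENGTH is placed before SET so it takes effect */
  length c $40;
  array f[7] $1 _temporary_ ('ä' 'ö' 'ü' 'ß' 'Ä' 'Ö' 'Ü');
  array t[7] $2 _temporary_ ('ae' 'oe' 'ue' 'ss' 'Ae' 'Oe' 'Ue');
  set have;
  do _n_ = 1 to dim(f);
    c = tranwrd(c, f[_n_], t[_n_]);
  end;
run;
```

With C still at its original length of 20, any value longer than about 10 characters made up largely of umlauts could lose its tail after the replacements.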
Dear FreelanceReinhard,
Thanks for your helpful advice.
Can I use this code for 1-character replacements as well? For example, 'à' becomes 'a'?
@Alexxxxxxx wrote:
Dear FreelanceReinhard,
Thanks for your helpful advice.
Can I use this code for 1-character replacements as well? For example, 'à' becomes 'a'?
Of course, the target values in TRANWRD can be single characters as well, but if you add them to the existing $2 array, you'll insert an unwanted trailing blank into the target string. Therefore, I think the code using TRANSLATE (with multiple "from-to" pairs) will be shorter, e.g. (incomplete example)
c=translate(c,'aceee','àçéêè');
So, can I use the following code?
data want;
  array f[9] $1 _temporary_ ('Ä' 'Æ' 'Ö' 'Ü' 'ß' 'ä' 'æ' 'ö' 'ü');
  array t[9] $2 _temporary_ ('AE' 'AE' 'OE' 'UE' 'SS' 'ae' 'ae' 'oe' 'ue');
  set have;
  do _n_=1 to dim(f);
    c=tranwrd(c, f[_n_], t[_n_]);
    c=translate(c,'AAAAACEEEEIIIIDNOOOOOUUUYaaaaaceeeeiiiionooooouuuyy','ÀÁÂÃÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕØÙÚÛÝàáâãåçèéêëìíîïðñòóôõøùúûýÿ');
  end;
run;
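This would be a bit more efficient (accepted solution):
data want;
  set have;
  /* 'ß' has no single base letter, so expand it first */
  c = tranwrd(c, 'ß', 'SS');
  /* append E or e after each umlaut or ligature; \1 keeps the matched character */
  c = prxChange("s/([ÄÆÖÜ])/\1E/o", -1, c);
  c = prxChange("s/([äæöü])/\1e/o", -1, c);
  *c = basechar(c);
  /* strip the accents, including those on the characters kept above */
  c = translate(c,
    'AAAAAAACEEEEIIIIDNOOOOOOUUUUYaaaaaaaceeeeiiiionoooooouuuuyy',
    'ÀÁÂÃÅÄÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕØÖÙÚÛÜÝàáâãåäæçèéêëìíîïðñòóôõøöùúûüýÿ');
run;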
Dear PG,
thanks for your advice.
However, I get the following output when I run this code:
data have;
  length c $20;
  input c;
  cards;
Ägypten
Österreich
äußerst
Übung
müßig
Römer
Pierre
Étirer
Jean-Pierre
Ägypten
Österreich
äußerst
Übung
müßig
Römer
;
run;
data want;
  set have;
  c = tranwrd(c, 'ß', 'SS');
  c = prxChange("s/([ÄÆÖÜ])/\1E/o", -1, c);
  c = prxChange("s/([äæöü])/\1e/o", -1, c);
  *c = basechar(c);
  c = translate(c,'AAAAACEEEEIIIIDNOOOOOUUUYaaaaaceeeeiiiidnooooouuuyy','ÀÁÂÃÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕØÙÚÛÝàáâãåçèéêëìíîïðñòóôõøùúûýÿ');
run;
The SAS System |
c |
ÄEgypten |
ÖEsterreich |
äeuSSerst |
ÜEbung |
müeSSig |
Röemer |
Pierre |
Etirer |
Jean-Pierre |
ÄEgypten |
ÖEsterreich |
äeuSSerst |
ÜEbung |
müeSSig |
Röemer |
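A likely reason the umlauts survive in this output: the two PRXCHANGE calls keep the matched character (\1) and only append E or e, leaving it to TRANSLATE to reduce Ä to A, ö to o, and so on. The TRANSLATE lists in this version do not contain Ä, Æ, Ö, Ü, ä, æ, ö or ü, so those characters pass through unchanged. A minimal sketch of one possible fix, assuming the HAVE dataset above and a single-byte session encoding, adds a second TRANSLATE call for exactly those characters:

```sas
data want;
  set have;
  c = tranwrd(c, 'ß', 'SS');
  c = prxchange("s/([ÄÆÖÜ])/\1E/o", -1, c);
  c = prxchange("s/([äæöü])/\1e/o", -1, c);
  /* reduce the characters kept by \1 to their base letters */
  c = translate(c, 'AAOUaaou', 'ÄÆÖÜäæöü');
  /* remaining accented characters, lists unchanged from the code above */
  c = translate(c,'AAAAACEEEEIIIIDNOOOOOUUUYaaaaaceeeeiiiidnooooouuuyy','ÀÁÂÃÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕØÙÚÛÝàáâãåçèéêëìíîïðñòóôõøùúûýÿ');
run;
```

With this change, Ägypten should come out as AEgypten and äußerst as aeuSSerst.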
@Alexxxxxxx wrote:
So, can I use the following code?
data want;
  array f[9] $1 _temporary_ ('Ä' 'Æ' 'Ö' 'Ü' 'ß' 'ä' 'æ' 'ö' 'ü');
  array t[9] $2 _temporary_ ('AE' 'AE' 'OE' 'UE' 'SS' 'ae' 'ae' 'oe' 'ue');
  set have;
  do _n_=1 to dim(f);
    c=tranwrd(c, f[_n_], t[_n_]);
    c=translate(c,'AAAAACEEEEIIIIDNOOOOOUUUYaaaaaceeeeiiiionooooouuuyy','ÀÁÂÃÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕØÙÚÛÝàáâãåçèéêëìíîïðñòóôõøùúûýÿ');
  end;
run;
- The TRANSLATE call doesn't need to be repeated (nine times). It should occur outside of the DO loop.
- The capital double S is not an ideal replacement for 'ß' except in words written in capitals (which rarely contain an 'ß').
- Similarly, "Aegypten", "Oesterreich" etc. would be preferable to "AEgypten", "OEsterreich" etc. -- if the results are used for text output (like a report).
- Most of the 1-character replacements could be accomplished with the elegant BASECHAR function, which ballardw and PGStats have suggested. The few exceptions might be questionable in your code anyway. For example, why should 'ð' (a kind of 'd', I think) be replaced by 'o'?
- Of course, character variable C must be in dataset HAVE, with sufficient length.
So, to summarize the comments above, the code could reasonably be simplified to:
data want;
  set have;
  c = tranwrd(c, 'ß', 'ss');
  c = prxChange("s/([äæöüÄÆÖÜ])/\1e/o", -1, c);
  c = basechar(c);
run;
Or
data want;
  set have;
  c = basechar(prxChange("s/([äæöüÄÆÖÜ])/\1e/o", -1, tranwrd(c, 'ß', 'ss')));
run;
This is probably the easiest solution to get working, although it requires that you create the list of mappings.
Note: make sure the temporary array elements are long enough to hold the Unicode representations of the strings. Many characters could take up to 4 bytes.
data have;
input c $40.;
cards;
Ägypten
Österreich
äußerst
Übung
müßig
Römer
Pierre
Étirer
Jean-Pierre
Ägypten
Österreich
äußerst
Übung
müßig
Römer
;
data want;
  array f[8] $4 _temporary_ ('ä' 'ö' 'ü' 'ß' 'Ä' 'Ö' 'Ü' 'É');
  array t[8] $4 _temporary_ ('ae' 'oe' 'ue' 'ss' 'Ae' 'Oe' 'Ue' 'E');
  set have;
  d=c;
  do _n_=1 to dim(f);
    d=tranwrd(d, trim(f[_n_]), trim(t[_n_]));
  end;
run;
proc print;
run;
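The byte-length note above can be checked directly. A small sketch, assuming the program is saved and run in a UTF-8 session: LENGTH counts bytes while KLENGTH counts characters, so the two differ for an accented letter.

```sas
data _null_;
  s = 'ß';
  bytes = length(s);   /* length in bytes */
  chars = klength(s);  /* length in characters */
  put bytes= chars=;
run;
```

In a UTF-8 session this should report 2 bytes for 1 character, which is why a $1 array element would not be able to hold 'ß' there. In a single-byte session both values would be 1.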
You could also put your translation pairs into a dataset instead of placing them in the code.
data translate;
  array f[8] $4 _temporary_ ('ä' 'ö' 'ü' 'ß' 'Ä' 'Ö' 'Ü' 'É');
  array t[8] $4 _temporary_ ('ae' 'oe' 'ue' 'ss' 'Ae' 'Oe' 'Ue' 'E');
  do _n_=1 to dim(f);
    from=f[_n_];
    to=t[_n_];
    output;
  end;
run;
data want;
  set have;
  d=c;
  /* read every from/to pair by direct access for each row of HAVE */
  do p=1 to nobs;
    set translate point=p nobs=nobs;
    d=tranwrd(d, trim(from), trim(to));
  end;
  drop from to;
run;
If you have access to NLS (I don't know whether it still requires a separate licence), you can use the BASECHAR function. Try:
data _null_;
  input mot :$12.;
  key = basechar(mot);
  put key;
  datalines;
Pierre
Étirer
Jean-Pierre
Ägypten
Österreich
äußerst
Übung
müßig
Römer
;
That's a nice function. I believe NLS comes as part of Foundation SAS.
Looking at the result: it does the 1:1 translation, but it doesn't really work for a German umlaut. Correctly, an Ä should get converted into Ae.