Dear all,
Is there any code to replace accented characters in a variable with non-accented characters? For example, 'ü' should become 'ue' and 'é' should become 'e'.
Thanks in advance.
If only specific letters are the concern, TRANSLATE will work:
data example; x='abcdé'; y=translate(x,'e','é'); run;
If you don't know all the likely culprits, then BASECHAR may work, but I'm not sure it can turn an umlaut into two characters as you desire.
Hi @Alexxxxxxx,
For the 2-character replacements you can use TRANWRD. Unlike TRANSLATE it doesn't allow for multiple "from-to" pairs in the same function call, so you may want to use a loop.
Example:
data have;
length c $20;
input c;
cards;
Ägypten
Österreich
äußerst
Übung
müßig
Römer
;
data want;
array f[7] $1 _temporary_ ('ä' 'ö' 'ü' 'ß' 'Ä' 'Ö' 'Ü' );
array t[7] $2 _temporary_ ('ae' 'oe' 'ue' 'ss' 'Ae' 'Oe' 'Ue');
set have;
do _n_=1 to dim(f);
c=tranwrd(c, f[_n_], t[_n_]);
end;
run;
Make sure that the length of the target variable is sufficient to accommodate the strings after the replacement(s).
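The length caveat can be handled by writing the result to a new, longer variable. The following is only a sketch: the name c_ascii and the length $40 are assumptions (twice the $20 of c, because each replaced character can grow from one to two characters), and like the $1 array above it assumes a single-byte session encoding such as Latin1.

```sas
data want;
  /* Hypothetical target variable: twice the source length, since every
     replaced character may expand from one to two characters */
  length c_ascii $40;
  array f[7] $1 _temporary_ ('ä' 'ö' 'ü' 'ß' 'Ä' 'Ö' 'Ü');
  array t[7] $2 _temporary_ ('ae' 'oe' 'ue' 'ss' 'Ae' 'Oe' 'Ue');
  set have;
  c_ascii = c;                      /* keep the original value in c */
  do _n_ = 1 to dim(f);
    c_ascii = tranwrd(c_ascii, f[_n_], t[_n_]);
  end;
run;
```

Keeping c untouched also makes it easy to compare the two columns with PROC PRINT afterwards.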
Dear FreelanceReinhard,
thanks for your helpful advice.
Can I use this code for 1-character replacements as well? For example, so that 'à' becomes 'a'?
Of course, the target values in TRANWRD can be single characters as well, but if you add them to the existing $2 array, you'll insert an unwanted trailing blank into the target string. Therefore, I think the code using TRANSLATE (with multiple "from-to" pairs) will be shorter, e.g. (incomplete example)
c=translate(c,'aceee','àçéêè');
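The trailing-blank effect mentioned above can be checked with a small test step. This is a sketch with hypothetical variable names (to, s1, s2): TRANWRD does not strip trailing blanks from its replacement argument, so a one-character value stored in a $2 variable carries its padding blank into the result unless it is trimmed.

```sas
data _null_;
  length s1 s2 $20 to $2;
  to = 'a';                                /* stored as 'a' plus one padding blank */
  s1 = tranwrd('voilà!', 'à', to);         /* padded replacement: blank is inserted */
  s2 = tranwrd('voilà!', 'à', trim(to));   /* trimmed replacement: no extra blank */
  put s1= s2=;
run;
```

Comparing s1 and s2 in the log shows whether the extra blank appears.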
So, can I use the following code?
data want;
array f[9] $1 _temporary_ ('Ä' 'Æ' 'Ö' 'Ü' 'ß' 'ä' 'æ' 'ö' 'ü');
array t[9] $2 _temporary_ ('AE' 'AE' 'OE' 'UE' 'SS' 'ae' 'ae' 'oe' 'ue');
set have;
do _n_=1 to dim(f);
c=tranwrd(c, f[_n_], t[_n_]);
c=translate(c,'AAAAACEEEEIIIIDNOOOOOUUUYaaaaaceeeeiiiionooooouuuyy','ÀÁÂÃÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕØÙÚÛÝàáâãåçèéêëìíîïðñòóôõøùúûýÿ');
end;
run;
This would be a bit more efficient:
data want;
set have;
c = tranwrd(c, 'ß', 'SS');
c = prxChange("s/([ÄÆÖÜ])/\1E/o", -1, c);
c = prxChange("s/([äæöü])/\1e/o", -1, c);
*c = basechar(c);
c = translate(c,
'AAAAAAACEEEEIIIIDNOOOOOOUUUUYaaaaaaaceeeeiiiionoooooouuuuyy',
'ÀÁÂÃÅÄÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕØÖÙÚÛÜÝàáâãåäæçèéêëìíîïðñòóôõøöùúûüýÿ');
run;
Dear PG,
thanks for your advice.
However, I get the following output when I run this code:
data have;
length c $20;
input c;
cards;
Ägypten
Österreich
äußerst
Übung
müßig
Römer
Pierre
Étirer
Jean-Pierre
Ägypten
Österreich
äußerst
Übung
müßig
Römer
;
run;
data want;
set have;
c = tranwrd(c, 'ß', 'SS');
c = prxChange("s/([ÄÆÖÜ])/\1E/o", -1, c);
c = prxChange("s/([äæöü])/\1e/o", -1, c);
*c = basechar(c);
c = translate(c,'AAAAACEEEEIIIIDNOOOOOUUUYaaaaaceeeeiiiidnooooouuuyy','ÀÁÂÃÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕØÙÚÛÝàáâãåçèéêëìíîïðñòóôõøùúûýÿ');
run;
The SAS System
c
ÄEgypten
ÖEsterreich
äeuSSerst
ÜEbung
müeSSig
Röemer
Pierre
Etirer
Jean-Pierre
ÄEgypten
ÖEsterreich
äeuSSerst
ÜEbung
müeSSig
Röemer
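A possible explanation for this output: the \1 backreference in the PRXCHANGE pattern keeps the matched umlaut and only appends E or e, and in this version the TRANSLATE call no longer lists Ä, Æ, Ö, Ü, ä, æ, ö, ü, so nothing strips the umlaut afterwards. The sketch below restores those characters to both TRANSLATE arguments (and maps ð to d, as in the version just above). It assumes a single-byte session encoding such as Latin1; in a UTF-8 session these byte-oriented functions may behave differently.

```sas
data want;
  set have;
  c = tranwrd(c, 'ß', 'SS');
  /* Step 1: append E/e after each umlaut; the umlaut itself is still there */
  c = prxchange("s/([ÄÆÖÜ])/\1E/o", -1, c);
  c = prxchange("s/([äæöü])/\1e/o", -1, c);
  /* Step 2: map every accented letter, umlauts included, to its base letter.
     Both strings must have the same length and the same character order. */
  c = translate(c,
    'AAAAAAACEEEEIIIIDNOOOOOOUUUUYaaaaaaaceeeeiiiidnoooooouuuuyy',
    'ÀÁÂÃÅÄÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕØÖÙÚÛÜÝàáâãåäæçèéêëìíîïðñòóôõøöùúûüýÿ');
run;
```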
@Alexxxxxxx wrote:
So, can I use the following code?
data want;
array f[9] $1 _temporary_ ('Ä' 'Æ' 'Ö' 'Ü' 'ß' 'ä' 'æ' 'ö' 'ü');
array t[9] $2 _temporary_ ('AE' 'AE' 'OE' 'UE' 'SS' 'ae' 'ae' 'oe' 'ue');
set have;
do _n_=1 to dim(f);
c=tranwrd(c, f[_n_], t[_n_]);
c=translate(c,'AAAAACEEEEIIIIDNOOOOOUUUYaaaaaceeeeiiiionooooouuuyy','ÀÁÂÃÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕØÙÚÛÝàáâãåçèéêëìíîïðñòóôõøùúûýÿ');
end;
run;
Should 'ð' (a kind of 'd', I think) be replaced by 'o'?
So, to summarize the comments above, a reasonable solution could be simplified to:
data want;
set have;
c = tranwrd(c, 'ß', 'ss');
c = prxChange("s/([äæöüÄÆÖÜ])/\1e/o", -1, c);
c = basechar(c);
run;
Or
data want;
set have;
c = basechar(prxChange("s/([äæöüÄÆÖÜ])/\1e/o", -1, tranwrd(c, 'ß', 'ss')));
run;
This is probably the easiest solution to get working, although it requires that you create the list of mappings.
Note: make the temporary variables long enough to hold the Unicode representations of the strings. In UTF-8 a single character can take up to 4 bytes.
data have;
input c $40.;
cards;
Ägypten
Österreich
äußerst
Übung
müßig
Römer
Pierre
Étirer
Jean-Pierre
Ägypten
Österreich
äußerst
Übung
müßig
Römer
;
data want;
array f[8] $4 _temporary_ ('ä' 'ö' 'ü' 'ß' 'Ä' 'Ö' 'Ü' 'É');
array t[8] $4 _temporary_ ('ae' 'oe' 'ue' 'ss' 'Ae' 'Oe' 'Ue' 'E');
set have;
d=c;
do _n_=1 to dim(f);
d=tranwrd(d, trim(f[_n_]), trim(t[_n_]));
end;
run;
proc print;
run;
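To see why the arrays above are declared $4 rather than $1, one can compare byte length and character length in the current session. This is a sketch with hypothetical variable names (ch, bytes, chars); it relies on LENGTH counting bytes and KLENGTH counting characters, and the values differ only when the session encoding is multi-byte (for example UTF-8).

```sas
data _null_;
  length ch $4;
  ch = 'ä';
  bytes = length(ch);    /* storage length in bytes */
  chars = klength(ch);   /* length in characters */
  put ch= bytes= chars=;
run;
```

If bytes is larger than chars, a $1 array element would truncate the character, which is why the wider declaration together with TRIM is the safer choice.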
You could also put your translation pairs into a dataset instead of hard-coding them.
data translate;
array f[8] $4 _temporary_ ('ä' 'ö' 'ü' 'ß' 'Ä' 'Ö' 'Ü' 'É');
array t[8] $4 _temporary_ ('ae' 'oe' 'ue' 'ss' 'Ae' 'Oe' 'Ue' 'E');
do _n_=1 to dim(f);
from=f[_n_];
to=t[_n_];
output;
end;
run;
data want;
set have;
d=c;
do p=1 to nobs;
set translate point=p nobs=nobs;
d=tranwrd(d, trim(from), trim(to));
end;
drop from to;
run;
If you have access to NLS (I don't know whether it still requires a separate licence), you can use the BASECHAR function. Try:
data _null_;
input mot :$12.;
key = basechar(mot);
put key;
datalines;
Pierre
Étirer
Jean-Pierre
Ägypten
Österreich
äußerst
Übung
müßig
Römer
;
That's a nice function. I believe NLS comes as part of Foundation SAS.
Looking at the result: it does the 1:1 translation, but it doesn't really work for a German umlaut. Correctly, an Ä should get converted into Ae.