Hi,
I am trying to merge two SAS files. One has names with accent marks, and the other has the same names without accents. How can I get SAS to ignore the accents while merging? (I don't care if they're lost.) I've tried wading through the National Language Support documentation, but haven't found anything.
Thanks!
Accepted Solutions
As far as I know, there is no general function for dealing with accented characters. You can either create an intermediate data set with a new variable and use a DATA step merge, or use PROC SQL and join on an expression.
You don't say which language your accented characters come from. For French, one solution would be to match on a variable or expression like:
matchString = translate(lowcase(accentedString),"aaceeeeiiouu","àâçéèêëîïôùû");
PG
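A minimal sketch of the intermediate-data-set approach, assuming a UTF-8 (or Latin-1) SAS session. The data set names, variable names (`matchkey`, `salary`, `dept`, `plain_name`) and sample names are hypothetical. It swaps in KTRANSLATE and KLOWCASE for TRANSLATE and LOWCASE: TRANSLATE is documented for single-byte data, so in a UTF-8 session it may not map multibyte accented characters correctly, while KTRANSLATE works character by character. The mapping is the French one from the reply above.

```sas
/* Hypothetical sample data: accented names in one data set,   */
/* the same names in unaccented upper case in the other.       */
data accented;
  length name $ 40;
  infile datalines delimiter=',';
  input name salary;
datalines;
François Lefèvre,52000
Jérôme Noël,48000
Hélène Girard,61000
;
run;

data plain;
  length name $ 40 dept $ 20;
  infile datalines delimiter=',';
  input name dept;
datalines;
HELENE GIRARD,Marketing
FRANCOIS LEFEVRE,Sales
JEROME NOEL,Finance
;
run;

/* Build a lowercase, accent-free match key in each data set. */
data accented_k;
  set accented;
  length matchkey $ 40;
  matchkey = ktranslate(klowcase(name), "aaceeeeiiouu", "àâçéèêëîïôùû");
run;

/* Rename NAME so the MERGE does not overwrite the accented value. */
data plain_k;
  set plain(rename=(name=plain_name));
  length matchkey $ 40;
  matchkey = ktranslate(klowcase(plain_name), "aaceeeeiiouu", "àâçéèêëîïôùû");
run;

/* A DATA step merge needs both inputs sorted by the BY variable. */
proc sort data=accented_k; by matchkey; run;
proc sort data=plain_k;    by matchkey; run;

data merged;
  merge accented_k(in=a) plain_k(in=b);
  by matchkey;
  if a and b;   /* keep only names found in both data sets */
run;

proc print data=merged;
run;
```

Each match key is computed once per row and stored, so it can also be reused for later merges against other files.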
The dataset is in English, but because there are first and last names, the accents can come from various languages. But I think they're mostly generic French/Spanish accents. Thanks for the suggestion; I'll give it a try.
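Since the names may carry Spanish as well as French accents, the mapping strings can be extended, and the join can be done directly in PROC SQL on the expression, with no intermediate data set. A sketch, again assuming a UTF-8 or Latin-1 session; the data set names, variables and sample names are hypothetical. KTRANSLATE and KLOWCASE are used because they handle multibyte characters. The mapping lists only lowercase accented letters (it is applied after KLOWCASE), so check it against the characters actually present in your data.

```sas
/* Hypothetical sample data mixing French and Spanish accents. */
data accented;
  length name $ 40;
  infile datalines delimiter=',';
  input name salary;
datalines;
José Martínez,45000
Iñigo Núñez,39000
Hélène Girard,61000
;
run;

data plain;
  length name $ 40 dept $ 20;
  infile datalines delimiter=',';
  input name dept;
datalines;
JOSE MARTINEZ,Sales
INIGO NUNEZ,Finance
HELENE GIRARD,Marketing
;
run;

/* Join directly on the normalized expression.               */
/* Mapping (position by position):                           */
/*   àâá->a  ç->c  éèêë->e  îïí->i  ôó->o  ùûúü->u  ñ->n     */
proc sql;
  create table merged_sql as
  select a.name, a.salary, b.dept
  from accented as a
       inner join plain as b
       on ktranslate(klowcase(a.name), 'aaaceeeeiiioouuuun', 'àâáçéèêëîïíôóùûúüñ')
        = ktranslate(klowcase(b.name), 'aaaceeeeiiioouuuun', 'àâáçéèêëîïíôóùûúüñ');
quit;

proc print data=merged_sql;
run;
```

The two mapping strings must have the same number of characters, with each unaccented letter in the same position as the accented letter it replaces.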
That worked great (once I found an extended character map). Thanks so much!
/*
Demonstrate that the DATA step is sensitive to linguistic
collating sequences, and that this can be used to perform a merge
that is insensitive to case or accents.
Here, we merge two data sets, one containing monthly revenue
and the other containing a monthly count of customers, to
calculate revenue per customer.
*/
/* mois = month (accented, lowercase), compte = customer count */
data clients;
  length mois $ 10;
  infile datalines delimiter=',';
  input mois compte;
datalines;
janvier, 370
février, 400
mars, 430
avril, 415
mai, 410
juin, 450
juillet, 449
août, 403
septembre, 339
novembre, 375
décembre, 370
;
run;
/* mois = month (unaccented, uppercase), ventes = revenue */
data revenu;
  length mois $ 10;
  infile datalines delimiter=',';
  input mois ventes;
datalines;
JANVIER, 376784
FEVRIER, 396911
MARS, 441327
AVRIL, 419272
MAI, 408291
JUIN, 443791
JUILLET, 442111
AOUT, 402771
SEPTEMBRE, 337727
NOVEMBRE, 381929
DECEMBRE, 376771
;
run;
/* STRENGTH=1 (primary) compares base letters only, */
/* so case and accent differences are ignored.      */
proc sort data=clients sortseq=linguistic(strength=1);
  by mois;
run;
proc sort data=revenu sortseq=linguistic(strength=1);
  by mois;
run;
/* The BY-group matching follows the linguistic collation */
/* recorded by PROC SORT, so février matches FEVRIER.     */
data resultat;
  merge clients revenu;
  by mois;
  revenuparclient = ventes / compte;
run;
proc print data=resultat;
run;
Thank you! This alternate approach will also be useful to me in the future.
FYI, a similar approach can be taken with PROC SQL and a join using the SORTKEY function.
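A sketch of that PROC SQL variant, using a few rows from the month example so it runs on its own. SORTKEY returns a collation key for a string; at primary strength ('P'), strings that differ only in case or accents produce equal keys, so the keys can be compared directly in the join condition. The 'fr_FR' locale is an assumption; choose the locale that suits your data.

```sas
/* A few rows from the month example: accented lowercase vs. plain uppercase. */
data clients;
  length mois $ 10;
  infile datalines delimiter=',';
  input mois compte;
datalines;
janvier,370
février,400
août,403
décembre,370
;
run;

data revenu;
  length mois $ 10;
  infile datalines delimiter=',';
  input mois ventes;
datalines;
JANVIER,376784
FEVRIER,396911
AOUT,402771
DECEMBRE,376771
;
run;

/* Join on primary-strength collation keys, which ignore case and accents. */
/* No prior PROC SORT is needed for a PROC SQL join.                        */
proc sql;
  create table resultat_sql as
  select c.mois, c.compte, r.ventes,
         r.ventes / c.compte as revenuparclient
  from clients as c
       inner join revenu as r
       on sortkey(c.mois, 'fr_FR', 'P') = sortkey(r.mois, 'fr_FR', 'P');
quit;

proc print data=resultat_sql;
run;
```

Compared with the SORTSEQ=LINGUISTIC merge, this keeps the original data sets unsorted and puts the accent-insensitive comparison in one place, the ON clause.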