Hi,
I am trying to merge two SAS files. One has names with accent marks; the other has the same names, but unaccented. How can I get SAS to ignore the accents while merging? (I don't care if they're lost.) I've tried wading through the National Language Support documentation, but I'm not finding anything.
Thanks!
As far as I know, there is no general function for dealing with accented characters. You can either create an intermediate data set with a new match variable and use a DATA step merge, or use PROC SQL and join on an expression.
You don't say what language your accented characters come from. A solution corresponding to the French language would be to match on a variable/expression like:
matchString = translate(lowcase(accentedString),"aaceeeeiiouu","àâçéèêëîïôùû");
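As a rough sketch of the SQL variant, the join below applies that same expression directly in the ON clause. The data sets accented and plain, and the variables name and amount, are hypothetical stand-ins for your own. It also assumes a single-byte session encoding such as Latin1; TRANSLATE works byte by byte, so in a UTF-8 session, where accented letters take more than one byte, it may not behave as intended and KTRANSLATE would be the function to look at.

```sas
/* Hypothetical sample data; data set and variable names are illustrative only. */
data accented;
  length name $ 20;
  input name $;
  datalines;
Hélène
François
;
run;

data plain;
  length name $ 20;
  input name $ amount;
  datalines;
helene 10
francois 20
;
run;

/* Join on an unaccented, lowercased version of the name.        */
/* Assumes a single-byte encoding (TRANSLATE is byte-oriented).  */
proc sql;
  create table matched as
  select a.name as accentedName,
         b.name as plainName,
         b.amount
  from accented as a
       inner join plain as b
       on translate(lowcase(a.name), "aaceeeeiiouu", "àâçéèêëîïôùû")
          = lowcase(b.name);
quit;
```

The character lists only cover common French accents; names from other languages (ñ, ü, ã and so on) would need the two lists extended in parallel, one replacement character per accented character.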
PG
The dataset is in English, but because there are first and last names, the accents can come from various languages. But I think they're mostly generic French/Spanish accents. Thanks for the suggestion; I'll give it a try.
That worked great! (Once I found an extended character map) Thanks so much!
/*
Demonstrate that the DATA STEP is sensitive to linguistic
collating sequences and this can be used to perform a merge
that is insensitive to case or accents.
Here, we're merging/joining two data sets, one containing
monthly revenue with another containing a monthly count of
customers, to calculate revenue per customer.
*/
data clients;
length mois $ 10;
infile datalines delimiter=',';
input mois compte;
datalines;
janvier, 370
février, 400
mars, 430
avril, 415
mai, 410
juin, 450
juillet, 449
août, 403
septembre, 339
novembre, 375
décembre, 370
;
run;
data revenu;
length mois $ 10;
infile datalines delimiter=',';
input mois ventes;
datalines;
JANVIER, 376784
FEVRIER, 396911
MARS, 441327
AVRIL, 419272
MAI, 408291
JUIN, 443791
JUILLET, 442111
AOUT, 402771
SEPTEMBRE, 337727
NOVEMBRE, 381929
DECEMBRE, 376771
;
run;
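/* Sort both data sets with a linguistic collation at primary strength, */
/* which ignores differences in case and accents.                       */
proc sort data=clients sortseq=linguistic(strength=1);
  by mois;
run;

proc sort data=revenu sortseq=linguistic(strength=1);
  by mois;
run;

/* BY-group matching in the merge follows the same collation. */
data resultat;
  merge clients revenu;
  by mois;
  revenuparclient = ventes/compte;
run;

proc print data=resultat;
run;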
Thank you! This alternate approach will also be useful to me in the future.
FYI, a similar approach can be taken with PROC SQL and a join using the SORTKEY function.
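A minimal sketch of what that might look like, reusing the clients and revenu data sets from the example above. The 'fr_FR' locale and the 'P' (primary) strength arguments are my assumptions about suitable settings, and STRIP is there only as a precaution against trailing blanks affecting the generated keys; check the SORTKEY documentation for your release before relying on it.

```sas
/* Sketch only: join on linguistic sort keys instead of the raw values. */
/* Primary strength is intended to ignore case and accent differences.  */
proc sql;
  create table resultat_sql as
  select c.mois,
         c.compte,
         r.ventes,
         r.ventes / c.compte as revenuparclient
  from clients as c
       inner join revenu as r
       on sortkey(strip(c.mois), 'fr_FR', 'P')
          = sortkey(strip(r.mois), 'fr_FR', 'P');
quit;
```

Unlike the DATA step merge, this does not need either data set to be sorted first.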