Solved: Re: How to ignore accented text?

DebbiBJ · Posted 01-14-2014 01:21 PM

Hi,

I am trying to merge two SAS files. One has names with accent marks; the other has the same names without accents. How can I get SAS to ignore the accents while merging? (I don't care if they're lost.) I've tried wading through the National Language Support documentation, but I'm not finding anything.

Thanks!

PGStats · Posted 01-14-2014 01:47 PM

As far as I know, there is no general function for dealing with accented characters. You can either create an intermediate dataset with a new variable and use a DATA step merge, or use SQL and join on an expression.

You don't say what language your accented characters come from. For French, one solution would be to match on a variable or expression like:

matchString = translate(lowcase(accentedString),"aaceeeeiiouu","àâçéèêëîïôùû");
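
A minimal sketch of both options, assuming two hypothetical data sets, work.accented (variables name, x) and work.plain (variables name, y). The data set names, variable names, and the $50 key length are illustrative only and do not come from the original question. It also assumes a single-byte session encoding such as Latin1, since TRANSLATE works byte by byte; in a UTF-8 session KTRANSLATE would likely be needed instead.

```sas
/* Hypothetical inputs: work.accented(name, x) and work.plain(name, y). */

/* Option 1: build an unaccented, lowercase key in each data set, */
/* then sort and merge on that key.                               */
data accented_k;
  set accented;
  length matchString $ 50;   /* assumed maximum name length */
  matchString = translate(lowcase(name), 'aaceeeeiiouu', 'àâçéèêëîïôùû');
run;

data plain_k;
  set plain;
  length matchString $ 50;
  matchString = lowcase(name);   /* assumed to contain no accents */
run;

proc sort data=accented_k; by matchString; run;
proc sort data=plain_k;    by matchString; run;

data merged;
  /* rename one copy of name so the merge does not overwrite the other */
  merge accented_k(in=a) plain_k(in=b rename=(name=name_plain));
  by matchString;
  if a and b;                /* keep only names found in both */
run;

/* Option 2: no intermediate data sets, join directly on the expression. */
proc sql;
  create table joined as
  select a.name, a.x, b.y
  from accented as a inner join plain as b
    on translate(lowcase(a.name), 'aaceeeeiiouu', 'àâçéèêëîïôùû') = lowcase(b.name);
quit;
```

The character lists cover only the French letters shown above; names from other languages (for example ñ or ü) would need the two lists extended in parallel, one replacement per accented character.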

PG

DebbiBJ · Posted 01-14-2014 01:51 PM

The dataset is in English, but because there are first and last names, the accents can come from various languages. But I think they're mostly generic French/Spanish accents. Thanks for the suggestion; I'll give it a try.

DebbiBJ · Posted 01-15-2014 11:27 AM

That worked great (once I found an extended character map). Thanks so much!

scmebu · Posted 01-15-2014 10:48 AM

/*
  Demonstrate that the DATA STEP is sensitive to linguistic
  collating sequences and this can be used to perform a merge
  that is insensitive to case or accents.

  Here, we're merging/joining two data sets, one containing
  monthly revenue with another containing a monthly count of
  customers, to calculate revenue per customer.
*/

data clients;
  length mois $ 10;
  infile datalines delimiter=',';
  input mois compte;
  datalines;
janvier, 370
février, 400
mars, 430
avril, 415
mai, 410
juin, 450
juillet, 449
août, 403
septembre, 339
novembre, 375
décembre, 370
;
run;

data revenu;
  length mois $ 10;
  infile datalines delimiter=',';
  input mois ventes;
  datalines;
JANVIER, 376784
FEVRIER, 396911
MARS, 441327
AVRIL, 419272
MAI, 408291
JUIN, 443791
JUILLET, 442111
AOUT, 402771
SEPTEMBRE, 337727
NOVEMBRE, 381929
DECEMBRE, 376771
;
run;

proc sort data=clients sortseq=linguistic(strength=1);
  by mois;
run;

proc sort data=revenu sortseq=linguistic(strength=1);
  by mois;
run;

data resultat;
  merge clients revenu;
  by mois;
  revenuparclient = ventes/compte;
run;

proc print;
run;

DebbiBJ · Posted 01-15-2014 11:26 AM

Thank you! This alternate approach will also be useful to me in the future.

scmebu · Posted 01-15-2014 12:34 PM

FYI, a similar approach can be taken with PROC SQL and a join using the SORTKEY function.
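
A minimal sketch of that SQL variant, reusing the clients and revenu data sets from the earlier post. The 'fr_FR' locale, the 'P' (primary) strength, the STRIP calls, and the output table name resultat_sql are assumptions made for illustration, not part of the original suggestion.

```sas
/* Join on linguistic sort keys instead of the raw strings.        */
/* At primary strength, case and accent differences are ignored,   */
/* so 'février' and 'FEVRIER' should produce the same key.         */
proc sql;
  create table resultat_sql as
  select c.mois,
         c.compte,
         r.ventes,
         r.ventes / c.compte as revenuparclient
  from clients as c
       inner join revenu as r
       /* strip() removes padding so trailing blanks do not affect the key */
       on sortkey(strip(c.mois), 'fr_FR', 'P') = sortkey(strip(r.mois), 'fr_FR', 'P');
quit;

proc print data=resultat_sql;
run;
```

Unlike the DATA step merge, this version needs no prior PROC SORT, at the cost of computing a sort key for each row during the join.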
