How to ignore accented text?

Occasional Contributor
Posts: 12

Hi,

I am trying to merge two SAS files. One has names with accent marks; the other has the same names, but unaccented. How can I get SAS to ignore the accents while merging? (I don't care if they're lost.) I've tried wading through the National Language Support documentation, but I'm not finding anything.

Thanks!


Accepted Solutions
Solution
‎01-14-2014 01:47 PM
Respected Advisor
Posts: 4,663

Re: How to ignore accented text?

As far as I know, there is no general function for dealing with accented characters. You can either create an intermediate dataset with a new variable and use a DATA step merge, or use SQL and join on an expression.

You don't say what language your accented characters come from. A solution corresponding to the French language would be to match on a variable/expression like:

matchString = translate(lowcase(accentedString),"aaceeeeiiouu","àâçéèêëîïôùû");
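
To make the first option concrete, here is a minimal sketch of the intermediate-dataset route. The dataset names (accented, plain, merged), the variables and the sample rows are made up for illustration. It also assumes a single-byte session encoding such as wlatin1; TRANSLATE works byte by byte, so in a UTF-8 session the accented characters would not map cleanly and a multi-byte-aware function such as KTRANSLATE would be the thing to look at instead.

```sas
/* Made-up sample data; assumes a single-byte session encoding (e.g. wlatin1) */
data accented;
  length name $ 20;
  input name $ score;
  datalines;
Hélène 1
François 2
;
run;

data plain;
  length name $ 20;
  input name $ amount;
  datalines;
Helene 10
Francois 20
;
run;

/* Build a lowercase, accent-free key on the accented side */
data accented_k;
  set accented;
  length matchString $ 20;
  matchString = translate(lowcase(name), "aaceeeeiiouu", "àâçéèêëîïôùû");
run;

/* The unaccented side only needs lowercasing to line up with that key */
data plain_k;
  set plain(rename=(name=plain_name));
  length matchString $ 20;
  matchString = lowcase(plain_name);
run;

proc sort data=accented_k; by matchString; run;
proc sort data=plain_k;    by matchString; run;

/* Keep only rows present in both tables */
data merged;
  merge accented_k(in=a) plain_k(in=p);
  by matchString;
  if a and p;
run;
```

The original accented spelling survives in name, so nothing is lost; only the helper variable matchString is stripped of accents. Extend the two character lists in step with each other if the names contain accents from other languages.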

PG



All Replies
Occasional Contributor
Posts: 12

Re: How to ignore accented text?

The dataset is in English, but because there are first and last names, the accents can come from various languages.  But I think they're mostly generic French/Spanish accents.  Thanks for the suggestion; I'll give it a try.

Occasional Contributor
Posts: 12

Re: How to ignore accented text?

That worked great!  (Once I found an extended character map)  Thanks so much!

SAS Employee
Posts: 17

Re: How to ignore accented text?

/*
   Demonstrate that the DATA STEP is sensitive to linguistic
   collating sequences and this can be used to perform a merge
   that is insensitive to case or accents.

   Here, we're merging/joining two data sets, one containing
   monthly revenue with another containing a monthly count of
   customers, to calculate revenue per customer.
*/

data clients;
  length mois $ 10;
  infile datalines delimiter=',';
  input mois compte;
  datalines;
  janvier, 370
  février, 400
  mars, 430
  avril, 415
  mai, 410
  juin, 450
  juillet, 449
  août, 403
  septembre, 339
  novembre, 375
  décembre, 370
;
run;

data revenu;
  length mois $ 10;
  infile datalines delimiter=',';
  input mois ventes;
  datalines;
  JANVIER, 376784
  FEVRIER, 396911
  MARS, 441327
  AVRIL, 419272
  MAI, 408291
  JUIN, 443791
  JUILLET, 442111
  AOUT, 402771
  SEPTEMBRE, 337727
  NOVEMBRE, 381929
  DECEMBRE, 376771
;
run;

proc sort data=clients sortseq=linguistic(strength=1);
  by mois;
run;

proc sort data=revenu sortseq=linguistic(strength=1);
  by mois;
run;

data resultat;
  merge clients revenu;
  by mois;
  revenuparclient = ventes/compte;
run;

proc print;
run;

Occasional Contributor
Posts: 12

Re: How to ignore accented text?

Thank you!  This alternate approach will also be useful to me in the future.

SAS Employee
Posts: 17

Re: How to ignore accented text?

FYI, a similar approach can be taken with PROC SQL and a join using the SORTKEY function.
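
For illustration, a rough sketch of what such a join might look like. The table names (clients_demo, revenu_demo, resultat_demo) and the two sample rows are made up. The locale 'fr_FR' and the strength value 'P' (primary, which compares base letters only) are my reading of the SORTKEY arguments, so treat them as assumptions and check them against the function's documentation. TRIM is applied first so that trailing padding does not end up in the key.

```sas
/* Made-up sample tables keyed by month name */
data clients_demo;
  length mois $ 10;
  infile datalines delimiter=',';
  input mois compte;
  datalines;
février, 400
août, 403
;
run;

data revenu_demo;
  length mois $ 10;
  infile datalines delimiter=',';
  input mois ventes;
  datalines;
FEVRIER, 396911
AOUT, 402771
;
run;

proc sql;
  create table resultat_demo as
  select c.mois,
         c.compte,
         r.ventes,
         r.ventes / c.compte as revenuparclient
  from clients_demo as c
       inner join revenu_demo as r
       /* Assumed arguments: locale 'fr_FR', strength 'P' (primary) */
       on sortkey(trim(c.mois), 'fr_FR', 'P') = sortkey(trim(r.mois), 'fr_FR', 'P');
quit;
```

Unlike the PROC SORT route, this needs no pre-sorting of either table, at the cost of computing a key for every row on both sides of the join.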
