DATA Step, Macro, Functions and more

Mask special,non utf-8 characters.

Reply
Regular Contributor
Posts: 161

Mask special,non utf-8 characters.

Hi all,

I would like to mask special, non utf-8 characters in all sas table, if it not possible - at least mask such spec. chars in concrette col.

Small example of such table:

data test;

      txt="Test1üt ÅåTest2 øTest3 æÆtest4";

run;

A tried a lot of SAS functions and non of them solved the issue, the best what I got -replaced string when all such spec chars become missing(space), that is not acceptable due I would like to change them by some similar values,

for example ü->u,å->a.

I can't predict all such spec chars that can come, so can't just replace by prxchange or tranwrd function one char to another, so need somne general solution.

I use SAS 9.1.3 under Windows.

Thanks!

Respected Advisor
Posts: 3,156

Re: Mask special,non utf-8 characters.

I think you would have to hardcode it like this or using custom format:

data test;

  txt="Test1üt ÅåTest2 øTest3 æÆtest4";

new_txt=translate(txt,'AauoeE','ÅåüøæÆ');

run;

Regards,

Haikuo

Regular Contributor
Posts: 161

Re: Mask special,non utf-8 characters.

Hi Hai,

Yep, but it still hardcoding, and if some another spec char. occurs tommorow  - I'll need to add changes to code again.

Whould like to find some general solution, not sure if it exists at all.

Maybe some session encoding options will help, or maybe some libname encoding option , I tried a lot of cases and nothing helped.

Thanks!

Respected Advisor
Posts: 3,156

Re: Mask special,non utf-8 characters.

Then you need to lay out your converting rules first. With your rules being consistent, you can start looking at Byte function and Rank function, with the help of Substr function or PRX functions, a non hardcoding solution may be available.

Regards,

Haikuo

Regular Contributor
Posts: 161

Re: Mask special,non utf-8 characters.

Hi again,

Yes, I also thought about such variant but from performance side it can be not the best, I would need to check each char in each column for set of rules ,for big table with long strings such operations can take a lot of time.

But as variant it can be, thanks!

Super User
Super User
Posts: 7,984

Re: Mask special,non utf-8 characters.

Well, just run that step under SAS 9.3 Win, and it seemed to store it fine. Personally I would strip these out, something like:  compress(x, , 'kn');

However you say you want to replace them with other characters.  Not sure how viable that is, what will you change the ø to for instance, some of these special characters would have different resolutions so you may have to do a set of translations.

Regular Contributor
Posts: 161

Re: Mask special,non utf-8 characters.

Hi Rw9,

I'm working on SAS 9.1.3, so need to solve issue there for now.

Thanks!

Super User
Super User
Posts: 7,984

Re: Mask special,non utf-8 characters.

Yes,  compress(x, , 'kn'); this should still work for stripping out the characters, but I don't think there is an easy way of translating the characters.  You could create a replace dataset, something like:

TEXT_FROM          TEXT_TO

Å                              A

ø                              0

...

Then have a code generator:

data _null_;

     set replace end=last;

     if _n_=1 then call execute('data want; set have; var1=tranwrd("'||strip(text_from)||'","'||strip(text_to)||'";');

     else call execute('tranwrd("'||strip(text_from)||'","'||strip(text_to)||'");');

     if last then call execute('run;');

run;

You would only need to add any new records to that replace table (which could be in Excel...) rather than updating your code each time.  Note the code above will create a datastep with each tranwrd statement from the dataset replace.

Regular Contributor
Posts: 161

Re: Mask special,non utf-8 characters.


Hi again,

Clear.

Pitty that compress(x, , 'kn') also replace all spaces, if it whould replace only non utf8 chars it would be betterSmiley Happy, but offcource I can initially replace chars to some another char then replace it back in the end, but it to much probably.

Thanks!

Respected Advisor
Posts: 3,156

Re: Mask special,non utf-8 characters.

compress(txt,,'kns') will keep blanks.

Regular Contributor
Posts: 161

Re: Mask special,non utf-8 characters.

Hi again,

Thanks, will use it together with a few hardcoding for chars that I whould like to replace by similar.

Thanks!

SAS Super FREQ
Posts: 709

Re: Mask special,non utf-8 characters.

With SAS9,4 there is a BASECHAR function. Although it does not help much it is good to know that with SAS9.4 this gets a lot easier

Regular Contributor
Posts: 161

Re: Mask special,non utf-8 characters.

Posted in reply to Bruno_SAS

Hi Bruno,

I'm too far from SAS 9.4, actually on SAS 9.1.3,hopefully for a few month will be on SAS 9.3, but for now it SAS 9.1.3.

Thanks!

Trusted Advisor
Posts: 3,215

Re: Mask special,non utf-8 characters.

Check the NLS guide SAS 9.1 found at: SAS 9.1 Documentation

I do not get your problem well as it is looking you are expecting some wlatin1 chars that could get propagated to a ascii-7 DBMS or 437-DB2 ebcdic environment.
Knowing your request in this SBCS (Single byte Character Set) would help as those can not be more as 256 values. A trantab and/or encoding setting should help.

When you are getting data form utf8 source (XML html Word Excel etc) that must be stored in a latin1 environment as an impossible mission.
    

---->-- ja karman --<-----
Regular Contributor
Posts: 161

Re: Mask special,non utf-8 characters.

Hi Jaap,

Thanks, will take a look.


Ask a Question
Discussion stats
  • 15 replies
  • 712 views
  • 6 likes
  • 6 in conversation