BookmarkSubscribeRSS Feed
Yura2301
Quartz | Level 8

Hi all,

I would like to mask special, non utf-8 characters in all sas table, if it not possible - at least mask such spec. chars in concrette col.

Small example of such table:

data test;

      txt="Test1üt ÅåTest2 øTest3 æÆtest4";

run;

A tried a lot of SAS functions and non of them solved the issue, the best what I got -replaced string when all such spec chars become missing(space), that is not acceptable due I would like to change them by some similar values,

for example ü->u,å->a.

I can't predict all such spec chars that can come, so can't just replace by prxchange or tranwrd function one char to another, so need somne general solution.

I use SAS 9.1.3 under Windows.

Thanks!

15 REPLIES 15
Haikuo
Onyx | Level 15

I think you would have to hardcode it like this or using custom format:

data test;

  txt="Test1üt ÅåTest2 øTest3 æÆtest4";

new_txt=translate(txt,'AauoeE','ÅåüøæÆ');

run;

Regards,

Haikuo

Yura2301
Quartz | Level 8

Hi Hai,

Yep, but it still hardcoding, and if some another spec char. occurs tommorow  - I'll need to add changes to code again.

Whould like to find some general solution, not sure if it exists at all.

Maybe some session encoding options will help, or maybe some libname encoding option , I tried a lot of cases and nothing helped.

Thanks!

Haikuo
Onyx | Level 15

Then you need to lay out your converting rules first. With your rules being consistent, you can start looking at Byte function and Rank function, with the help of Substr function or PRX functions, a non hardcoding solution may be available.

Regards,

Haikuo

Yura2301
Quartz | Level 8

Hi again,

Yes, I also thought about such variant but from performance side it can be not the best, I would need to check each char in each column for set of rules ,for big table with long strings such operations can take a lot of time.

But as variant it can be, thanks!

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, just run that step under SAS 9.3 Win, and it seemed to store it fine. Personally I would strip these out, something like:  compress(x, , 'kn');

However you say you want to replace them with other characters.  Not sure how viable that is, what will you change the ø to for instance, some of these special characters would have different resolutions so you may have to do a set of translations.

Yura2301
Quartz | Level 8

Hi Rw9,

I'm working on SAS 9.1.3, so need to solve issue there for now.

Thanks!

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Yes,  compress(x, , 'kn'); this should still work for stripping out the characters, but I don't think there is an easy way of translating the characters.  You could create a replace dataset, something like:

TEXT_FROM          TEXT_TO

Å                              A

ø                              0

...

Then have a code generator:

data _null_;

     set replace end=last;

     if _n_=1 then call execute('data want; set have; var1=tranwrd("'||strip(text_from)||'","'||strip(text_to)||'";');

     else call execute('tranwrd("'||strip(text_from)||'","'||strip(text_to)||'");');

     if last then call execute('run;');

run;

You would only need to add any new records to that replace table (which could be in Excel...) rather than updating your code each time.  Note the code above will create a datastep with each tranwrd statement from the dataset replace.

Yura2301
Quartz | Level 8


Hi again,

Clear.

Pitty that compress(x, , 'kn') also replace all spaces, if it whould replace only non utf8 chars it would be better:), but offcource I can initially replace chars to some another char then replace it back in the end, but it to much probably.

Thanks!

Haikuo
Onyx | Level 15

compress(txt,,'kns') will keep blanks.

Yura2301
Quartz | Level 8

Hi again,

Thanks, will use it together with a few hardcoding for chars that I whould like to replace by similar.

Thanks!

BrunoMueller
SAS Super FREQ

With SAS9,4 there is a BASECHAR function. Although it does not help much it is good to know that with SAS9.4 this gets a lot easier

Yura2301
Quartz | Level 8

Hi Bruno,

I'm too far from SAS 9.4, actually on SAS 9.1.3,hopefully for a few month will be on SAS 9.3, but for now it SAS 9.1.3.

Thanks!

jakarman
Barite | Level 11

Check the NLS guide SAS 9.1 found at: SAS 9.1 Documentation

I do not get your problem well as it is looking you are expecting some wlatin1 chars that could get propagated to a ascii-7 DBMS or 437-DB2 ebcdic environment.
Knowing your request in this SBCS (Single byte Character Set) would help as those can not be more as 256 values. A trantab and/or encoding setting should help.

When you are getting data form utf8 source (XML html Word Excel etc) that must be stored in a latin1 environment as an impossible mission.
    

---->-- ja karman --<-----
Yura2301
Quartz | Level 8

Hi Jaap,

Thanks, will take a look.


sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 3037 views
  • 6 likes
  • 6 in conversation