R_Win
Calcite | Level 5

Hi, I have special characters like this: ╥Bç_┐▌lær“Y╓╤╙╥I. How can I remove them? I may get other special characters like these in the future. I can list the known ones in the COMPRESS function, as shown here, but is there any way to remove all special characters of this kind, including ones I don't know in advance? New=compress(old,'╥Bç_┐▌lær“Y╓╤╙╥I');

8 REPLIES
Ksharp
Super User

COMPRESS takes a third argument, a list of modifiers. The 'k' modifier keeps the characters you list instead of removing them.

For example, to keep only the character B:

data want;
 set sashelp.class;
 want=compress(name,'B','k');
 keep want;
run;


Ksharp

FriedEgg
SAS Employee

compress(name,,'kad')

k - keep

a - letters

d - digits

This is equivalent to Keith's regular expression (s/[\W_]//o).

\W matches the inverse of \w (letters, digits, and underscores), and he chose to also remove underscores.

If you do not want to remove underscores, use 'kn' instead of 'kad', or change the regular expression to s/\W//o.
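A minimal sketch comparing the two modifier lists described above. The sample string and the variable names old, keep_ad, and keep_n are made up for illustration; the string is plain ASCII so the result does not depend on the session encoding.

```sas
data _null_;
  length old keep_ad keep_n $40;
  old = 'ab_c 12#3';                  /* illustrative input */
  keep_ad = compress(old, , 'kad');   /* k = keep, a = letters, d = digits; the underscore is dropped */
  keep_n  = compress(old, , 'kn');    /* n = digits, underscore, and English letters; the underscore stays */
  put keep_ad= keep_n=;               /* write both results to the log */
run;
```

The second argument is left empty in both calls, so the modifiers alone decide which characters survive.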

art297
Opal | Level 21

Although, if you are going to use the k (keep) modifier, you may also want to include s (for spaces and tabs) and p (for punctuation marks such as commas and periods).
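A short sketch of that combination, assuming the goal is to keep ordinary text and drop everything else. The input string and variable names are hypothetical, and the control character '01'x only stands in for an unwanted byte.

```sas
data _null_;
  length old new $40;
  /* '01'x is a control character used here as a stand-in for an unwanted byte */
  old = 'Smith, John A.' || '01'x || ' 42';
  /* k = keep, a = letters, d = digits, s = space characters, p = punctuation */
  new = compress(old, , 'kadsp');
  put new=;                           /* write the cleaned value to the log */
run;
```

Without s and p, the blanks, comma, and period in this value would be removed along with the control character.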

Keith
Obsidian | Level 7

Hi @FriedEgg,

I thought of that COMPRESS solution first (it appears to be the simplest), but I discovered that it doesn't strip out the foreign letters (e.g. ç or æ), whereas PRXCHANGE does.  I guess there are subtle differences in the underlying code for these functions.

So it looks like it depends on whether these letters are wanted or not as to which solution is the best to use.

Regards,

Keith

art297
Opal | Level 21

Compress can be limited to just English characters.  Take a look at: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212246.htm

Keith
Obsidian | Level 7

@art297

Thanks for that, sadly I'm still using 9.1.

If I use compress(old,,"kad") it keeps the foreign letters, yet if I use compress(old,,"kfd") it removes them!  The only difference between the two should be that "f" includes the underscore.  Looks like a bug in 9.1.

art297
Opal | Level 21

Is it at least 9.1.3?  The modifiers weren't even added until then.  However, if you are on 9.1.3, then you could just specify exactly which characters you want to keep in the second argument (which is currently empty: ,, ).

Regardless, it sounds like you already have an acceptable solution.
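A minimal sketch of the explicit-list approach mentioned above, assuming only unaccented English letters and digits should survive. The sample string and variable names are illustrative, not from the original question.

```sas
data _null_;
  length old new $40;
  old = 'ab_c 12#3';                  /* illustrative input */
  /* The second argument lists exactly the characters to keep;   */
  /* the 'k' modifier switches COMPRESS from removing to keeping */
  new = compress(old,
                 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
                 'k');
  put new=;                           /* write the result to the log */
run;
```

Because the keep list is spelled out, the result does not depend on how the a modifier classifies accented letters.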

Keith
Obsidian | Level 7

Another option is to use a regular expression.  The Perl regular expression function PRXCHANGE will do the job here; the code below removes all non-word characters and the underscore.  The syntax takes a bit of getting used to, but there are plenty of online documents to help, including this useful tip sheet: http://support.sas.com/rnd/base/datastep/perl_regexp/regexp-tip-sheet.pdf

data test;
  length old new $20;
  old="-Bç_+¦lær“Y+-+-I ";
  /* [\W_] matches any non-word character or an underscore; -1 replaces every match */
  new=prxchange('s/[\W_]//o', -1, old);
run;
