DATA Step, Macro, Functions and more

Reg:Compress Special Characters

Regular Contributor
Posts: 229

Hi, I have special characters like this: ╥Bç_┐▌lær“Y╓╤╙╥I. How can I compress (remove) them? I may get other special characters of this kind in the future. I can use the COMPRESS function and list them explicitly, as below, but is there a way to remove all such special characters, including ones I don't know in advance? New=compress(old,'╥Bç_┐▌lær“Y╓╤╙╥I');

Super User
Posts: 9,681

COMPRESS takes a third argument of modifiers. The 'k' modifier keeps the characters you list instead of removing them.

For example, to keep only the character B:

data want;
 set sashelp.class;
 want=compress(name,'B','k');
 keep want;
run;


Ksharp

Trusted Advisor
Posts: 1,300

compress(name,,'kad')

k - keep

a - letters

d - digits

This is equivalent to Keith's regular expression (s/[\W_]//o).

\W matches the inverse of \w (letters, digits, and underscores), and he chose to also remove underscores by adding _ to the character class.

If you do not want to remove underscores, use 'kn' instead of 'kad', or change the regular expression to s/\W//o
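A minimal sketch comparing the three variants discussed above on plain ASCII input. The data set name demo, the variable names, and the sample string are made up for illustration.

```sas
data demo;
  length old keep_ad keep_n rx $20;
  old = 'ab_12-cd!';
  keep_ad = compress(old, , 'kad');            /* keep letters and digits       */
  keep_n  = compress(old, , 'kn');             /* keep digits, _, and letters   */
  rx      = prxchange('s/[\W_]//o', -1, old);  /* drop non-word chars and _     */
  put keep_ad= keep_n= rx=;
  /* log: keep_ad=ab12cd keep_n=ab_12cd rx=ab12cd */
run;
```

With only ASCII characters in play, 'kad' and the regular expression give the same result; 'kn' differs only in keeping the underscore.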

PROC Star
Posts: 7,363

Although, if you are going to use the k (keep) modifier, you may also want to include s (to keep spaces and tabs) and p (to keep punctuation marks such as commas and periods).
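A short sketch of that suggestion, assuming an ASCII session. The sample string and variable names are hypothetical; a bell control character ('07'x) stands in for an unwanted special character.

```sas
data demo;
  length old new $30;
  /* sample text with an embedded control character (hex 07) */
  old = 'Hi, Bob!' || '07'x || ' #42';
  /* k = keep, a = letters, d = digits, s = space characters, p = punctuation */
  new = compress(old, , 'kadsp');
  put new=;
run;
```

Letters, digits, blanks, and punctuation survive, while the control character does not fall into any of the kept classes and should be dropped.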

Regular Contributor
Posts: 151

Hi @FriedEegg,

I thought of the COMPRESS solution first (it appears to be the simplest), but I discovered that it doesn't strip out the foreign letters (e.g. ç or æ), whereas PRXCHANGE does.  I guess there are subtle differences in the underlying code for these functions.

So it looks like it depends on whether these letters are wanted or not as to which solution is the best to use.

Regards,

Keith

PROC Star
Posts: 7,363

Compress can be limited to just English characters.  Take a look at: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212246.htm

Regular Contributor
Posts: 151

@art297

Thanks for that, sadly I'm still using 9.1.

If I put compress(old,,"kad") it keeps the foreign letters, yet if I put compress(old,,"kfd") it removes them!  The only difference between the 2 should be that "f" includes the underscore.  Looks like a bug in 9.1
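A small sketch for comparing the two modifier sets side by side. According to the COMPRESS documentation linked above, 'a' adds alphabetic characters while 'f' adds the underscore and English letters, which may account for the difference rather than a bug. How 'a' treats accented letters probably depends on the session encoding and locale, so this assumes a single-byte encoding such as Latin1; the names and the sample string are illustrative.

```sas
data demo;
  length old keep_ad keep_fd $20;
  old = 'Bç_lær12';
  keep_ad = compress(old, , 'kad');  /* alphabetic characters and digits        */
  keep_fd = compress(old, , 'kfd');  /* English letters, underscore, and digits */
  put keep_ad= keep_fd=;             /* compare the two results in the log      */
run;
```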

PROC Star
Posts: 7,363

Is it at least 9.1.3?  The modifiers weren't even added until then.  However, if you are on 9.1.3, then you could just specify exactly which characters you want to keep in the second argument (which is currently left empty: ,, ).

Regardless, it sounds like you already have an acceptable solution.
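A sketch of the explicit keep-list approach: the second argument names every character to keep, and 'k' turns the list from "remove" into "keep". The variable names and sample string are illustrative.

```sas
data demo;
  length old new $20;
  old = 'Bç_lær12';
  /* keep only the characters listed in the second argument */
  new = compress(old,
        'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789',
        'k');
  put new=;
  /* log: new=Blr12 */
run;
```

Because the list is spelled out, the result does not depend on how a modifier such as 'a' classifies accented letters.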

Regular Contributor
Posts: 151

Another option is to use a regular expression.  The Perl regular expression function PRXCHANGE will do the job here; the code below removes all non-word characters and the underscore.  The syntax takes a bit of getting used to, but there are plenty of online documents to help, including this useful tip sheet: http://support.sas.com/rnd/base/datastep/perl_regexp/regexp-tip-sheet.pdf

data test;
 length old new $20;
 old="-Bç_+¦lær“Y+-+-I ";
 /* -1 = replace every match; [\W_] = any non-word character or underscore */
 new=prxchange('s/[\W_]//o', -1, old);
run;
