02-24-2012 02:25 AM
Hi, I have special characters like these: ╥Bç_┐▌lær“Y╓╤╙╥I. How can I remove them? I may get other special characters like these in the future. I can use the COMPRESS function and list them explicitly, as in New=compress(old,'╥Bç_┐▌lær“Y╓╤╙╥I');, but is there any way to remove all such special characters, including ones I don't know about yet?
02-24-2012 02:45 AM
COMPRESS has a third argument for modifiers; the 'k' modifier keeps the characters you list instead of removing them.
For example, to keep only the character B:
data want;
  set sashelp.class;
  want=compress(name,'B','k');
  keep want;
run;
02-24-2012 06:25 PM
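The modifiers in compress(old,,'kad') mean:
k - keep the listed characters instead of removing them
a - letters
d - digits
This is equivalent to Keith's regular expression s/[\W_]//o.
\W matches any character that is not a word character (\w matches letters, digits, and underscores), and he also put the underscore in the character class, so underscores are removed too.
If you do not want to remove underscores, use 'kn' instead of 'kad', or change the regular expression to s/\W//o.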
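A minimal sketch comparing the two modifier sets with the two regular expressions. The sample value and the data set and variable names are hypothetical, and the input is deliberately ASCII-only: on plain ASCII text 'kad' and s/[\W_]//o should agree, as should 'kn' and s/\W//o.

```sas
/* Sketch: COMPRESS modifiers vs. PRXCHANGE on a hypothetical ASCII value */
data compare;
  length old kad kn rx_wu rx_w $20;
  old='Ab_c-1 2.3!';
  kad=compress(old,,'kad');                /* keep letters and digits                   */
  kn=compress(old,,'kn');                  /* keep digits, underscore, English letters */
  rx_wu=prxchange('s/[\W_]//o', -1, old);  /* remove non-word characters and underscore */
  rx_w=prxchange('s/\W//o', -1, old);      /* remove non-word characters only          */
  put (old kad kn rx_wu rx_w) (=);
run;
```

Both kad and rx_wu should come out as Abc123, and both kn and rx_w as Ab_c123.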
02-24-2012 06:36 PM
That said, if you use the k (keep) modifier, you may also want to add s (space characters such as blanks and tabs) and p (punctuation marks such as commas and periods).
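A short sketch of what s and p add, assuming an ASCII-based session. The data set name is hypothetical, and '07'x (a bell control character) stands in for an unwanted byte.

```sas
/* Sketch: adding s (space characters) and p (punctuation) to the keep list */
data keep_more;
  length old kad kadsp $20;
  old='Hi, there.'||'07'x;        /* '07'x: hypothetical unwanted control character */
  kad=compress(old,,'kad');       /* letters and digits only                        */
  kadsp=compress(old,,'kadsp');   /* also keep blanks/tabs and punctuation marks    */
  put (old kad kadsp) (=);
run;
```

With 'kad' the comma, blank, and period disappear along with the control character; with 'kadsp' only the control character is dropped.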
02-27-2012 03:46 AM
I did think of the COMPRESS solution first (it appears to be the simplest); however, I discovered that it doesn't strip out foreign letters (e.g. ç or æ), whereas PRXCHANGE does. I guess the two functions differ subtly in what they treat as a letter.
So which solution is best seems to depend on whether these letters should be kept.
02-27-2012 08:37 AM
Compress can be limited to just English characters. Take a look at: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212246.htm
02-27-2012 09:28 AM
Thanks for that; sadly, I'm still using 9.1.
If I use compress(old,,"kad"), it keeps the foreign letters, yet compress(old,,"kfd") removes them! The only difference between the two should be that "f" also includes the underscore. Looks like a bug in 9.1.
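A small sketch for comparing the two calls. If I read the SAS COMPRESS documentation correctly, f adds the underscore and English letters only, while a adds alphabetic characters as defined by the session's encoding and locale, so kfd dropping ç may be documented behavior rather than a bug. The sample value and data set name are hypothetical; only the kfd result is predictable, since what kad keeps depends on the session.

```sas
/* Sketch: 'a' (alphabetic, locale/encoding dependent) vs. 'f' (underscore + English letters) */
data modifiers;
  length old kad kfd $20;
  old='Aç_b1';                 /* hypothetical sample with one accented letter */
  kad=compress(old,,'kad');    /* may or may not keep ç, depending on the session */
  kfd=compress(old,,'kfd');    /* keeps only underscore, English letters, digits  */
  put (old kad kfd) (=);
run;
```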
02-27-2012 09:38 AM
Is it at least 9.1.3? The modifiers weren't added until then. In any case, if you are on 9.1.3, you could simply list exactly which characters you want to keep in the second argument (which is currently empty: ,,).
Regardless, it sounds like you already have an acceptable solution.
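A sketch of that explicit approach: pass the allowed characters as the second argument and add k so they are kept rather than removed. The list below (English letters and digits), the sample value, and the data set name are assumptions; extend the list with whatever you want to keep. Because the list is spelled out, the result should not depend on how the session classifies accented letters.

```sas
/* Sketch: keep only an explicitly listed set of characters */
data explicit;
  length old new $20;
  old='Aç_b-1';   /* hypothetical sample value */
  new=compress(old,
               'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789',
               'k');  /* k: keep the listed characters, drop everything else */
  put (old new) (=);
run;
```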
02-24-2012 07:26 AM
Another option is to use a regular expression. The Perl regular expression function PRXCHANGE will do the job here; the code below removes all non-word characters and the underscore. The syntax takes a bit of getting used to, but there are plenty of online documents to help, including this useful tip sheet: http://support.sas.com/rnd/base/datastep/perl_regexp/regexp-tip-sheet.pdf
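data want; set have;   /* HAVE and its variable OLD are hypothetical names */
  length new $20;
  new=prxchange('s/[\W_]//o', -1, old);   /* remove non-word characters and underscores */
run;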
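A variation on the same PRXCHANGE call, sketched under the assumption that the strings arrive as in-stream data: the pattern is compiled once with PRXPARSE and the returned ID is passed to PRXCHANGE, which makes the compile-once intent of the o option explicit. The -1 argument tells PRXCHANGE to replace every match. Data set and variable names are hypothetical.

```sas
/* Sketch: compile the pattern once with PRXPARSE, then reuse its ID */
data want_rx;
  length old new $20;
  retain rx;
  if _n_=1 then rx=prxparse('s/[\W_]//');  /* compile on the first iteration only */
  input old $20.;
  new=prxchange(rx, -1, old);              /* -1: replace every match            */
  drop rx;
  datalines;
Ab_c-1 2.3!
x.y_z
;
run;
```

The two sample lines should come out as Abc123 and xyz.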