Hi, I have special characters like this: ╥Bç_┐▌lær“Y╓╤╙╥I. How can I compress them? I may get these kinds of special characters in the future. I can use the COMPRESS function as shown here, but is there a way to remove all special characters of this kind, including ones I don't know in advance? New=compress(old,'╥Bç_┐▌lær“Y╓╤╙╥I');
COMPRESS accepts a third argument; its 'k' modifier keeps the listed characters instead of removing them.
For example, to keep only the character B:
data want;
set sashelp.class;
want=compress(name,'B','k');
keep want;
run;
Ksharp
compress(name,,'kad')
k - keep
a - letters
d - digits
This is equivalent to Keith's regular expression (s/[\W_]//o).
\W matches the inverse of \w (letters, digits, and underscores), and he chose to also exclude underscores.
If you do not want to remove underscores, use 'kn' instead of 'kad', or change the regular expression to s/\W//o.
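To make the difference between the two modifier sets concrete, here is a minimal sketch. The sample value is made up for illustration and is not from the original data.

```sas
data _null_;
  length old $20;
  old = 'ab_12-#cd';               /* hypothetical sample value */
  kad = compress(old, , 'kad');    /* keep letters and digits only */
  kn  = compress(old, , 'kn');     /* keep digits, underscores, and English letters */
  put kad= kn=;                    /* writes both results to the log */
run;
```

With this input, the log should show kad=ab12cd and kn=ab_12cd, so the only difference is the underscore.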
Although, if you are going to use the k (keep) modifier, you may also want to include s (for spaces and tabs) and p (for punctuation marks such as commas and periods).
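As a rough sketch of that suggestion, the step below keeps letters, digits, white space, and punctuation, and drops everything else. The sample value, including the embedded control character, is an assumption for illustration. Which of the original poster's characters count as letters or punctuation may depend on the session encoding and locale.

```sas
data _null_;
  length old new $40;
  /* hypothetical value with an embedded bell character ('07'x) */
  old = 'Smith, J.' || '07'x || ' (ref 42)';
  /* k = keep, a = letters, d = digits, s = white space, p = punctuation */
  new = compress(old, , 'kadsp');
  put new=;   /* the control character is dropped; commas, periods, and blanks stay */
run;
```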
Hi @FriedEegg,
I did think of that Compress solution first of all (as it appears to be the simplest), however I discovered it doesn't strip out the foreign letters (e.g. ç or æ), whereas Prxchange does. I guess there are subtle differences in the underlying code for these functions.
So it looks like it depends on whether these letters are wanted or not as to which solution is the best to use.
Regards,
Keith
Compress can be limited to just English characters. Take a look at: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212246.htm
@art297
Thanks for that, sadly I'm still using 9.1.
If I use compress(old,,"kad") it keeps the foreign letters, yet compress(old,,"kfd") removes them! The only difference between the two should be that "f" includes the underscore. Looks like a bug in 9.1.
Is it at least 9.1.3? The modifiers weren't even added until then. However, if you are on 9.1.3, then you could just specify exactly which characters you want to keep in the second argument (which is currently empty: ,, ).
Regardless, it sounds like you already have an acceptable solution.
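A minimal sketch of the explicit keep list idea follows, assuming only English letters and digits are wanted. The sample value is hypothetical. The list is passed as a literal so that no trailing blanks from a padded variable end up in the keep list.

```sas
data _null_;
  length old new $20;
  old = 'Bc_lar-Y#I 9';   /* hypothetical sample value */
  /* second argument lists every character to keep; 'k' turns the list into a keep list */
  new = compress(old,
                 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789',
                 'k');
  put new=;               /* with this input the log should show new=BclarYI9 */
run;
```

Because the list is spelled out, the result does not depend on how the a or f modifiers classify accented letters.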
Another option is to use a regular expression. The Perl regular expression function PRXCHANGE will do the job here; the code below removes all non-word characters and the underscore. The syntax takes a bit of getting used to, but there are plenty of online documents to help, including this useful tip sheet: http://support.sas.com/rnd/base/datastep/perl_regexp/regexp-tip-sheet.pdf
data test;
length old new $20;
old="-Bç_+¦lær“Y+-+-I ";
/* [\W_] matches any non-word character or an underscore; -1 replaces every match */
new=prxchange('s/[\W_]//o', -1, old);
run;