I have a character variable that contains letters, digits, some symbols, both ' and ", and some strange characters (characters that are not on any keyboard). I want to create a new variable that contains only the characters I choose. Because I enclose my list in ", I cannot simply type " inside the list to keep it in the values of the variable var. How can I include "?
Thanks!
data want;
set have;
new_var=compress(var,"ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890$%.-':|&[]{}*!@#()!/+,");
run;
To include the character used to quote a string literal, just double it inside the string. Example:
word = 'Don''t';
So if you are using " on the outside, add "" to the list to include " as one of the characters.
compress(var,"ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890$%.-""':|&[]{}*!@#()!/+,")
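A minimal sketch of the doubling rule for both kinds of quotes. The variable names and sample values here are made up for illustration and are not from the original post:

```sas
data _null_;
    /* Hypothetical sample values, not from the original post */
    word   = 'Don''t';        /* single quote doubled inside a single-quoted literal */
    phrase = "say ""hi""";    /* double quote doubled inside a double-quoted literal */

    /* The list holds two characters: a single quote and one (doubled) double quote */
    cleaned = compress(phrase || ' ' || word, "'""");

    put word= phrase= cleaned=;
run;
```

The log should show word=Don't, phrase=say "hi" and cleaned=say hi Dont. Keep in mind that a double-quoted literal is also scanned for macro triggers (& and %), so a list containing those characters may be safer in single quotes, with ' doubled instead.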
Hi @Emma_at_SAS
It might be easier to use the "k" modifier. It reverses the compress, so only the specified characters are kept instead of dropped.
Drop all printable and non-printable characters except "ABC":
compress(str,'ABC','k');
Drop all printable and non-printable characters except digits, English letters and the underscore. No characters are listed explicitly; the 'n' modifier adds the ones mentioned. More examples are in the online documentation:
compress(str,,'nk');
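A small sketch of how the two calls behave, using a made-up test value. The extra 'i' (ignore case) variant is my addition, since a list of uppercase letters alone would not keep lowercase ones:

```sas
data _null_;
    /* Hypothetical test value */
    str = 'aB_c-1 2?';

    keep_abc    = compress(str, 'ABC', 'k');   /* keeps only B            */
    keep_abc_i  = compress(str, 'ABC', 'ki');  /* ignores case: aBc       */
    keep_names  = compress(str, , 'nk');       /* letters, digits, _ : aB_c12 */

    put keep_abc= keep_abc_i= keep_names=;
run;
```

The matching is case sensitive unless 'i' is added, which matters if the list contains only uppercase letters.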
Thank you, @Tom and @ErikLund_Jensen. A combination of your suggestions solved my problem.
I used the doubled "" and the "k" modifier, and that gives me exactly what I want.
Thanks!
Could someone please help me mark this conversation as solved? I am not sure how, because it was solved by combining two suggestions.
Thanks
How about just:
compress(var,,'p');
Hi @Emma_at_SAS,
From the documentation of the compress() function, when the modifier argument is set to "p", as in @novinosrin's post, punctuation is added to the list of characters that are to be removed from the text:
p or P adds punctuation marks to the list of characters.
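A quick sketch of what that means in practice, with a made-up value:

```sas
data _null_;
    /* Hypothetical test value containing quotes and other punctuation */
    var = 'it''s "ok", a-b!';

    /* No explicit list; 'p' adds punctuation marks to the characters removed */
    no_punct = compress(var, , 'p');

    put no_punct=;   /* expected: its ok ab */
run;
```

Note that this removes every punctuation mark, so characters such as $ and % from the original list would be dropped as well. Blanks are kept because they are not punctuation.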
Kind regards,
Amir.
data test;
input var$ ;
datalines;
oÔÇÖ\[}s
dsd_'"sf
;
run;
Obs | var |
---|---|
1 | oÔÇÖ\[}s |
2 | dsd_'"sf |
data want;
set test;
new_var=compress(var,"ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890$%.-'"":|&[]{}*!@#()!/+, ",'kg');
run;
proc print data=want; run;
The SAS System |
Obs | var | new_var |
---|---|---|
1 | oÔÇÖ\[}s | oÔÇÖ\[}s |
2 | dsd_'"sf | dsd_'"sf |
Thank you @Ksharp for your suggestion. I already have a solution, but I thought it might be easier to just remove the "graphic characters", because I want to keep everything except the translation of emojis that appears in my SAS dataset. Could you please help me with an example? Thanks
new_var=compress(var,"ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890$%.-'"":|&[]{}*!@#()!/+, ",'k');
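For reference, a sketch of the doubled "" combined with the "k" modifier on a made-up value. The 'i' modifier is an assumption on my part: the list holds only uppercase letters, so without it lowercase letters would be dropped too.

```sas
data _null_;
    /* Hypothetical value: letters, underscore, both quotes and a tilde */
    var = 'dsd_''"sf~';

    /* k = keep the listed characters, i = ignore case (assumption, see above) */
    new_var = compress(var,
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890$%.-'"":|&[]{}*!@#()/+, ",
        'ki');

    put new_var=;   /* expected: dsd'"sf */
run;
```

The underscore and the tilde are not in the list, so they are removed; everything that is listed, including both quote characters, is kept.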
new_var=compress(var,"ABC " ,'g');