I have a character variable that contains letters, numbers, some symbols, both ' and ", and some strange characters (ones that are not on any keyboard). I want to create a new variable that contains only the characters I want. Because I enclose my list in ", I cannot put " inside the list to keep it in the values of var. How can I include "?
Thanks!
data want;
set have;
new_var=compress(var,"ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890$%.-':|&[]{}*!@#()!/+,");
run;
To include the character that quotes a string literal, double it inside the string. Example:
word = 'Don''t';
So if you are using " on the outside, add "" to the list to include " as one of the characters.
compress(var,"ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890$%.-""':|&[]{}*!@#()!/+,")
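As a minimal sketch of the doubling rule, the step below strips both kinds of quote from a made-up test value. The value and the variable names a and b are assumptions for illustration only. Both calls should write the same result to the log, with the quotes removed.

```sas
data _null_;
  length var a b $40;
  var = 'He said "hi", it''s OK';   /* hypothetical test value */
  a = compress(var, """'");         /* double-quoted literal: "" stands for one " */
  b = compress(var, '"''');         /* single-quoted literal: '' stands for one ' */
  put a= / b=;
run;
```

Which outer quote to use is a matter of taste. Keep in mind that & and % can act as macro triggers inside a double-quoted literal, while a single-quoted literal is never scanned by the macro processor.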
Hi @Emma_at_SAS
It might be easier to use the "k" modifier. It reverses the behavior of compress, so only the specified characters are kept instead of dropped.
Drop all printable and non-printable chars except "ABC":
compress(str,'ABC','k');
Drop all printable and non-printable characters except digits, English letters, and the underscore. No characters are listed explicitly; the 'n' modifier adds them. More examples are in the online documentation:
compress(str,,'nk');
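A small sketch of both calls on one made-up value, assuming a string that contains a tab as its non-printable character. The names keep_list and keep_n are hypothetical.

```sas
data _null_;
  length var keep_list keep_n $40;
  var = 'AB"c_1' || '09'x || '$';            /* hypothetical value with an embedded tab */
  keep_list = compress(var, "AB""$", 'k');   /* keep only A, B, " and $ */
  keep_n    = compress(var, , 'nk');         /* keep digits, English letters, underscore */
  put keep_list= / keep_n=;
run;
```

Here keep_list should retain only the four listed characters that occur in the value, and keep_n should drop the quote, the tab, and the dollar sign.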
Thank you, @Tom and @ErikLund_Jensen. A combination of your suggestions solved my problem.
I used the doubled "" and the "k" modifier, and that gives me exactly what I want.
Thanks!
Could someone please help me mark this conversation as solved? I am not sure how, because the solution came from two suggestions.
Thanks
How about just:
compress(var,,'p');
Hi @Emma_at_SAS,
From the documentation of the compress() function, when the modifier argument is set to "p", as in @novinosrin's post, punctuation is added to the list of characters that are to be removed from the text:
p or P adds punctuation marks to the list of characters.
Kind regards,
Amir.
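To see what the 'p' modifier does in practice, here is a minimal sketch on a made-up value; the value and the name no_punct are assumptions. Note that it removes every punctuation mark, so symbols such as $ that the original list was meant to keep would be removed as well.

```sas
data _null_;
  length var no_punct $20;
  var = 'A$1_b''"c';                 /* hypothetical test value */
  no_punct = compress(var, , 'p');   /* 'p' adds punctuation marks to the removal list */
  put var= / no_punct=;
run;
```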
data test;
input var$ ;
datalines;
oÔÇÖ\[}s
dsd_'"sf
;
run;
| Obs | var |
|---|---|
| 1 | oÔÇÖ\[}s |
| 2 | dsd_'"sf |
data want;
set test;
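/* 'k' keeps the listed characters instead of removing them; 'g' adds graphic characters to the list */
new_var=compress(var,"ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890$%.-'"":|&[]{}*!@#()!/+, ",'kg');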
run;
proc print data=want; run;
The SAS System
| Obs | var | new_var |
|---|---|---|
| 1 | oÔÇÖ\[}s | oÔÇÖ\[}s |
| 2 | dsd_'"sf | dsd_'"sf |
Thank you @Ksharp for your suggestion. I already have a solution, but I thought it might be easier to just remove the "graphic characters", because I want to keep everything except the translation of emojis that appears in my SAS dataset. Could you please help me with an example? Thanks